Hello everyone,
I recently transitioned to using Microsoft Fabric products, specifically Notebooks (PySpark) and Lakehouse, running on F2 capacity. My current setup involves retrieving data from an API using Notebooks and storing it in a Lakehouse. While this workflow functions well, I’ve noticed significant costs due to high CU usage during Notebook execution. Here's a summary of my use case and the challenges I’m facing:
I have two Notebooks that process data for two customers:
• Notebook for Customer 2:
• Notebook for Customer 1:
Additionally, storage costs are relatively low, but I’m wondering if using a Warehouse instead of a Lakehouse for storage would be more cost-efficient in the long term.
Main points:
• High CU Usage in Notebooks:
• Storage Costs: Lakehouse vs. Warehouse:
• Capacity Management Issues:
Here’s a summary of CU and runtime statistics across my Fabric workspace for reference:
| Workspace | Item Kind | Item Name | CU (s) | Duration (s) | Users | Billing Type |
| --- | --- | --- | --- | --- | --- | --- |
| Fabric workspace | SynapseNotebook | Notebook Customer 2 | 2,117.9995 | 529.4930 | 1 | Billable |
| Fabric workspace | SynapseNotebook | Notebook Customer 1 | 999.9555 | 249.9850 | 1 | Billable |
| Fabric workspace | Lakehouse | Lakehouse Customer 2 | 872.7968 | 7.2210 | 1 | Billable |
| Fabric workspace | Warehouse | Warehouse Customer 2 | 742.1780 | 1,660.0960 | 2 | Billable |
| Fabric workspace | Lakehouse | Lakehouse Customer 1 | 405.6838 | 3.1230 | 1 | Billable |
| Fabric workspace | Warehouse | Warehouse Customer 1 | 214.3460 | 340.0870 | 2 | Billable |
| Fabric workspace | Pipeline | Pipeline Customer 2 | 100.8000 | 587.7270 | 1 | Billable |
| Fabric workspace | Pipeline | Pipeline Customer 1 | 40.3200 | 573.1680 | 1 | Billable |
| Total | | | 5,494.0796 | 3,950.9000 | | Billable |
Thanks guys in advance
Hi @CReportify
Please try these options:
Reduce Driver Cores
Your notebooks are currently using 8 driver cores, which may be excessive for the amount of data processed. Try reducing the number of cores to 4 or even 2, as this can significantly lower CU consumption without necessarily impacting performance for your data volumes.
Consider Pipelines
For repetitive API data fetching, pipelines might be more efficient than notebooks. Pipelines can:
• Better handle scheduling
• Provide built-in error handling and logging
• Potentially reduce overall CU usage
Storage: Lakehouse vs. Warehouse
For your current data volumes, a Lakehouse is likely more cost-effective than a Warehouse. Lakehouses are optimized for:
• Handling diverse data types
• Frequent small-scale reads/writes
• Machine learning workloads
However, if you anticipate significant growth in data volume or require complex SQL analytics, a Warehouse might become more suitable in the future.
Capacity Management
The massive CU usage spikes after resuming capacity are likely due to the “smoothing” mechanism in Fabric. When you pause capacity, Fabric immediately charges for all scheduled background operations instead of spreading the cost over 24 hours. To mitigate this:
• Schedule pauses during naturally low-usage periods
• Implement autoscaling for more dynamic workloads
Additional Optimization Strategies
1. Batch Processing: Implement batching in your API calls to reduce the number of individual requests, potentially lowering CU usage (see the sketch after this list).
2. Data Partitioning: Properly partition your Lakehouse data based on common query patterns to improve read performance.
3. Caching: Utilize Spark caching for frequently accessed datasets to reduce computation overhead.
4. Monitor and Analyze: Regularly use the Fabric Capacity Metrics app to identify high-consumption operations and optimize accordingly.
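To make points 1-3 concrete, here is a minimal PySpark sketch of how they could look together in a Fabric notebook. The API URL, paging parameters, the `order_date` partition column, and the `orders` table name are placeholders for illustration, not details from your actual workload:

import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 1. Batch processing: pull the API in pages instead of one request per record.
#    BASE_URL and the paging parameters are hypothetical; the API is assumed to
#    return a JSON array per page and an empty array when there is no more data.
BASE_URL = "https://api.example.com/orders"
PAGE_SIZE = 1000

def fetch_all_pages(base_url, page_size):
    records, page = [], 1
    while True:
        resp = requests.get(base_url, params={"page": page, "pageSize": page_size}, timeout=30)
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        records.extend(batch)
        page += 1
    return records

df = spark.createDataFrame(fetch_all_pages(BASE_URL, PAGE_SIZE))

# 3. Caching: keep the DataFrame in memory if several transformations reuse it.
df.cache()

# 2. Partitioning: write to the notebook's default Lakehouse as a Delta table,
#    partitioned by a commonly filtered column (here the assumed "order_date").
(df.write
    .mode("overwrite")
    .format("delta")
    .partitionBy("order_date")
    .saveAsTable("orders"))

Caching only pays off when the same DataFrame is read more than once; for a single write-through ingestion you can drop the cache() call.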
Hope this helps.
Please give kudos and mark this as the solution if it resolves your query.
Thanks!
Hi nilendraFabric,
Thanks for your response, it's great!
Is there an option to reduce the driver cores when executing a notebook? Also, how do I implement autoscaling for more dynamic loads, do you have an example? Lastly, is there a way to manually trigger or schedule specific background tasks (like metadata processing) to run during off-peak hours, so they don't immediately execute after capacity is resumed?
Thanks in advance!
Hi @CReportify
Yes, it is possible to reduce the driver cores when executing a notebook in Microsoft Fabric. You can configure them using the `%%configure` magic command at the beginning of your notebook. For example, to step down from 8 driver cores to a 4-core driver and 4-core executors (the exact core/memory combinations you can request depend on the node sizes available to your capacity):

%%configure -f
{
    "driverMemory": "28g",
    "driverCores": 4,
    "executorMemory": "28g",
    "executorCores": 4
}
Microsoft Fabric does not currently offer a built-in way to manually trigger or schedule specific background tasks like metadata processing during off-peak hours. The platform automatically manages certain background processes, and there’s no direct control over when these tasks execute after capacity is resumed.
Microsoft Fabric supports autoscaling for Spark pools. When you set up a Spark pool, you can enable autoscaling by configuring a range for the number of nodes. The system will automatically add or remove nodes based on workload demand.
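If you prefer to script the pool instead of using the workspace Spark settings UI, the Fabric REST API also exposes workspace custom pools with an autoscale range. Treat the sketch below as illustrative only: the workspace ID, token, pool name, and node counts are placeholders, and the endpoint and field names should be verified against the current Fabric REST API documentation:

import requests

WORKSPACE_ID = "<your-workspace-id>"
TOKEN = "<bearer-token-with-Fabric-API-scope>"

# Create a small custom Spark pool that autoscales between 1 and 3 nodes and
# lets Spark allocate executors dynamically within that range.
pool_definition = {
    "name": "autoscale-pool",
    "nodeFamily": "MemoryOptimized",
    "nodeSize": "Small",
    "autoScale": {"enabled": True, "minNodeCount": 1, "maxNodeCount": 3},
    "dynamicExecutorAllocation": {"enabled": True, "minExecutors": 1, "maxExecutors": 2},
}

resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}/spark/pools",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=pool_definition,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())

Dynamic executor allocation complements autoscale: the pool grows or shrinks its nodes with demand, while Spark releases executors a job no longer needs.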
I have tried to answer all your queries here.
Please accept this as the solution if it resolved your query.
Thanks!
Hi @CReportify,
Thank you for connecting with the Microsoft Community Forum.
Implement batch API calls, reduce driver cores, and optimize Spark processing to lower CU consumption. Utilize Fabric Pipelines for API-based ingestion to enhance efficiency, provide retry mechanisms, and improve concurrency control.
For frequent queries (e.g., Customer 2), consider transitioning to a Warehouse to boost performance. Continue using the Lakehouse for cost-effective bulk storage and processing.
To avoid CU spikes after capacity resumes, stagger workloads and monitor usage patterns for further adjustments.
These steps are expected to reduce costs while maintaining performance.
Please let us know if further assistance is required. Should this information prove useful, kindly mark it as a solution and consider giving us a Kudos.
Regards,
Sahasra.
Hi @CReportify,
I wanted to check in on your situation regarding the issue. Have you resolved it? If you have, please consider marking the reply that helped you or sharing your solution. It would be greatly appreciated by others in the community who may have the same question.
Thanks for using Microsoft Fabric Community Forum.