AstridM
Helper I

Loading a database on Fabric is consuming a lot of capacity

Good morning,

I have an F64 capacity, and we are trying to migrate from serverless to Fabric.

We have a pipeline that loads around 300 GB of data from a lakehouse to a database in Fabric, and every time it runs it consumes most of my capacity. It doesn't make sense that loading a database takes so much capacity, and the capacity is consumed by the database, not the pipeline. What could I be doing wrong?

All we do is truncate and insert from a lakehouse.

Thanks

1 ACCEPTED SOLUTION
Aala_Ali
Advocate IV

Hi @AstridM ,

You're absolutely right to question the high capacity usage during your Fabric DB load. Here's a breakdown of what could be happening and how to optimize it:

 What Might Be Going Wrong

  • Truncate + Insert Pattern: Every full reload can trigger costly internal operations like table optimization (V-Order), statistics recalculation, and index maintenance.

  • Warehouse Handles the Load: Even though you're using a pipeline, the heavy capacity hit comes from the Warehouse, not the pipeline activity.

  • No Batching: Loading 300 GB in one operation causes spikes in memory/compute usage and longer lock times.

  • Auto V-Order Optimization: Fabric automatically re-optimizes the table layout unless you control it manually.


 Optimization Suggestions

  1. Avoid Truncating Large Tables

    • Use MERGE, UPSERT, or a staging table with a swap pattern to reduce overhead.

  2. Batch Your Loads

    • Break the data into smaller chunks (e.g., by date) and load in parallel or sequential batches.

  3. Disable and Manually Trigger V-Order

    sql

    ALTER TABLE your_table SET (enable_vorder = false);
    -- Load data
    ALTER TABLE your_table SET (enable_vorder = true);
    OPTIMIZE your_table;

    Note: Only disable V-Order if you plan to optimize manually after the load.

  4. Use Staging Tables

    • Load into a temporary/staging table first, then switch or insert into the main table for better performance control (see the sketch right after this list).

  5. Monitor with Query Insights

    • Use the queryinsights schema or the Fabric Capacity Metrics app to pinpoint which operations are using the most capacity (an example query follows this list).
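To make items 1, 2 and 4 concrete, here is a minimal T-SQL sketch of a batched, staging-table load. All names are hypothetical (a LakehouseDB source, a dbo.Sales target, a dbo.Sales_stage staging table, an OrderDate column to batch on), and it assumes the warehouse can reach the lakehouse table by three-part name; if your pipeline uses a Copy activity instead, the same idea applies, just split the copy by date range.

sql

-- Hypothetical names throughout: LakehouseDB (source), dbo.Sales (live table),
-- dbo.Sales_stage (staging), OrderDate (column used to batch the load).

-- 1) Create an empty staging table with the same shape as the source.
CREATE TABLE dbo.Sales_stage AS
SELECT * FROM LakehouseDB.dbo.Sales WHERE 1 = 0;

-- 2) Load the staging table in smaller, date-bounded batches instead of one 300 GB insert.
--    Repeat (or parameterize in the pipeline) for each date range.
INSERT INTO dbo.Sales_stage
SELECT * FROM LakehouseDB.dbo.Sales
WHERE OrderDate >= '2025-01-01' AND OrderDate < '2025-02-01';

-- 3) Swap staging in as the live table once every batch has landed.
--    If renaming tables isn't supported on your target, insert from staging
--    into the live table here instead of renaming.
DROP TABLE dbo.Sales;
EXEC sp_rename 'dbo.Sales_stage', 'Sales';

The point of the swap is that the live table is only unavailable for the brief rename at the end, rather than sitting empty for the whole load as it does with truncate-and-insert.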
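And for item 5, a starting-point query against the queryinsights views on the warehouse side. The exec_requests_history view and the columns used here are the documented query history for Fabric warehouses, but check what your target actually exposes before relying on it:

sql

-- Most expensive statements over the last six hours, longest-running first.
SELECT TOP 20
    start_time,
    total_elapsed_time_ms,
    row_count,
    status,
    LEFT(command, 200) AS command_preview
FROM queryinsights.exec_requests_history
WHERE start_time >= DATEADD(HOUR, -6, SYSUTCDATETIME())
ORDER BY total_elapsed_time_ms DESC;

Cross-checking the worst offenders here against the interactive vs. background breakdown in the Fabric Capacity Metrics app can show whether the CUs are going to the inserts themselves or to post-load maintenance.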

Final Tips

  • Try loading a smaller sample (e.g., 10 GB) to see where the spike begins.

  • Run your heavy loads during off-peak hours if using shared capacity.

  • Consider temporarily upgrading capacity (e.g., to F128) if loading large datasets regularly.


If my response resolved your query, kindly mark it as the Accepted Solution to assist others. Additionally, I would be grateful for a 'Kudos' if you found my response helpful 🙂.


5 REPLIES
JasonFife
New Member

Loading 300 GB into an Azure SQL Database isn't a problem, and the cost is reasonable. Doing the same load into a Fabric Database shouldn't cost exponentially more, should it?

This is all nice and dandy, but I have a huge complaint here. Upgrading to F128, or for that matter even affording F64, is simply not possible for a lot of SMEs. The cash flow of an SME is not necessarily correlated with its data and compute needs, so you might have users with poor cash flow who need F64 or higher SKUs but can't afford pricing that only a JP Morgan or a Stellantis can. At that point, what choice is left for these SMEs?

Completely agree with you. As we are on a deadline, I will push everything to Synapse and rethink the strategy for how this is going to work. I would like it if we had a better understanding of how capacity is consumed and if the reporting were a bit better. What I can say is that 300 GB is not that much; this is supposed to be a platform for business analysis and a bunch of other things.

You are right, 300 GB doesn't even qualify as big data. What irks me is that, looking at my Fabric Capacity Metrics app, a lot of CUs are consumed by some data warehouse even though I did not spin up any such warehouse! A lot of hidden costs lurk in the shadows even when using only a lakehouse, a few notebooks and pipelines, and one semantic model and report.

 

And it is far from clear whether all this CU reporting is actually correct. Let's just say end users are justified in having doubts, given the long list of issues that plagues the whole platform.
