Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Did you hear? There's a new SQL AI Developer certification (DP-800). Start preparing now and be one of the first to get certified. Register now

Reply
mrbartuss
Helper II
Helper II

What actually influences CU consumption in Dataflows Gen2?

I’m trying to better understand what really drives CU consumption in Dataflows Gen2, because my observations don’t fully match my expectations.

I have two Dataflows Gen2, both orchestrated via a pipeline:

  1. Dimensions dataflow

  • Loads 19 tables into a Lakehouse

  • All tables come from one Excel file on SharePoint, each from a different sheet

  • Transformations: only Select Columns

  • Runtime: ~4 minutes

  1. Facts dataflow

  • Loads 8 tables

  • All tables come from different Excel files on SharePoint
  • Heavy transformations:

    • Multiple pivot / unpivot

    • Merges

  • Two tables have >15M rows

  • Runtime: ~10 minutes

Intuitively, I expected the facts dataflow to consume significantly more CUs due to:

  • Much larger row counts

  • More complex transformations

However, In practice, the dimensions dataflow actually consumes more CUs than the facts dataflow.

What are the main factors that actually influence CU consumption in Dataflows Gen2?

1 ACCEPTED SOLUTION
Tamanchu
Continued Contributor
Continued Contributor

Hi @mrbartuss,

 

Yes, exactly and the saving is actually proportional to your table count.

Without query referencing (no "Reference" step):
Each of your 19 output tables would independently evaluate its own query all the way to the source. That means 19 separate HTTP requests to SharePoint + 19 full Excel file parses through the Mashup engine. Each one is billed at 12 CU/s for up to 10 minutes.

With query referencing (your current setup) :
The SharePoint read happens once (the base "Source" query), and all 19 output tables branch off that single evaluation. Only the filtering/column selection steps run 19 times which are near-instant in-memory operations.

So in your Dimensions dataflow specifically, query referencing could be saving you up to ~18× the cost of a full source re-read. You're already doing the right thing there.

The remaining optimization lever (point 1) would remove the SharePoint dependency entirely once the file is in OneLake as a Delta table, even that single source read becomes a local Delta scan instead of an HTTP call to SharePoint.

View solution in original post

7 REPLIES 7
v-karpurapud
Community Support
Community Support

Hi @mrbartuss 

We have not received a response from you regarding the query and were following up to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions.

 

Thank You.

v-karpurapud
Community Support
Community Support

Hi @mrbartuss 

Thank you for posting your query in the Microsoft Fabric Community Forum, and thanks to @Tamanchu  for sharing valuable insights.

 

Could you please confirm if your query has been resolved by the provided solution?If you have any more questions, please let us know and we’ll be happy to help.

Regards,

Microsoft Fabric Community Support Team

 

Tamanchu
Continued Contributor
Continued Contributor

Hi @mrbartuss,

Great question and your observation is actually expected behavior once you understand how Dataflow Gen2 billing works under the hood.

CU is charged per query, not per dataflow
According to the official Dataflow Gen2 pricing docs, there are three separate billing meters:

  1. Standard Compute (Mashup Engine) charged per query, per second
  2. High Scale Compute (Staging) 6 CU/s when staging is enabled
  3. Fast Copy 1.5 CU/s when applicable

The key insight: Standard Compute is billed per output query every output table runs as a separate Mashup engine evaluation. For CI/CD dataflows, the rate is :

  • 12 CU/second for the first 10 minutes of each query
  • 1.5 CU/second for every second beyond 10 minutes (an 8× discount)

Why your Dimensions dataflow costs more

  • Dimensions (19 tables, ~4 min each) = 19 separate billing events, all staying entirely in the 12 CU/s tier
  • Facts (8 tables, ~10 min each) = 8 billing events, and any query time beyond 10 minutes drops to 1.5 CU/s

More tables + shorter runtime = more CU than fewer tables + longer runtime. Counterintuitive, but it follows directly from the per-query billing model.

 

Optimization ideas

  • Consolidate output queries where possible fewer tables = fewer billing events
  • Use Fast Copy connectors where eligible (1.5 CU/s vs 12 CU/s for the Mashup engine)
  • Let complex queries run long the two-tier rate rewards duration past 10 min with an 8× cost reduction
  • Monitor per operation in the Fabric Capacity Metrics app using "Dataflow Gen2 Run Queries" and "High Scale Dataflow Compute" separately

 

Hope this clarifies the billing model!

Thanks for the super‑clear explanation.

Given that, what would you say is the recommended alternative to reading Excel files directly from SharePoint in this setup?

Tamanchu
Continued Contributor
Continued Contributor

Hi @mrbartuss,

Given your setup, the two most impactful alternatives:

1. Pre-land Excel files into OneLake before the Dataflow runs

Add a Copy activity in your pipeline (before the Dataflow step) to copy the Excel files from SharePoint into a Lakehouse. Then point your Dataflow to the Lakehouse tables instead of SharePoint directly.

Benefits:

  • SharePoint is only hit once per pipeline run, not once per output table
  • Reading Delta Tables from OneLake is significantly faster than parsing Excel over HTTP
  • You can leverage Fast Copy (1.5 CU/s) instead of Standard Compute

→ Docs: Copy activity in Data Factory pipelines Microsoft Learn

2. Use query referencing inside the Dataflow (critical for your Dimensions case)

Since your 19 Dimensions tables all come from the same Excel file, the Mashup Engine is fetching that file 19 times once per output table. Instead, create a single base query that loads the Excel file, then derive your 19 table queries from it by referencing that base query. SharePoint is then hit only once.

Base query → [Excel file from SharePoint]

  • Table_Sheet1 (references Base query, filters Sheet1)
  • Table_Sheet2 (references Base query, filters Sheet2)

→ Docs: Query folding and referencing in Dataflow Gen2 Microsoft Learn

In short: for your Dimensions dataflow, query referencing alone should drastically cut your CU consumption without changing your architecture. For Facts, the pipeline Copy → Lakehouse → Dataflow pattern is the cleanest long-term approach.

1. I'll test this approach

2. That's exactly my set up right now - I'm already using 'Reference query'. What you are saying is that if I did not do it this way, even move CUd were consumed?

Tamanchu
Continued Contributor
Continued Contributor

Hi @mrbartuss,

 

Yes, exactly and the saving is actually proportional to your table count.

Without query referencing (no "Reference" step):
Each of your 19 output tables would independently evaluate its own query all the way to the source. That means 19 separate HTTP requests to SharePoint + 19 full Excel file parses through the Mashup engine. Each one is billed at 12 CU/s for up to 10 minutes.

With query referencing (your current setup) :
The SharePoint read happens once (the base "Source" query), and all 19 output tables branch off that single evaluation. Only the filtering/column selection steps run 19 times which are near-instant in-memory operations.

So in your Dimensions dataflow specifically, query referencing could be saving you up to ~18× the cost of a full source re-read. You're already doing the right thing there.

The remaining optimization lever (point 1) would remove the SharePoint dependency entirely once the file is in OneLake as a Delta table, even that single source read becomes a local Delta scan instead of an HTTP call to SharePoint.

Helpful resources

Announcements
April Fabric Update Carousel

Fabric Monthly Update - April 2026

Check out the April 2026 Fabric update to learn about new features.

Fabric SQL PBI Data Days

Data Days 2026 coming soon!

Sign up to receive a private message when registration opens and key events begin.

New to Fabric survey Carousel

New to Fabric Survey

If you have recently started exploring Fabric, we'd love to hear how it's going. Your feedback can help with product improvements.

Top Solution Authors
Top Kudoed Authors