Nathan_Mosher
Frequent Visitor

Dataflow Gen2 using 5x to 9x more CU than Gen1

We've been working to convert many of our Dataflow Gen1s into Dataflow Gen2s, and we hit an issue. I'm wondering if anyone else is experiencing this.

 

Dataflow Gen2s, and even Data Pipelines (Data Factory), are using 5x to 9x more CUs than the same Dataflow Gen1. I tested standalone Gen2s, Gen2s that load to a lakehouse or warehouse, and pipelines that copy data to a lakehouse or warehouse, with and without staging. On the low end, I'm seeing over a 5x multiplier on CU usage; with staging on, it is up to 9x as much.

 

Right now, our dataflows consume about 10% of our Premium capacity. If we were to convert them, they would consume roughly 60% of our capacity just on the initial data ingestion, so I'm wondering how others are accomplishing this.
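For context, a minimal back-of-the-envelope sketch of that projection (the 5x-9x multipliers are the observed range from my tests; the numbers are purely illustrative):

# Back-of-the-envelope projection: current dataflow share of the capacity
# multiplied by an observed Gen2 CU multiplier. Numbers are illustrative only.
current_share = 0.10            # dataflows use ~10% of the Premium capacity today
for multiplier in (5, 6, 9):    # observed CU multipliers
    print(f"{multiplier}x -> {current_share * multiplier:.0%} of capacity")
# 5x -> 50%, 6x -> 60%, 9x -> 90%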

 

In my experiment, I am just pulling from an OData source without any transformation whatsoever; it is pure ingestion. For the pipelines, I am only using the Copy data activity and landing the data straight in the warehouse/lakehouse.

 

Here are the results of my testing:

Nathan_Mosher_1-1740855222827.png

 

 

 

10 REPLIES
nilendraFabric
Community Champion

Hello @Nathan_Mosher 

Your observation is absolutely correct and has been reported multiple times. 

This is a very good report and article about this behaviour:

 

https://www.fourmoo.com/2025/01/09/copying-sql-server-on-premises-data-in-microsoft-fabric-which-one...

 

 

Dataflow Gen2 typically refreshes 40% faster than Gen1 but consumes 55-700% more CU depending on configuration:
• Basic OData ingestion: Gen1 averages 78 CU vs. Gen2’s 550 CU (7x higher)
• 15GB text processing: Gen2 consumes 55% more CU than Gen1 despite faster refreshes
• Pipeline Copy Activity: 360 CU (4.6x cheaper than Gen2 for SQL Server ingestion)

Unlike Gen1’s single-engine design, Gen2 combines three billing components:

 

Total CU = (Mashup_duration_hr × 16) + (SQL_duration_hr × 6) + (FastCopy_duration_hr × 1.5)
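A minimal sketch of that breakdown (the 16 / 6 / 1.5 hourly rates and the three meters are taken from the formula above; the exact meter names and units should be verified against the Fabric capacity metrics app):

# Sketch of the Gen2 billing breakdown quoted above.
# Rate constants (16, 6, 1.5 CU per hour) come from the formula; verify the
# actual meters and units in the Fabric capacity metrics app.
def gen2_total_cu(mashup_hr: float, sql_hr: float = 0.0, fast_copy_hr: float = 0.0) -> float:
    return (mashup_hr * 16) + (sql_hr * 6) + (fast_copy_hr * 1.5)

# Example: mashup engine runs 30 min, staging (SQL) runs 15 min, no Fast Copy.
print(gen2_total_cu(mashup_hr=0.5, sql_hr=0.25))  # 8.0 + 1.5 = 9.5 CU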

Enabling staging (default in Gen2) forces data through both engines:
• Test case: Grouping 1M rows took 24s without staging vs. 40s with staging

 

Recommendation:


• Use Gen1 for legacy workflows & small datasets (<1M rows)
• Reserve Gen2 for AI-enhanced transformations or Lakehouse integration

 

If this is helpful, please accept the answer.

 

So this appears to be the solution, as well as the reason why it took fewer CUs to run the Dataflow Gen1 chained with the Dataflow Gen2.

The Dataflow Gen1 is much cheaper to operate, while the Dataflow Gen2 is billed by how long it takes to run. So essentially, when I chained it with the Gen1, the data became local in the workspace and the Gen2 could run much more quickly. Since the Gen2 cost is a function of its duration, that drove the overall cost down.

This is something to consider when retrieving from a slow source.
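As a hypothetical illustration of that effect (the durations below are invented purely to show the shape of the calculation, reusing the duration-based rate from the reply above):

# Hypothetical illustration: Gen2 cost scales with mashup-engine duration, so
# staging the data locally with a cheap Gen1 first shortens the Gen2 refresh
# and lowers its share of the bill. Durations are made up.
MASHUP_RATE_CU_PER_HR = 16      # rate taken from the formula quoted above

slow_source_hr = 10 / 60        # e.g. 10 minutes pulling directly from a slow OData source
local_gen1_hr = 2 / 60          # e.g. 2 minutes reading the same data already landed by Gen1

print(slow_source_hr * MASHUP_RATE_CU_PER_HR)  # ~2.67 CU, Gen2 pulling straight from the slow source
print(local_gen1_hr * MASHUP_RATE_CU_PER_HR)   # ~0.53 CU, Gen2 chained after Gen1 (plus the cheap Gen1 refresh)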

In my opinion, the price seems a bit steep, but this checks out. However, the pipeline calculation does not; I've raised a support ticket with Microsoft, as it doesn't appear to align with the formulas posted in their documentation.

@nilendraFabric ,

 

Thank you for this; it is very useful information. Unfortunately, in all my tests, the Gen1 ran faster than the Gen2s and the pipelines. Regarding the SQL endpoint, I also tested ingesting from a SQL endpoint: I created a Datamart with the same tables ingested there first, then compared the following:

 

Dataflow Gen1: 1,300 CU (duration: 20 seconds)

Dataflow Gen2 to Lakehouse: 12,000 CU (duration: 56 seconds)

Dataflow Gen2 Standalone: 8,500 CU (duration: 45 seconds)

 

With what you mentioned about pipelines, I can go back and test the pipeline to see if that brings it down to more comparable levels. I'll update this reply when I have the data.

 

EDIT:

Pipeline (staging off): 10,080 CU (duration 107 seconds)

Hi @Nathan_Mosher,

Could you please confirm whether the issue has been resolved? If a solution has been found, it would be greatly appreciated if you could share your insights with the community, as this would be helpful for other members who may encounter similar issues.

Thank you for your understanding and assistance.

Hi @Nathan_Mosher,

 

May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution, so that other community members with similar problems can solve them faster.

 

Thank you.

I think Gilbert's point is valid in that a DF Gen1 by itself is useless. It still needs to be ingested into a semantic model - and that consumes additional CUs.

 

So all these tests should include the load of the data into a semantic model in order to be truly comparable.

@lbendlin ,

Good point. If you look at the attached screenshot, I also compared pulling the data into a dataset (semantic model) and into another dataflow. I was looking at both the downstream impact and the CU consumption of accessing the data.

 

There will be CU savings in the semantic model, but that is minimal compared to the orders-of-magnitude increases on the ingestion side.

 

Something of note is that the Dataflow Gen1 does not cost any CUs when the data is accessed, while the lakehouse does consume CUs any time the data is downloaded.

Something of note is that the Dataflow Gen1 does not cost any CUs when the data is accessed,

I hope you are not referring to the abomination that is DirectQuery against Dataflows. For regular scenarios (a semantic model refreshing from a dataflow in import mode), CU cost will be incurred.

I can say with 100% certainty (and I just validated it now) that when a dataset or any downstream user utilizes the Dataflow Gen1 data (not DirectQuery), it does not consume CUs against the Dataflow Gen1.

Nathan_Mosher_0-1741016448981.png

 

Agreed. But for simple use cases and smaller data volumes, Gen1 will 100% consume less CU in any scenario compared to Gen2.
