Solved: Dataflow Gen2 and Data Pipeline Consumes more CUs ...

Nathan_Mosher · ‎02-28-2025

I've read in many places that data ingestion with the new Fabric items (such as Dataflow Gen2 and Pipelines) is more efficient than the traditional Dataflow Gen1 method. I was wondering if anyone has seen an improvement after switching to Gen2. However, I'm starting to suspect that Microsoft is penalizing CU usage for new Fabric items, as I’ve noticed our CU consumption increasing as we migrate to Fabric.

To explore this further, I set up a small experiment. In my testing, I created seven independent test cases, each ingesting the same data from the same endpoint using different methods. I then compared them against the legacy Dataflow Gen1 as a baseline. While I’m still conducting further testing, my preliminary results indicate that CU consumption is 5 to 9 times higher, depending on the method used.

For clarity, I'm using Dataflow Gen2 CICD items, while Pipelines perform a single copy action per table. The OData endpoint downloads 18 flat tables of varying dimensions. Warehouse/Lakehouse staging CU usage is excluded, as I couldn't differentiate between them, but it appears that each test utilizing staging consumed approximately 700 CUs. No transformations are being performed—this is strictly raw data ingestion. Additionally, I have not yet tested FastCopy, as the documentation states it is only beneficial for big data.

Now, here's an interesting finding: I also tested chaining Dataflow Gen1 to Dataflow Gen2 before writing to storage, and believe it or not, this approach consumes fewer CUs than using Dataflow Gen2 directly. The refresh time is also comparable, with Gen1 averaging slightly faster execution. The results show that the most efficient Dataflow Gen2 configuration averages 28.7k CUs, while the most efficient Pipeline configuration averages 24.6k CUs. However, a Gen1-to-Gen2 chained approach averages 18k CUs (5k from Gen1 + 13k from Gen2), a saving of roughly 37% against the best Gen2 configuration and roughly 27% against the best Pipeline configuration.
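For anyone who wants to check the relative savings against their own numbers, here is a minimal Python sketch. It uses only the averages quoted above. The function and variable names are hypothetical, chosen for illustration, and are not part of any Fabric tooling.

```python
def savings_pct(baseline_cu: float, alternative_cu: float) -> float:
    """Percentage reduction of alternative_cu relative to baseline_cu."""
    return (baseline_cu - alternative_cu) / baseline_cu * 100


# Averages quoted in the post (in CUs); names are illustrative only.
best_gen2 = 28_700          # most efficient Dataflow Gen2 configuration
best_pipeline = 24_600      # most efficient Pipeline configuration
chained = 5_000 + 13_000    # Gen1 stage + Gen2 stage of the chained approach

print(f"Chained vs best Gen2:     {savings_pct(best_gen2, chained):.1f}% fewer CUs")
print(f"Chained vs best Pipeline: {savings_pct(best_pipeline, chained):.1f}% fewer CUs")
```

Swapping in your own averages from the Capacity Metrics app gives a like-for-like comparison, provided each test ingests the same tables.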

Is anyone else experiencing similar results? I would appreciate hearing about other experiences converting Dataflow Gen1 items to Dataflow Gen2 or Data Pipelines for ingestion.

Results:

Test Case Items:

ItemName	ItemType	TestCase	Destination	Step
A2025_01_A_DFG1	Dataflow Gen1	01		A
A2025_01_C_DFG1	Dataflow Gen1	01		C
A2025_01_D_DFG1	Dataset	01		D
A2025_02_A_DFG2	Dataflow Gen2 CICD	02		A
A2025_02_C_DFG1	Dataflow Gen1	02		C
A2025_02_D_DS	Dataset	02		D
A2025_03_A_DFG2	Dataflow Gen2 CICD	03	A2025_03_B_LH	A
A2025_03_B_LH	Lakehouse	03		B
A2025_03_C_DFG1	Dataflow Gen1	03		C
A2025_03_D_DS	Dataset	03		D
A2025_04_A_DFG2	Dataflow Gen2 CICD	04	A2025_04_B_WH	A
A2025_04_B_WH	Warehouse	04		B
A2025_04_C_DFG1	Dataflow Gen1	04		C
A2025_04_D_DS	Dataset	04		D
A2025_05_A_PL	Data Pipeline	05	A2025_05_B_LH	A
A2025_05_B_LH	Lakehouse	05		B
A2025_05_C_DFG1	Dataflow Gen1	05		C
A2025_05_D_DS	Dataset	05		D
A2025_06_A_PL	Data Pipeline	06	A2025_06_B_WH	A
A2025_06_B_WH	Warehouse	06		B
A2025_06_C_DFG1	Dataflow Gen1	06		C
A2025_06_D_DS	Dataset	06		D
A2025_07_A_PL	Data Pipeline	07	A2025_07_B_LH	A
A2025_07_B_LH	Lakehouse	07		B
A2025_07_C_DFG1	Dataflow Gen1	07		C
A2025_07_D_DS	Dataset	07		D

Anonymous · ‎03-05-2025

Hi @Nathan_Mosher,

Thank you for reaching out to Microsoft Fabric Community Forum.

It's true that Dataflow Gen2 and Data Pipelines generally consume more CUs than Dataflow Gen1.
Dataflow Gen1 consumes fewer CUs due to its sequential execution, lazy evaluation, and minimal staging, processing only necessary data on demand. In contrast, Gen2 utilizes parallel processing, batch execution, and persistent staging, leading to higher CU consumption. Additionally, metadata tracking and lineage features in Gen2 add further overhead, making it more resource-intensive but scalable.

If this post helps, please consider accepting it as the solution so other members can find it more quickly. Don't forget to give a "Kudos"; I'd truly appreciate it!

Regards,
Vinay Pabbu

andrewsommer · ‎03-11-2025

Brunner BI did a good breakdown about the differences between Gen 1 and Gen 2 dataflows. Basically, you are trading speed for compute.

en.brunner.bi/post/comparing-cost-of-dataflows-gen1-vs-gen2-in-power-bi-and-fabric-1

Please mark this post as solution if it helps you. Appreciate Kudos.

Anonymous · ‎03-11-2025

Hi @Nathan_Mosher,

As we haven't heard back from you, we wanted to follow up and check whether the solution provided resolved the issue. Let us know if you need any further assistance.
If our response addressed your question, please mark it as the accepted solution and click Yes if you found it helpful.

Regards,
Vinay Pabbu

Anonymous · ‎03-17-2025

Hi @Nathan_Mosher,

May I ask if you have gotten this issue resolved?

If it is solved, please mark the helpful reply, or share your own solution and accept it as the solution. This will help other community members with similar problems solve them faster.

Regards,
Vinay Pabbu

Anonymous · ‎03-20-2025

Hi @Nathan_Mosher,

As we haven't heard back from you, we wanted to follow up and check whether the solution provided resolved the issue. Let us know if you need any further assistance.
If our response addressed your question, please mark it as the accepted solution and click Yes if you found it helpful.

Regards,
Vinay Pabbu
