
DennesTorres
Impactful Individual

Dataflow performance and more

Hi,

I ran some performance comparisons of dataflow executions, and I'm not sure I completely understand the results.

 

The data to be loaded was this: https://azuresynapsestorage.blob.core.windows.net/sampledata/WideWorldImportersDW/tables/fact_sale.p...

 

First execution

 

Loading to a data warehouse

Adding 3 calculated fields

 

Execution time: 58m 49s

 

Second Execution

 

Loading to a data warehouse

No calculated fields

 

Execution time: 39m 59s

 

Conclusion: Simple calculations over a large amount of data can add about 19 minutes to the run.

 

Third Execution

 

Staging disabled

Loading to a lakehouse

Execution time: 13m 12s

 

Conclusion: I understand why it's faster without staging. What I don't understand is why the data warehouse doesn't work as a destination without staging. How do I explain to someone that if they choose a data warehouse, Dataflows Gen 2 will be 26 minutes slower, because the operations can't be done in memory when the target is a data warehouse?

Is any improvement planned for this?

Fourth Execution

 

Data Pipeline to a lakehouse

 

Execution time: 16m 58s

 

Conclusion: I don't know how to explain the difference between a pipeline and a Dataflow Gen 2 without staging. Are there any configurations I should be checking?

 

Fifth Execution

 

Data pipeline to a data warehouse

 

Execution time: 4m 17s

 

Conclusion: Is the data warehouse really that much faster than the lakehouse for data ingestion? Why can't this power be used in Dataflows Gen 2?

 

Sixth Execution

COPY INTO in the data warehouse

Execution time: 1m 32s

 

Conclusion: I'm not sure where to start with this last one. So Polaris has all this power, but we can't use any of it for data ingestion (dataflows/pipelines)?
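For reference, the sixth execution was roughly the following statement, run directly in the warehouse. This is a sketch: the table name and storage path are placeholders, not the exact ones used (the original source URL is truncated above), and the WITH options depend on your file and credentials:

```sql
-- Sketch of the COPY INTO run; table name and path are placeholders.
COPY INTO dbo.fact_sale
FROM 'https://<storage-account>.blob.core.windows.net/<container>/<path-to-parquet-file>'
WITH (
    FILE_TYPE = 'PARQUET'  -- the sample data is a parquet file
);
```

The point of the comparison stands either way: this path hands the load straight to the warehouse engine, with no dataflow or pipeline layer in between.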


All these differences make it difficult to choose when to use each of these solutions. We may end up choosing based on the technical limitations around data transformations and having to accept the performance loss when moving from one solution to another.

Am I missing something? Are there additional guidelines in relation to this?

Kind Regards,

 

Dennes

1 REPLY
Anonymous
Not applicable

Hello @DennesTorres,

Thanks for using Fabric Community.
At this time, we are reaching out to the internal team to get some help on this.
We will update you once we hear back from them.
