smoqt
Frequent Visitor

Speed of writing to Fabric Data Warehouse from Fabric Pipeline vs ADF Pipeline

I need to move a large amount of data from CSVs stored in Azure Blob Storage to a Microsoft Fabric Data Warehouse.

 

Initially I was planning to use ADF to do this because I may need to use the Tumbling Window Triggers (which to my knowledge are not yet available in Fabric).

 

Doing so has proven to be quite slow.  These files are stored with varying paths within the container, thus requiring the use of wildcard paths to access them.  When attempting to use wildcard paths in ADF, I am required to enable staging on the Copy Data activity.  If I don't, it fails the pipeline validation.


With staging enabled, it takes a significant amount of time to load the data.  For example, loading just 2.6 GB took roughly 2.5 hours.

 

If I were to recreate and run the same pipeline from Microsoft Fabric, would it be faster?  Is there a better way?

10 REPLIES
v-nikhilan-msft
Community Support

Hi @smoqt 
The internal team replied as follows:
Moving 2.6 GB of files should generally be much faster than 2.5 hours. The only difference that affects throughput between Fabric and ADF is that in Fabric you can leverage the native workspace staging storage, while in ADF you have to bind staging to an Azure storage account.

Hope this helps. Please let me know if you have any further questions.

Thank you.  I will continue testing.

Hi @smoqt 

Please do let me know if you have any further questions. 

I wanted to share another example where this time I did not receive throttling errors.  Writing 123 MB took only 4 min to write to blob storage but 2.5 hours to write to Fabric.
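For reference, the figures reported so far can be turned into effective throughput numbers. The following is a minimal Python sketch, not output from any pipeline: the helper name `throughput_mb_per_s` is hypothetical, and 1 GB is assumed to be 1024 MB.

```python
def throughput_mb_per_s(size_mb: float, minutes: float) -> float:
    """Effective throughput in MB/s for a transfer of size_mb taking `minutes`."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return size_mb / (minutes * 60)


# Sizes and durations as reported in this thread (1 GB taken as 1024 MB, an assumption).
runs = {
    "2.6 GB to Warehouse (2.5 h)": (2.6 * 1024, 150),
    "123 MB to Blob Storage (4 min)": (123, 4),
    "123 MB to Warehouse (2.5 h)": (123, 150),
}

rates = {name: throughput_mb_per_s(size, mins) for name, (size, mins) in runs.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.3f} MB/s")

# Same 123 MB payload, two destinations: the ratio of the two rates.
slowdown = rates["123 MB to Blob Storage (4 min)"] / rates["123 MB to Warehouse (2.5 h)"]
print(f"Warehouse write is about {slowdown:.1f}x slower than the Blob Storage write")
```

Because both runs moved the same 123 MB, the ratio reduces to the ratio of the durations (150 min vs 4 min).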


 

 

Hi @smoqt 
Can you please provide the run IDs for the above two pipelines?

Thanks

Example 1:

Pipeline Run ID:  ddbd9f52-da90-43f3-afe1-aa1458b65f59

Activity Run ID: c9f1a5e0-0ff3-4402-b317-ef23111b505c

 

Example 2:

Pipeline Run ID:  31a31bab-42df-4b36-9777-c83ebb758b52

Activity Run ID: dc622eca-8676-4a13-a120-ca1fa962c2e1

 

Thanks for the details @smoqt 
The internal team is looking into the issue. Meanwhile, you can try the following, as advised by the team:

If the source has too many small files, loading into the Warehouse with the copy command will be significantly slower. You can use a separate copy job with 'Copy Behavior' = 'merge files' to merge the small files into a single large file first; performance should then be better.
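To illustrate the idea of merging many small files into one before loading, here is a minimal local Python sketch. It is not what the Copy activity does internally and does not touch Blob Storage or Fabric; the function name `merge_csv_files` and the sample file names are hypothetical, and it assumes every CSV shares the same header row.

```python
import csv
import tempfile
from pathlib import Path


def merge_csv_files(sources: list[Path], target: Path) -> int:
    """Concatenate CSV files that share a header into `target`.

    Returns the number of data rows written (header excluded).
    """
    rows_written = 0
    expected_header = None
    with target.open("w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for src in sources:
            with src.open("r", newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if expected_header is None:
                    expected_header = header
                    writer.writerow(header)  # write the header only once
                elif header != expected_header:
                    raise ValueError(f"header mismatch in {src}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


# Demo with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    parts = []
    for i in range(3):
        part = root / f"part_{i}.csv"
        part.write_text(f"id,value\n{i},a{i}\n", encoding="utf-8")
        parts.append(part)
    merged = root / "merged.csv"
    total_rows = merge_csv_files(parts, merged)
    merged_lines = merged.read_text(encoding="utf-8").splitlines()
    print(total_rows, merged_lines)
```

The header check is a deliberate choice: files with differing columns raise an error rather than being silently concatenated into a malformed file.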

Hope this helps. Please let me know if you have any further questions.

Thanks, @v-nikhilan-msft.  I will test.

Here is an example from today. 

 

Currently at 1 hr 12 min for 152 MB.

 

I see the throttling errors and a significant difference between the performance when writing to Azure Blob Storage vs when writing to Fabric Warehouse.

 

I am using the default settings ("Auto") for Maximum DIU and Degree of Copy Parallelism.

 

I will research how to mitigate the throttling errors. However, I'm curious whether the throttling is happening strictly on the Azure side or on the Fabric side.



v-nikhilan-msft
Community Support

Hi @smoqt 
Thanks for using Fabric Community.
At this time, we are reaching out to the internal team to get some help on this. We will update you once we hear back from them.
Thanks 
