Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
alondon
Frequent Visitor

Pipeline - Slow copy task

Hello - I'm running a copy task to pull in the last year of data from one table that exists in a DB2 on premise database.  The connection is on ODBC.   It's been processing now for about 6 hours without completing.  Is this normal?  Are there things I can do to enhance this performace?  There are a lot of records, but this data in SQL is much faster.   We have just started using pipelines to get data into our Lakehouse and are following the Medallion model.  Are there things I can do to enhance this performace?  Thanks!

5 REPLIES 5
alondon
Frequent Visitor

Any ideas from anyone?  The settings I noted didn't work.  This is new to me, so any advice is helpful.  Thanks!

alondon
Frequent Visitor

Hello @nilendraFabric 

 

Thanks for providing that information and the link to the document.  It was very helpful.  I tried what I thought I needed yesterday, but it took even longer to run and I had to finally cancel the process.  Today, I'm trying again with these settings in the copy task of my pipeline.  On the Destination tab, I have checked to Enable partitions and added a Partition column (Serial) which is the key field in this table.  On the Settings tab, I checked Use custom value for Intelligent throughput optimization and entered 12 for the Degree of copy parallelism.  Is this correct?

Using “Serial” (unique key) as partition column creates excessive small partitions. This leads to:
• High partition management overhead
• Wasted I/O from opening/closing numerous files
• Poor parallelization effectiveness

-- Use temporal partitioning instead of serial key
SELECT * FROM source_table
WHERE business_date BETWEEN '2024-03-01' AND '2025-03-04'

 

You are correct and that process didn't work.  I'm been using the Select * with a date from the beginning to get the last year of data.  Everyday I try something different in hopes something cuts down the process time.  Today I'm trying the ITO setting at Maximum and Degree of copy parallelism at Auto.  So far it's been processing for running for 6h 2m.   I wish I knew to majic settings!

nilendraFabric
Super User
Super User

Hello @alondon 

 

Use dynamic range partitioning with degree of parallelism adjusted to your DB2 capacity (start with 8-16 parallel copies).

 

https://learn.microsoft.com/en-us/fabric/data-factory/copy-performance-sql-databases

 

 

 

 

Helpful resources

Announcements
Fabric July 2025 Monthly Update Carousel

Fabric Monthly Update - July 2025

Check out the July 2025 Fabric update to learn about new features.

August 2025 community update carousel

Fabric Community Update - August 2025

Find out what's new and trending in the Fabric community.