Re: Pipeline - Slow copy task

alondon · ‎02-26-2025

Hello - I'm running a copy task to pull in the last year of data from one table that exists in a DB2 on premise database. The connection is on ODBC. It's been processing now for about 6 hours without completing. Is this normal? Are there things I can do to enhance this performace? There are a lot of records, but this data in SQL is much faster. We have just started using pipelines to get data into our Lakehouse and are following the Medallion model. Are there things I can do to enhance this performace? Thanks!

alondon · ‎03-04-2025

Any ideas from anyone? The settings I noted didn't work. This is new to me, so any advice is helpful. Thanks!

alondon · ‎02-28-2025

Hello @nilendraFabric

Thanks for providing that information and the link to the document. It was very helpful. I tried what I thought I needed yesterday, but it took even longer to run and I had to finally cancel the process. Today, I'm trying again with these settings in the copy task of my pipeline. On the Destination tab, I have checked to Enable partitions and added a Partition column (Serial) which is the key field in this table. On the Settings tab, I checked Use custom value for Intelligent throughput optimization and entered 12 for the Degree of copy parallelism. Is this correct?

nilendraFabric · ‎03-04-2025

Using “Serial” (unique key) as partition column creates excessive small partitions. This leads to:
• High partition management overhead
• Wasted I/O from opening/closing numerous files
• Poor parallelization effectiveness

-- Use temporal partitioning instead of serial key
SELECT * FROM source_table
WHERE business_date BETWEEN '2024-03-01' AND '2025-03-04'

alondon · ‎03-04-2025

You are correct and that process didn't work. I'm been using the Select * with a date from the beginning to get the last year of data. Everyday I try something different in hopes something cuts down the process time. Today I'm trying the ITO setting at Maximum and Degree of copy parallelism at Auto. So far it's been processing for running for 6h 2m. I wish I knew to majic settings!