The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.
Hello - I'm running a copy task to pull in the last year of data from one table that exists in a DB2 on premise database. The connection is on ODBC. It's been processing now for about 6 hours without completing. Is this normal? Are there things I can do to enhance this performace? There are a lot of records, but this data in SQL is much faster. We have just started using pipelines to get data into our Lakehouse and are following the Medallion model. Are there things I can do to enhance this performace? Thanks!
Any ideas from anyone? The settings I noted didn't work. This is new to me, so any advice is helpful. Thanks!
Hello @nilendraFabric
Thanks for providing that information and the link to the document. It was very helpful. I tried what I thought I needed yesterday, but it took even longer to run and I had to finally cancel the process. Today, I'm trying again with these settings in the copy task of my pipeline. On the Destination tab, I have checked to Enable partitions and added a Partition column (Serial) which is the key field in this table. On the Settings tab, I checked Use custom value for Intelligent throughput optimization and entered 12 for the Degree of copy parallelism. Is this correct?
Using “Serial” (unique key) as partition column creates excessive small partitions. This leads to:
• High partition management overhead
• Wasted I/O from opening/closing numerous files
• Poor parallelization effectiveness
-- Use temporal partitioning instead of serial key
SELECT * FROM source_table
WHERE business_date BETWEEN '2024-03-01' AND '2025-03-04'
You are correct and that process didn't work. I'm been using the Select * with a date from the beginning to get the last year of data. Everyday I try something different in hopes something cuts down the process time. Today I'm trying the ITO setting at Maximum and Degree of copy parallelism at Auto. So far it's been processing for running for 6h 2m. I wish I knew to majic settings!
Hello @alondon
Use dynamic range partitioning with degree of parallelism adjusted to your DB2 capacity (start with 8-16 parallel copies).
https://learn.microsoft.com/en-us/fabric/data-factory/copy-performance-sql-databases
User | Count |
---|---|
3 | |
2 | |
2 | |
1 | |
1 |
User | Count |
---|---|
5 | |
4 | |
3 | |
2 | |
2 |