I am looking for any suggestions to shorten the import time of one large table, given that in the near term, I'll be using a single node gateway and importing a binary file is not an option.
The table has about 53 million rows and 97 columns of various data types (datetime, decimal, varchar, boolean, smallint, tinyint, int). The destination is a lakehouse. Currently, I am using 5 degrees of copy parallelism, the Balanced intelligent throughput optimization setting, and the dynamic range partition option (the primary key is the partition column, with specified lower and upper bounds).
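For context on what the dynamic range partition option is doing, here is a rough sketch of the idea: the key range between the specified lower and upper bounds is split into roughly equal chunks, one per parallel copy. The actual split logic is internal to the Fabric copy activity, so this is only an illustration, not the real implementation:

```python
# Hypothetical sketch of dynamic range partitioning: split a
# primary-key range [lower, upper] into N roughly equal chunks so
# N parallel copies can each read one slice. The real split logic
# is internal to the copy activity; this only illustrates the idea.

def partition_ranges(lower, upper, parallelism):
    """Return (start, end) key ranges covering [lower, upper]."""
    span = upper - lower + 1
    base, extra = divmod(span, parallelism)
    ranges, start = [], lower
    for i in range(parallelism):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges

# Example: 53 million keys split across 5 degrees of parallelism.
for lo, hi in partition_ranges(1, 53_000_000, 5):
    print(f"WHERE pk BETWEEN {lo} AND {hi}")
```

With this mental model, gaps or heavy skew in the primary key values can leave some parallel copies doing most of the work, which is one reason a run can be slower than the degree of parallelism suggests.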
Are there settings, properties, etc. that I should change? Suggestions? Thanks in advance.
Hi @ebjim
If you are importing this into a semantic model, you could look at incremental refresh, creating partitions, reviewing data types, and including only the columns you need (those used for relationships, analysis, or measures). Also verify that auto date/time is turned off if you are not using it, as it can create a very large hidden date table for each datetime column you import.
Is this a historical load, an incremental load, or a truncate-and-load every time? If it is incremental, note that the Lakehouse Files sink for the Copy activity does not support a merge operation today. A Copy activity followed by a notebook or a Dataflow Gen2 is an option.
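Since the sink cannot merge, the follow-up notebook (or Dataflow Gen2) has to do the upsert itself. In a real Fabric notebook this would typically be a Delta Lake MERGE; the sketch below uses plain Python dicts purely to show the merge-by-key semantics (update rows whose key already exists, insert the rest):

```python
# Minimal upsert sketch: merge a staged increment into an existing
# table, matching on the primary key. Plain dicts stand in for the
# Delta table and the staged data; in a notebook this would be a
# Delta MERGE statement instead.

def upsert(target, increment, key="pk"):
    """Merge increment rows into target, matching on `key`."""
    by_key = {row[key]: row for row in target}
    for row in increment:
        by_key[row[key]] = row  # update if present, insert if new
    return sorted(by_key.values(), key=lambda r: r[key])

existing = [{"pk": 1, "val": "a"}, {"pk": 2, "val": "b"}]
staged   = [{"pk": 2, "val": "B"}, {"pk": 3, "val": "c"}]
print(upsert(existing, staged))
# [{'pk': 1, 'val': 'a'}, {'pk': 2, 'val': 'B'}, {'pk': 3, 'val': 'c'}]
```

The key design point is that the increment wins on conflict, so re-running the same increment is idempotent.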
Hope this helps. Please let me know if you have any further queries.
It's impossible to suggest anything without more information. For example: what are the gateway VM specs? What are the CPU, disk, and network utilization percentages? What is the table DDL? Is there a bottleneck on SQL Server?
We're using the latest Power BI Gateway version, but I don't have the resource stats available. There are no bottlenecks on the SQL Server as far as I know. I am more concerned with getting the parallelism and throughput optimization settings correct. As for the DDL, I can send it to you separately if you are open to that.
Hi @ebjim
Thanks for using Fabric Community.
At this time, we are reaching out to the internal team to get some help on this. We will update you once we hear back from them.
Thanks