I have a large dataset, and many of the tables have incremental refresh policies. The data source is Amazon Redshift, and I'm using the Amazon Redshift connector.
I just modified the dataset, and then published a new version to a workspace in my Premium Capacity. Then I refreshed the dataset, which triggered the initial "full refresh." During this full refresh, I monitored the queries that were being executed against my Amazon Redshift data source.
I noticed something very strange: for each table X that has an incremental refresh policy, Power BI repeatedly executes a simple query like "SELECT TOP 1000 ... FROM X" (with no WHERE clause).
For some tables, that identical query was executed a few dozen times. For other tables, it was executed nearly 200 times! Often, several such identical queries are executed simultaneously, many times in a row, within milliseconds of each other and with no other queries in between. In total, during this one "full refresh," a few thousand such queries were executed. This all seems wasteful, and I don't see what purpose it serves.
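For anyone who wants to check their own query log for the same pattern, here is a minimal sketch of one way to tally the repeats. It assumes the query texts have already been exported from the data source's query log into a list of strings. The function names and the sample log are hypothetical and purely illustrative, not actual captured queries.

```python
import re
from collections import Counter


def normalize(sql: str) -> str:
    """Collapse whitespace and lowercase so identical queries compare equal."""
    return re.sub(r"\s+", " ", sql).strip().lower()


def count_top_1000_queries(queries):
    """Count queries that start with SELECT TOP 1000 and have no WHERE clause."""
    counts = Counter()
    for sql in queries:
        q = normalize(sql)
        if q.startswith("select top 1000 ") and " where " not in q:
            counts[q] += 1
    return counts


# Hypothetical sample log; real entries would come from the exported query log.
sample_log = [
    'SELECT TOP 1000 * FROM "sales"',
    'select top 1000 *   from "sales"',
    "SELECT * FROM \"sales\" WHERE order_date >= '2023-01-01'",
    'SELECT TOP 1000 * FROM "customers"',
]

# Print each distinct matching query with its count, most frequent first.
for query, n in count_top_1000_queries(sample_log).most_common():
    print(n, query)
```

Grouping on the normalized text keeps differences in casing or spacing from splitting what is really the same query into separate counts.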
Meanwhile, my "full refresh" takes several hours, so I am wondering whether this "SELECT TOP 1000" behavior is a bug that might be fixed. If it stopped happening, there would be thousands fewer queries to run, which might significantly reduce my "full refresh" time.
Why does this happen? If it is not necessary, can it please be stopped?
Many thanks in advance for your help!