The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Question on power bi incremental refresh. Have this setup and having an issue when running incremental refresh from the service: non folded TOP 1000 queries equal to the number of partitions chosen for refresh are being generated. After these run, it executes the expected "folded" queries for each partition. Both non folded and folded queries are generated both upon initial refresh and all subsequent refreshes (i.e., after the BI partitions have been created). Also of note, all works as expetcted from the desktop - one "singular" query is folded with a where clause between the RangeStart and RangeEnd parameters. With the SQL source I can also view the native query from PBI. Even stranger, I tried the exact same setup with dataflows (a bit different since dataflow auto builds the parameters etc.) and it worked as expected (i.e., all generated queries are folded including w/ data change detection so if no data has changed the refresh runs very fast even when selecting a large number of partitions for refresh). Please let me know if you have any ideas. I am 99.9% sure this is not how it's supposed to work - would not provide any efficiency gain if non-folded queries are executed for each partition every time the refresh runs! Thanks!
Hi @jtclabaugh ,
not sure you ever found a solution to this, so want to inform you that we had the same problem with an incremental update. It seems like the SELECT TOP 1000 happens if you have a join (Merge Queries) in the dataset you want to load incremental.
/ACH
hi @jtclabaugh
The reason it runs the top 1000 is because it first validates that the data that is going to be returned matches the table definition.
Otherwise it would have to run the entire query only to fail at the end.
This top 1000 query should be quick, and it will only be run once for each partition created.
Hmmm. This is not going to work. #1, it's running more than one preview query per parition when running the incremental (looks like one for each power query step whether these are folded depends on the step). #2, this is only happening with incremental, for loads without incremental turned on, it issues no preview queries. #3, it does not do this with incremental for dataflows. #4, these are not fast, the datasource has 100's of millions of records - folded queries run fast, select * w/ top 1000 do not. #5, I need to go back and check 100+history partitions each time I run the load as historical data can change. Going to try Table.View with OnTake argument, but not sure if this will work running from the service. Can't imagine this is the way incremental was designed to work.
Hey jtclabaugh,
Did you find out more about this issue?
Also, how did your try with Table.view with Ontake argument work out?
We're experiencing the exact same issue, at the time being, and I am curious as to whether you found a solution.
/Chris
Hi @jtclabaugh
If I understand what you are doing, when the dataset gets created it creates the partitions in the background. you will not have to manage the partitions.
Unless you set your refreshing of the data to multiple days/months it will then have to re-run that data as per the refresh policy?
Let me try and better explain the problem: I have incremental refresh setup - source is SQL (in this case SQL against a databricks spark table). When I run from the desktop, I confirm that 1 *folded* query is run with a where clause that selects AR document date/time >= RangeStart and < RangeEnd. This works perfectly - very efficient. However, when I run from the service to build the partitions (and for subsequent incremental runs), it "works", but an insane number of non-folded limit 1000 queries are fired against the source (note that I also get the folded queries). These take a long time to run and it looks like more are being sent because I have a large number of partitions. From everything I have seen, this is NOT what it's supposed to do. I am expecting to see only folded queries + the select max timestamp for the data change detection. Any ideas?
seems like it's issuing data preview queries like you would get in power query. only happens when running processing for partitions from the service (i.e., tables that aren't partitioned work fine with only the expected folded queries). Here's a visual:
From what I can tell, it's running a data preview query for every step in the power query script * the number of partitions. So if # of partitions is large, an insane number of these queries are run - and the source is big data/ADLS Gen 2 with spark on top
I'm also interested in a solution.
Since the timestamps are included in the index of the SQL table the incremental refresh query performing well.
The prior query (data preview) during incremental refresh from PBI service is very slow due to select top 1000 instruction
@jtclabaugh Did you find a solution for this. We are running PBI over big data much like yourself. We have large PBI models in premium capacity with incremental refresh configured. We are also using the "Detect Data Changes" feature in hope of speeding up the daily refresh. The top 1000 queries are really affecting our refresh tuning. Did you find anyway around this issue?
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.
User | Count |
---|---|
51 | |
26 | |
14 | |
14 | |
12 |
User | Count |
---|---|
108 | |
39 | |
24 | |
23 | |
19 |