March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
Hi all,
recently I've reached the limits of what our gateway likes in terms of dataload each morning (internal service errors). Though mainly to do with parallel loading of tables and a max concurrency threshold, I've also started exploring the options of incremental refreshes. I have one question in particular though, that I cannot really answer, as I'm working with PRO licences and I have no XMLA access (no options to explore partitions whatsoever really when it comes to incremental refreshes). It's a little like playing in the dark. You know you've done something, but what exactly did you do?
So my question is, when I have multiply tables that I load with different incremental schedules, what happens to my multiple subqueries in terms of queryfolding? To which schedule does which subquery adhere?
In reality I'm using an ETL process, (extracting, transforming and loading) to satisfy the needs of the privacy firewall, but to keep things simple, say I have:
1. a subquery that extracts stock data from a local SQL database, where I'm using the RangeStart and RangeEnd parameters to filter the data (I can see query folding happening in the native query - however, it just shows me the parameter placeholder values in PBI Desktop). I'm not loading this query to my report.
2. another subquery that extracts sales history from another table in the same database, also leveraging the RangeStart and RangeEnd parameters, also using query folding. I'm not loading this query to report either.
3. a table that I am loading to report with stock history (again using the RangeStart and RangeEnd filters), with an incremental schedule like: an incremental update period of 7 days (today included, no additional options), and a historical retention period of 2 years (and no realtime DQ, as I'm working with PRO).
4. another table that I'm loading to report with sales history (also using the RangeStart and RangeEnd filters), however (and this is the important part), with a different incremental schedule, not with a historical retention period of 2 years, but of 7 years instead. And of course, an incremental period of 7 days and no DQ realtime loading.
So, what I want to know is, what happens in the subqueries, which both utilise the (same?) parameters RangeStart and RangeEnd? Which data would get extracted from the SQL database? 7 years of data? 2 years of data? Or 7 years for sales and 2 years for stock?
Thank you in advance!
Solved! Go to Solution.
Hi @C4YNelis ,
Since each table's incremental refresh policy is independent, the subqueries will extract data according to the specific `RangeStart` and `RangeEnd` parameters set for each table. Therefore:
- The stock data subquery will extract data for the last 2 years.
- The sales history subquery will extract data for the last 7 years.
In summary, the data extracted from your SQL database will be 2 years for stock data and 7 years for sales history, based on the respective incremental refresh policies.
Best Regards,
Neeko Tang
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi @C4YNelis ,
Since each table's incremental refresh policy is independent, the subqueries will extract data according to the specific `RangeStart` and `RangeEnd` parameters set for each table. Therefore:
- The stock data subquery will extract data for the last 2 years.
- The sales history subquery will extract data for the last 7 years.
In summary, the data extracted from your SQL database will be 2 years for stock data and 7 years for sales history, based on the respective incremental refresh policies.
Best Regards,
Neeko Tang
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi @v-tangjie-msft ,
thank you for your quick answer. That is very nice indeed.
If you don't mind, what I'm still wondering about is: how does Power BI decide which subquery should utilise which incremental schema, when things are less straight forward. Does it use data lineage to track which data it would need to satisfy the needs of the reportqueries?
For instance, suppose:
- subquery 1 (for stockhistory) does nothing other than extract the data, which it is supposed to do for only 2 years of data (as little as possible), and
- the sales history subquery (2) will only extract data for sales, just 7 years worth of data, yet
- somewhere during the ETL process part of the stockdata subquery would be merged into a sales transformation subquery and
- the stock history has it's own straightforward transformation subquery. (see picture below).
Would Power BI now start extracting seven years worth of stock history data, instead of just two, or would this result possibly in blank rows on the sales history tabled loaded to report?
Thank you again for your time!
Cheers,
Niels
Hi @C4YNelis ,
The data is pre-processed in Power query and then loaded into PBI Desktop according to your choice, which is an irreversible process. You can then right-click incremental refresh to set incremental refresh on the table, and the retention and refresh of the data depends on your settings. You can see it in the diagram.
Regardless of whether table c merges tables A and table B, if table c is loaded into PBI Desktop, then it exists as a separate table. His refresh depends on your settings. For example, if you select Table C in the image, the incremental refresh policy of Table C depends on your input.
Best Regards,
Neeko Tang
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi @v-tangjie-msft ,
thank you for your clarification. My question is not so much about the report tables (the ones loaded to report), that much is clear to me.
I was specifically wondering how Power Query treats the subqueries. Since they all share the same parameters (at least, in name), yet there are multiple different refresh schedules, how does Power Query choose which schedule to apply to which subquery?
Like in your example, merging A and B into C, but A is also a source for table D (loaded to report). Suppose C refreshes incrementally retaining 7 years of data, with an incremental partition of one week, yet D only retains 1 year of data, but with an incremental partition of one month.
I'm assuming that in case of such a merge, Power BI will refresh as much data as needed. So subquery A will now also (initially) extract 7 years of data (basically, conform the maximum of any refresh schedule of their respective tables it would contribute data to) and one month incrementally?
Thank you again!
Cheers,
Niels
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.
User | Count |
---|---|
120 | |
77 | |
59 | |
53 | |
40 |
User | Count |
---|---|
193 | |
106 | |
88 | |
62 | |
51 |