Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
C4YNelis
Advocate III
Advocate III

How does Incremental refresh behave (in PRO) with multiple schedules in relation to subqueries

Hi all,

 

recently I've reached the limits of what our gateway likes in terms of dataload each morning (internal service errors). Though mainly to do with parallel loading of tables and a max concurrency threshold, I've also started exploring the options of incremental refreshes. I have one question in particular though, that I cannot really answer, as I'm working with PRO licences and I have no XMLA access (no options to explore partitions whatsoever really when it comes to incremental refreshes). It's a little like playing in the dark. You know you've done something, but what exactly did you do?

 

So my question is, when I have multiply tables that I load with different incremental schedules, what happens to my multiple subqueries in terms of queryfolding? To which schedule does which subquery adhere?

 

In reality I'm using an ETL process, (extracting, transforming and loading) to satisfy the needs of the privacy firewall, but to keep things simple, say I have:

1. a subquery that extracts stock data from a local SQL database, where I'm using the RangeStart and RangeEnd parameters to filter the data (I can see query folding happening in the native query - however, it just shows me the parameter placeholder values in PBI Desktop). I'm not loading this query to my report.

2. another subquery that extracts sales history from another table in the same database, also leveraging the RangeStart and RangeEnd parameters, also using query folding. I'm not loading this query to report either.

3. a table that I am loading to report with stock history (again using the RangeStart and RangeEnd filters), with an incremental schedule like: an incremental update period of 7 days (today included, no additional options), and a historical retention period of 2 years (and no realtime DQ, as I'm working with PRO).

4. another table that I'm loading to report with sales history (also using the RangeStart and RangeEnd filters), however (and this is the important part), with a different incremental schedule, not with a historical retention period of 2 years, but of 7 years instead. And of course, an incremental period of 7 days and no DQ realtime loading.

 

So, what I want to know is, what happens in the subqueries, which both utilise the (same?) parameters RangeStart and RangeEnd? Which data would get extracted from the SQL database? 7 years of data? 2 years of data? Or 7 years for sales and 2 years for stock?

 

Thank you in advance!

1 ACCEPTED SOLUTION
v-tangjie-msft
Community Support
Community Support

Hi @C4YNelis ,

 

Since each table's incremental refresh policy is independent, the subqueries will extract data according to the specific `RangeStart` and `RangeEnd` parameters set for each table. Therefore:
- The stock data subquery will extract data for the last 2 years.
- The sales history subquery will extract data for the last 7 years.

In summary, the data extracted from your SQL database will be 2 years for stock data and 7 years for sales history, based on the respective incremental refresh policies.

 

Best Regards,

Neeko Tang

If this post  helps, then please consider Accept it as the solution  to help the other members find it more quickly. 

View solution in original post

4 REPLIES 4
v-tangjie-msft
Community Support
Community Support

Hi @C4YNelis ,

 

Since each table's incremental refresh policy is independent, the subqueries will extract data according to the specific `RangeStart` and `RangeEnd` parameters set for each table. Therefore:
- The stock data subquery will extract data for the last 2 years.
- The sales history subquery will extract data for the last 7 years.

In summary, the data extracted from your SQL database will be 2 years for stock data and 7 years for sales history, based on the respective incremental refresh policies.

 

Best Regards,

Neeko Tang

If this post  helps, then please consider Accept it as the solution  to help the other members find it more quickly. 

Hi @v-tangjie-msft ,

 

thank you for your quick answer. That is very nice indeed.

 

If you don't mind, what I'm still wondering about is: how does Power BI decide which subquery should utilise which incremental schema, when things are less straight forward. Does it use data lineage to track which data it would need to satisfy the needs of the reportqueries?

 

For instance, suppose:

- subquery 1 (for stockhistory) does nothing other than extract the data, which it is supposed to do for only 2 years of data (as little as possible), and

- the sales history subquery (2) will only extract data for sales, just 7 years worth of data, yet

- somewhere during the ETL process part of the stockdata subquery would be merged into a sales transformation subquery and

- the stock history has it's own straightforward transformation subquery. (see picture below).

 

Would Power BI now start extracting seven years worth of stock history data, instead of just two, or would this result possibly in blank rows on the sales history tabled loaded to report?

 

How many years of data would the stock history subquery load?How many years of data would the stock history subquery load?

 

Thank you again for your time!

 

Cheers,

Niels

Hi @C4YNelis ,

 

The data is pre-processed in Power query and then loaded into PBI Desktop according to your choice, which is an irreversible process. You can then right-click incremental refresh to set incremental refresh on the table, and the retention and refresh of the data depends on your settings. You can see it in the diagram.

 

Regardless of whether table c merges tables A and table B, if table c is loaded into PBI Desktop, then it exists as a separate table. His refresh depends on your settings. For example, if you select Table C in the image, the incremental refresh policy of Table C depends on your input.

vtangjiemsft_0-1726639390968.png

 

Best Regards,

Neeko Tang

If this post  helps, then please consider Accept it as the solution  to help the other members find it more quickly. 

Hi @v-tangjie-msft ,

 

thank you for your clarification. My question is not so much about the report tables (the ones loaded to report), that much is clear to me.

 

I was specifically wondering how Power Query treats the subqueries. Since they all share the same parameters (at least, in name), yet there are multiple different refresh schedules, how does Power Query choose which schedule to apply to which subquery?

 

Like in your example, merging A and B into C, but A is also a source for table D (loaded to report). Suppose C refreshes incrementally retaining 7 years of data, with an incremental partition of one week, yet D only retains 1 year of data, but with an incremental partition of one month

 

I'm assuming that in case of such a merge, Power BI will refresh as much data as needed. So subquery A will now also (initially) extract 7 years of data (basically, conform the maximum of any refresh schedule of their respective tables it would contribute data to) and one month incrementally?

 

Thank you again!

Cheers,

Niels

Helpful resources

Announcements
OCT PBI Update Carousel

Power BI Monthly Update - October 2024

Check out the October 2024 Power BI update to learn about new features.

October NL Carousel

Fabric Community Update - October 2024

Find out what's new and trending in the Fabric Community.