Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now

Reply
Anonymous
Not applicable

Incremental refresh -Detect Data Change column vs Partition Column

Hi everyone,

 

We have PBI Premium and I am attemping to have one of our reports use Import from the SQL Azure data warehouse and incrementally load 2 of the main tables. I've followed the documentaiton, but I am having some issues. Is there an easy way to troubleshoot or test what is actually refreshing and/or what the errors are?

 

One thing I noticed here: https://docs.microsoft.com/en-us/power-bi/service-premium-incremental-refresh#detect-data-changes

 

Which says do not use the same column for the "Detect Data Changes" as the column you use against the RangeStart and RangeEnd parameters. We do have an auditing/process date/time column on all DW tables which indicates when it was loaded. My thought is to simply use this for the Detect Data Changes column, but then I don't know what to use for the partition column for RangeStart/RangeEnd parameters. We really only need to see the last DW load time in the process date/time column.

 

All advice is appreciated! Thanks.

4 REPLIES 4
Anonymous
Not applicable

Even I have the same scenario. If I do not use the recird updated column (Date Time) in detect changes, it  is not working as expected. So is it ok, if I use a single column for both query parameters and detect changes ??Any help can be greatly appreciated

gmelhaff
Advocate I
Advocate I

Yes the documentation says to use 2 different columns, one for the incremental refresh and another for the query filter parameters.  However, what actually happens if you watch the sql it generates is the incremental refresh column (e.g. modified time) gets used only for detecting if there are changes to pickup in the time period you specify (such as the last 1 day).  It then totally ignores that column when it comes to actually pulling data.  You read that right - it's ignored after that. 

 

Case in point:

I have an order date on a fact table that I want to partition on (but not incremental refresh since other things can change other than the order date) so I put the order date as the parameter filter column.  I have an audit timestamp I want to use for incremental refresh so I put that in the incremental properties for detecting changes. 

But when an incremental refresh runs it does the following 2 queries:

1) "select max([rows].[audit_tms])  from....subquery.  It's just finding out if there's something to update.  The incremental refresh column is not used again for querying data

2) Select the data based on the parameter query filter for that time period.  In this case the order date.  (It goes into the partition that aligns with this column's data). 

 

End result:   I leak 100% of any changes that do not cause the order date to change.  

 

If however you use the audit timestamp for both incremental refresh AND the query filters (against their advice) then you won't leak data BUT you will end up with a few bloated partitions.  If you do complete reload then you will end up with only current partition with 100% of your data.  Over time that will start to spread out...a little.

 

So you have 2 choices:  

1) Leak data changes (if you follow their advice)

2) Lose most of the performance advantage of partitioning

 

Bottom line is this product's architecture if fundamentally flawed.  It seems to me that they built the tool with the assumption that you only insert and not update.  Or if you do update you also update the business date.  I think there may be some cases you have facts where this is true but I'm thinking that's fairly rare.  Where I work nothing works this way - everything can be updated often due to re-alignment of keys such as customer assignments (due to merges, householding, etc) or simply an enhancement that requires a reload. 

 

What Microsoft should have done is allow you to pull based on one column but partition based on an alternate static business column.  I'm just baffled they didn't design it that way.  

v-lili6-msft
Community Support
Community Support

hi, @Anonymous 

RangeStart and RangeEnd parameters

For incremental refresh, datasets are filtered by using Power Query date/time parameters with the reserved, case-sensitive names RangeStart and RangeEnd. These parameters are used to filter the data imported into Power BI Desktop, and also to dynamically partition the data into ranges once published to the Power BI service. The parameter values are substituted by the service to filter for each partition. Once published, the parameter values are overridden automatically by the Power BI service. There's no need to set them in dataset settings in the service. Once published, the parameter values are overridden automatically by the Power BI service.

 

For your case, you need to set the RangeStart and RangeEnd parameters for process date/time column, and then use auditing date/time column for "Detect Data Changes".

 

here are documents for you refer to:

https://community.powerbi.com/t5/Community-Blog/Important-Considerations-for-Setting-up-Incremental-Refresh-on/ba-p/554151

https://www.nabler.com/articles/Power-BI-data-refresh-and-scheduling-3.asp

 

Best Regards,

Lin

 

 

 

 

 

Community Support Team _ Lin
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Anonymous
Not applicable

Hi, 

 

Thank you for the reply. The issue is that my main fact table only has the process date/time column - this is the audting column which is updated in SQL each time the data is loaded. I do not have any other date/time columns in the fact table. I believe I should use this as the Detect Data Changes column, since it will always have a single date of the last load, but I do not have any other date time column to use for RangeStart and RangeEnd. How do I handle this situation? The documentation says not to use the process date/time column for both.

 

Mark

Helpful resources

Announcements
November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Live Sessions with Fabric DB

Be one of the first to start using Fabric Databases

Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.

Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.

Nov PBI Update Carousel

Power BI Monthly Update - November 2024

Check out the November 2024 Power BI update to learn about new features.