Supplies are limited. Contact info@espc.tech right away to save your spot before the conference sells out.
Get your discountScore big with last-minute savings on the final tickets to FabCon Vienna. Secure your discount
Hello. I'm trying to understand the behavior of Power BI's incremental refresh functionality. I'm not a Power BI developer but I have a decent understanding of the basics.
Here's the setup in the report:
Here's the scenario I'm trying to figure out. Let's say we've done the full initial load and it has pulled in all the available data.
Yesterday, a record with an original last_updated_date of 1/1/2024 was updated to 4/7/2025 which now falls within my refresh window.
Since the original record has a date that is not with my last 7 days, is Power BI going to duplicate this record? My understanding is that Power BI deletes whatever is in the defined date range and pulls in fresh data. Therefore, I should assume my row will be duplicated?
Based on what I have found online I think yes, but I've been unable to test my theory.
Solved! Go to Solution.
Parameters for range start and range end applied to the last_updated_date in the table
Let me stop you right there. last_updated_date is not a suitable column for incremental refresh partition policies. This will result in data duplication across partitions. Instead use an immutable column like created_date.
Thank you for the response. I guess my question is more simple than this. If we specify the range on the created_date and it's set to pull 7 days, the query that is fired off to the source is going to look for anything with a created_date in the last 7 days?
No. There will be (at least) seven queries against the data source, each for a single day partition.
Ok so 7 (at least) queries for each day. Would that still be querying by created_date? I assume that to be the case since we haven't told it about any other date column. I think I'm rounding out my knowledge with your answers so thank you for your knowledge and support!
Ok so 7 (at least) queries for each day.
Not 7 queries for each day. 7 queries, one for each day
Make sure RangeStart is inclusive an RangeEnd is exclusive - that will avoid duplicate data across partitions.
Parameters for range start and range end applied to the last_updated_date in the table
Let me stop you right there. last_updated_date is not a suitable column for incremental refresh partition policies. This will result in data duplication across partitions. Instead use an immutable column like created_date.
LOL ok that is what I thought and that is what I tried to tell the BI team.
This leads me to my next question: if the parameters are applied to the created_date instead and the incremental refresh is still set to pull in 7 days and the detect data changes is unchecked, will power BI pull the data from the source by created_date?
What I can't figure out is how does Power BI know which data to replace? There's no key column defined so it can't be performing a merge.
There is no merge in any of this. You have partitions. Their queries are managed by the RangeStart and RangeEnd parameters which are controlled by the Power BI service based on your requirements*.
They are then appended.
* I shoud say "guidance. You may specify 7 days but the service will decide for itself how many partitions to actually refresh, which ones to consolidate etc.
https://learn.microsoft.com/en-us/power-bi/connect-data/incremental-refresh-xmla