March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
Doing Incremental Refresh in Premium
I have a last update date in my data. When I detect data changes set to on, it checks the row and if this date has changed then the record is updated.
However how does it know that the record in Power BI and the record coming in is the same record?
I have searched the documentation and cant find anything on this. Im assuming its the Primary key of the record or the key ? However this isnt set anywhere tht I can see
What about when you are loading in the fact table? The fact table contains foreign keys. Basically how doesthe incremental refresh know that this is the row that replaces the row in Power BI?
https://docs.microsoft.com/en-us/power-bi/service-premium-incremental-refresh#detect-data-changes Doesnt seem to answer this question
Would incremental refresh work in the following scenario?
Data as of March 2nd 2020:
Date | Key | Total Sales | Last Modified |
March 1st 2020 | 1 | 300 | March 1st 2020 |
March 2nd 2020 | 2 | 200 | March 2nd 2020 |
Data has changed to the following as of March 3rd 2020
Date | Key | Total Sales | Date Modified |
March 1st 2020 | 1 | 400 | March 3rd 2020 |
March 2nd 2020 | 2 | 600 | March 3rd 2020 |
i.e. Does incremental refresh work if updates are only made against the same row rather than as NEW ROWS?
Thanks GilbertQ
So what does this mean on the MS support article for incremental refresh? Reason I asked was I thought it had something to do with my example above
"The filter on the date column is used to dynamically partition the data into ranges in the Power BI service. Incremental refresh isn't designed to support cases where the filtered date column is updated in the source system. An update is interpreted as an insertion and a deletion, not an actual update. If the deletion occurs in the historical range and not the incremental range, it won’t get picked up. This can cause data refresh failures due to partition-key conflicts."
Hi Gilbert
I don't understand what you mean sorry - duplicate values where?
Thank you
Before I look at this apologies. I put 2019 rather than 2020. Its still habit at the moment
Jay Swaminarayan!
This case was exteremely helpful to understand the underlying mechanism of Microsoft PowerBI of Detect Changes with Incremental Refresh and How it works?
The question:
How does PowerBI detect changes know, what records to update based on LastUpdateDate without taking Id or Key columns into Account?
Note: the following answer is based on my understanding about it.
With this example
Assume these Refresh Policy Configuration:
Historical Days: 60 Months
Refresh Days: 10 Days
Range Column : OrderDate
Detect Changes: LastUpdateDate
Date of Upload 25/01/2020 (chanaged for better understanding)
In Power BI already
------------------------------Original Case---------------------
Key ID Surname OrderDate lastUpdatedate
1 24 Smyth 15/01/2020 15/01/2020
2 48 Jonis 15/01/2020 15/01/2020
3 67 Ellis 15/01/2020 15/01/2020
4 81 Truman 15/01/2020 15/01/2020
And you upload this data
Key ID Surname OrderDate lastUpdatedate
1 24 Smith 15/01/2020 27/09/2019
2 48 Jones 15/01/2020 27/09/2019
3 67 Ellis 15/01/2020 15/01/2020
4 81 Truman 15/01/2020 15/01/2020
-------------------------------------------------------------------
For sake of better understanding we add few more rows
------------------------------Original Case---------------------
Key ID Surname OrderDate lastUpdatedate
1 24 Smyth 15/01/2019
2 48 Jonis 15/01/2019
3 67 Ellis 15/02/2019
4 81 Truman 15/01/2020 15/01/2020
5 54 NSmyth 15/01/2020 15/01/2020
6 58 NJonis 16/01/2020 16/01/2020
7 57 NEllis 16/01/2020 16/01/2020
8 51 NTruman 18/01/2020 18/01/2020
And we are intending to update this data
Key ID Surname OrderDate lastUpdatedate
1 24 Smith 15/01/2019 24/01/2019
2 48 Jonis 15/01/2019 15/01/2019
3 67 EllisU 15/02/2019 23/02/2019
4 81 Truman 15/01/2020 15/01/2020
5 54 NSmyth 15/01/2020 24/01/2020
6 58 NJonis 16/01/2020 23/01/2020
7 57 NEllis 16/01/2020 16/01/2020
8 51 NTruman 18/01/2020 18/01/2020
-------------------------------------------------------------------
Records in REDs are updated on respective dates.
Now lets understand how PowerBI makes History and Refresh Partitions
Historical Partitions based on Months, it must create 60 partitions for each month. Therefore our data will sit this way.
Like....
Key ID Surname OrderDate lastUpdatedate
-----------------------------HISTORY-P1------------------------
1 24 Smyth 15/01/2019 15/01/2019
2 48 Jonis 15/01/2019 15/01/2019
-----------------------------HISTORY-P2------------------------
3 67 Ellis 15/02/2019 15/02/2019
-----------------------------HISTORY-P3------------------------
4 81 Truman 15/01/2020 15/01/2020
5 54 NSmyth 15/01/2020 15/01/2020
Refresh Partitions are created based on granuality, which is days (10 days). Therefore it will create 10 Refresh Partition for each day.
Like....
-----------------------------REFRESH-P1------------------------
6 58 NJonis 16/01/2020 16/01/2020
7 57 NEllis 16/01/2020 16/01/2020
-----------------------------REFRESH-P2------------------------
8 51 NTruman 18/01/2020 18/01/2020
Note here that
Detect changes only works with Refresh Range, therefore only Refresh Partitions are updated and the History partitions are just IGNORED. And thus any data updated in the historic periods are not considered at all.
PowerBI maintains the last MAX Updated Value for each Refresh Partitions
REFRESH P1- MAX Last Updated Date = 16/01/2020
REFRESH P2- MAX Last Updated Date = 18/01/2020
Now Refresh Operation takes place and gets following data
Key ID Surname OrderDate lastUpdatedate
1 24 Smith 15/01/2019 24/01/2019
2 48 Jonis 15/01/2019 15/01/2019
3 67 EllisU 15/02/2019 23/02/2019
4 81 Truman 15/01/2020 15/01/2020
5 54 NSmyth 15/01/2020 24/01/2020
--------------^^^IGNORED as its in HISTORY PARTITION^^^^------------
-----------------------------REFRESH-P1------------------------
6 58 NJonis 16/01/2020 23/01/2020
7 57 NEllis 16/01/2020 16/01/2020
-----------------------------REFRESH-P2------------------------
8 51 NTruman 18/01/2020 18/01/2020
Now, PowerBI checks the MAX LastUpdateDate for incoming data
For REFRESH-P1 : New: 23/01/2020 (> Old: 16/01/2020)
For REFRESH-P2 : New: 18/01/2020 (== Old: 18/01/2020)
Thus, It has DETECTED Changes in REFRESH-P1 partition, and not in REFRESH-P2.
Therefore all the records will be deleted from REFRESH-P1 partitions (Id 6 & 7 both will be deleted) and Insterted back with the new records (of Id 6 & 7).
It will NOT only delete Id(6) but both as it resides in same partition.
And REFRESH-P2 will be ignored completely.
This is how PowerBI detect record changes based on LastUpdateDate without using ID/Key columns as it is replacing whole partition.
Hope that makes sense.
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.
User | Count |
---|---|
25 | |
21 | |
11 | |
10 | |
9 |
User | Count |
---|---|
48 | |
30 | |
18 | |
17 | |
15 |