The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Hi,
I connected Power BI on many Azure SQL Server tables and created a model. Then I uploaded the model to the Premium capacity configured for incremental refresh. The first service refresh (wich load historic data) loads fine, but the next day incremental refresh is giving me an error of duplicate records on a table that is a dimension (and therefore has a lot of relationships 1 to *).
I've already duble checked the origin table and there isn't any duplicates records there. I checked my incremental rule and it's "cod_date < RangeEnd" too and I don't do any transformation on Power Query, I just conect to the table.
I wonder if it's something related with Power BI doesn't let us define a primary key when programing incremental refresh.
Any help will be nice.
Thanks!
Solved! Go to Solution.
Hi @enzo_garcia20 ,
According to the official document, the error causud by the Post dates have changed.
Detail: With a refresh operation, only data that has changed at the data source is refreshed in the dataset. As the data is divided by a date, it’s recommended post (transaction) dates are not changed.
If a date is changed accidentally, then two issues can occur: Users notice some totals changed in the historical data (that is not supposed to happen), or during a refresh an error is returned indicating a unique value is not in fact unique. For the latter, this can happen when the table with incremental refresh configured is used in a 1:N relationship with another table as the 1 side and should have unique values. When the data is changed (for a specific ID), that ID then appears in another partition and the engine will detect the value is not unique.
Best Regards,
Eyelyn Qin
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi @enzo_garcia20 ,
According to the official document, the error causud by the Post dates have changed.
Detail: With a refresh operation, only data that has changed at the data source is refreshed in the dataset. As the data is divided by a date, it’s recommended post (transaction) dates are not changed.
If a date is changed accidentally, then two issues can occur: Users notice some totals changed in the historical data (that is not supposed to happen), or during a refresh an error is returned indicating a unique value is not in fact unique. For the latter, this can happen when the table with incremental refresh configured is used in a 1:N relationship with another table as the 1 side and should have unique values. When the data is changed (for a specific ID), that ID then appears in another partition and the engine will detect the value is not unique.
Best Regards,
Eyelyn Qin
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi,
Thanks for the answers guys and @v-eqin-msft.
I don't know if this is my case because none post dates were changed. But in the end, I rebuild my model from scratch and it worked flawless so maybe it was some kind of bug.
But I have one more question: in the documentation you sent it says "only data that has changed at the data source is refreshed in the dataset". How does Power BI know wich data changed or not if it doesn't ask me for a primary key when programing a refresh?
I programed my incremental refresh to evaluate two months back, so how does it know when to refresh an existing row ou add a new row?
If you can recommend me some documentation it will be awersome.
But thanks for all the support.
Best Regards,
Enzo.
@enzo_garcia20 PBI knows what data to refresh because it does a max(date_col) for every partition range. Trace your Sql (if Sql source) and you'll see it. So it collects the names of all partitions that had a change during your refresh period (last 2 months) with a timestamp since the last refresh. It then just does a complete reload of those partitions.
In other words, it doesn't need to know which row changed, only what partitions had changes. Then it wipes and out rebuilds those partitions. The incremental doesn't do rows at a time - it does entire partitions.
Do you have an incremental refresh on your dimension tables?
The query failing is right in that on a table where you have relationships on the 1 side of the relationship there will be duplicate records for some reason.
What happens when you refresh this PBIX file, do you get the same error?
It will be becasue of your records are getting updated beyond the delta range and because of that there are two recorss with same key in your dataset/dataflow.
To throublehsoot better, create Dataflow/dataset with only one dimension table and implemnet incremental load, validate for duplicay after loading the data (try after 1-2 delta load)
If this post helps, then please consider Accept it as the solution and give it a thumbs up
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.
User | Count |
---|---|
49 | |
26 | |
15 | |
14 | |
12 |
User | Count |
---|---|
110 | |
40 | |
25 | |
24 | |
19 |