Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Be one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now

Reply
Dayna
Helper V
Helper V

Incremental refresh causing duplicate rows

Hello,

 

I've implemented incremental refreshes on one of our fact tables, called 'Transactions'.

 

I can see on a given day there's been a change to some rows, and when this dataflow is consumed within the desktop, I'm receiving duplicate rows but the 'modified' and 'created' dates are different. One shows 30/01/2022 the other is 01/02/2022.

 

My incremental refresh is configured to store data for the last two years, and refresh rows for the last three months based on 'created' date, and automatically detect changes is enabled and set to watch 'modified' date.

 

I know if I do a manual refresh, it'll likely fix the issue as it'll do a normal load from the report, but I don't want to do this, I'd rather incremental worked as I would hope.

 

When I query the row within the service, only the one row is returned - I add this step after the 'canary load' step at the end of all the steps on the workspace.

 

Can anyone assist, or is this expected behaviour?

 

Many thanks,

Dayna

21 REPLIES 21
Anonymous
Not applicable

Hi I have a similar issue 

we have last updated date field across which we are enabling incremental refresh 

I followed as per MS documentation 

currently we enabled refresh for last 15 days.

so what's happening is it is updating and refreshing the data for these 15 days window. 
now problem here is if I have a record which last updated 4 months back and is updated today again, it is not removing the 4 months back row and is getting added resulting in duplicates 

how can we adress this issue ??

 

That field is not suitable for incremental refresh. Use an immutable field like Create Date. You can theoretically use the Last Modified Date for change tracking but ideally you should know your data and should know which partitions to refresh when.

Anonymous
Not applicable

The moment I use Created Date for incremental refresh, when i publish to service, the frist refresh goes on for hours and gets timed out and fails. I tried that couple of times. 

Read about bootstrapping.  You can prepare the partitions without filling them.  Then fill them one by one.

gustavofreitas
New Member

Hi Dyana,

 

My solution for this case was make a buffer for the sorted set by the modified date (descending), and then remove the duplicates.

   ...

   #"Linhas classificadas" = Table.Buffer ( Table.Sort(#"Colunas removidas", {{"Id", Order.Ascending}, {"Modificado", Order.Descending}})),

    #"Duplicatas removidas" = Table.Distinct(#"Linhas classificadas", {"Id"}) ...
 
Obs.: without applying the buffer, power bi does not guarantee that the first row of sorting will be kept.
Dayna
Helper V
Helper V

Hello,

I'm not seeing duplicates in the service though, only when consumed within the report. Is this expected behaviour for the incremental refresh?

Thanks,
Dayna

Can you confirm that your M query is shaped according to the rules I mentioned?

I've not configured this other than in the UI within the PowerBI service, it has automatically created this:

Table.SelectRows(#"Changed column type 2", each DateTime.From([pro2created]) >= RangeStart and DateTime.From([pro2created]) < RangeEnd)

That looks accurate. How do you know that records are duplicated?

When I load in this dataflow into a report, my numbers for a specific day are inflated. When digging deeper, I can see multiple records for the same ID, which should be impossible as it's unique. 

 

When I consume the dataflow in a new report, it's duplicated when filtering on one of these ID codes. Doing the same in the service though only brings back one result, which is strange... 

hmorrow
Frequent Visitor

Did you get a resolution on this? I'm having the same issue when setting incremental refresh on Power BI Online premium workspace. Any insight you can share would be helpful. Thank you.

Sadly not, I ended up disabling the incremental data refreshes. 

hmorrow
Frequent Visitor

That's too bad. Only way around I found is to remove duplicates and keep the most recent when bringing in the dataflow to a dataset. However, that's not the best option since you'd have you do that for each report you build... 

lbendlin
Super User
Super User

Duplication can only happen when partitions store overlapping date ranges. You must make sure that of the RangeStart and RangeEnd parameters one is inclusive and one is exclusive

 

For example 

 

Date >=RangeStart and Date < RangeEnd

Date >RangeStart and Date <=RangeEnd

 

are both possible (but not at the same time!)

 

However

 

Date >=RangeStart and Date <=RangeEnd

 

will lead to duplicate rows and must be avoided.

Great solution, worked for me. Bookmarked this page if I ever forget it.

This worked for me, thank you!

Jeanxyz
Impactful Individual
Impactful Individual

I have a similar issue. The data refreshing  failed because Power BI service identified duplicate Issue IDs (which is not allowed in data model because issue ID is my primary key column). When I checked the source data (Jira system), there is no dulicate I have set up the filter query as below:

Last_Update_Date >=RangeStart and Last_Update_Date < RangeEnd

 

My guess is during the incremental data refreshing, someone has updated an issue in Jira system causing the Last_Update_Date to be changed during the incremental refreshing process . As a result, this issue is included both in archive data set and refresh data set, causing the same issue to be loaded twice. 

 

Not sure if this is the root cause and if there is a solution to the issue.

 

@lbendlin 

 

This scenario is described in the documentation. Keep in mind that Incremental Refresh expects immutable data.  If your data changes after the fact (which would technically require a differential refresh) then you need to supplement the incremental refresh with an occasional Full refresh of all partitions.

Jeanxyz
Impactful Individual
Impactful Individual

I read the documentations, indeed I need to enable 'get the latest data with direct query' in refresh setup page. This is not practical because 1) I will need to update my PB license to a premium per user one and the same for all users who need access to the Power BI report on service, 2) I need to change the storage mode of relevant tables into 'direct query' mode, this also means I need to rewrite all the DAX measures from scratch. 

 

I'm wondering if there is any other solution. My only need is to shorten the data refreshing cycle from 25 minutes to 5-10 minutes. I'm connecting Power BI to Jira data using a odata connector. 

Your data source is not compatible with Microsoft's implementation of Incremental refresh.  Run a process outside of Power BI (for example on SQL Server) that can properly handle data updates. You can then connect to that SQL server in Direct Query mode for close to real time performance.

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Dec Fabric Community Survey

We want your feedback!

Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.