The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Hello,
I've implemented incremental refreshes on one of our fact tables, called 'Transactions'.
I can see on a given day there's been a change to some rows, and when this dataflow is consumed within the desktop, I'm receiving duplicate rows but the 'modified' and 'created' dates are different. One shows 30/01/2022 the other is 01/02/2022.
My incremental refresh is configured to store data for the last two years, and refresh rows for the last three months based on 'created' date, and automatically detect changes is enabled and set to watch 'modified' date.
I know if I do a manual refresh, it'll likely fix the issue as it'll do a normal load from the report, but I don't want to do this, I'd rather incremental worked as I would hope.
When I query the row within the service, only the one row is returned - I add this step after the 'canary load' step at the end of all the steps on the workspace.
Can anyone assist, or is this expected behaviour?
Many thanks,
Dayna
Hi I have a similar issue
we have last updated date field across which we are enabling incremental refresh
I followed as per MS documentation
currently we enabled refresh for last 15 days.
so what's happening is it is updating and refreshing the data for these 15 days window.
now problem here is if I have a record which last updated 4 months back and is updated today again, it is not removing the 4 months back row and is getting added resulting in duplicates
how can we adress this issue ??
That field is not suitable for incremental refresh. Use an immutable field like Create Date. You can theoretically use the Last Modified Date for change tracking but ideally you should know your data and should know which partitions to refresh when.
The moment I use Created Date for incremental refresh, when i publish to service, the frist refresh goes on for hours and gets timed out and fails. I tried that couple of times.
Read about bootstrapping. You can prepare the partitions without filling them. Then fill them one by one.
Hi Dyana,
My solution for this case was make a buffer for the sorted set by the modified date (descending), and then remove the duplicates.
...
#"Linhas classificadas" = Table.Buffer ( Table.Sort(#"Colunas removidas", {{"Id", Order.Ascending}, {"Modificado", Order.Descending}})),
Hello,
I'm not seeing duplicates in the service though, only when consumed within the report. Is this expected behaviour for the incremental refresh?
Thanks,
Dayna
Can you confirm that your M query is shaped according to the rules I mentioned?
I've not configured this other than in the UI within the PowerBI service, it has automatically created this:
That looks accurate. How do you know that records are duplicated?
When I load in this dataflow into a report, my numbers for a specific day are inflated. When digging deeper, I can see multiple records for the same ID, which should be impossible as it's unique.
When I consume the dataflow in a new report, it's duplicated when filtering on one of these ID codes. Doing the same in the service though only brings back one result, which is strange...
Did you get a resolution on this? I'm having the same issue when setting incremental refresh on Power BI Online premium workspace. Any insight you can share would be helpful. Thank you.
Sadly not, I ended up disabling the incremental data refreshes.
That's too bad. Only way around I found is to remove duplicates and keep the most recent when bringing in the dataflow to a dataset. However, that's not the best option since you'd have you do that for each report you build...
Duplication can only happen when partitions store overlapping date ranges. You must make sure that of the RangeStart and RangeEnd parameters one is inclusive and one is exclusive
For example
Date >=RangeStart and Date < RangeEnd
Date >RangeStart and Date <=RangeEnd
are both possible (but not at the same time!)
However
Date >=RangeStart and Date <=RangeEnd
will lead to duplicate rows and must be avoided.
This worked for me, thank you!
I have a similar issue. The data refreshing failed because Power BI service identified duplicate Issue IDs (which is not allowed in data model because issue ID is my primary key column). When I checked the source data (Jira system), there is no dulicate I have set up the filter query as below:
Last_Update_Date >=RangeStart and Last_Update_Date < RangeEnd
My guess is during the incremental data refreshing, someone has updated an issue in Jira system causing the Last_Update_Date to be changed during the incremental refreshing process . As a result, this issue is included both in archive data set and refresh data set, causing the same issue to be loaded twice.
Not sure if this is the root cause and if there is a solution to the issue.
This scenario is described in the documentation. Keep in mind that Incremental Refresh expects immutable data. If your data changes after the fact (which would technically require a differential refresh) then you need to supplement the incremental refresh with an occasional Full refresh of all partitions.
I read the documentations, indeed I need to enable 'get the latest data with direct query' in refresh setup page. This is not practical because 1) I will need to update my PB license to a premium per user one and the same for all users who need access to the Power BI report on service, 2) I need to change the storage mode of relevant tables into 'direct query' mode, this also means I need to rewrite all the DAX measures from scratch.
I'm wondering if there is any other solution. My only need is to shorten the data refreshing cycle from 25 minutes to 5-10 minutes. I'm connecting Power BI to Jira data using a odata connector.
Your data source is not compatible with Microsoft's implementation of Incremental refresh. Run a process outside of Power BI (for example on SQL Server) that can properly handle data updates. You can then connect to that SQL server in Direct Query mode for close to real time performance.
Thank you - this was causing my issue, I never would have thought to look for this.
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.
User | Count |
---|---|
109 | |
76 | |
65 | |
52 | |
51 |
User | Count |
---|---|
128 | |
117 | |
78 | |
65 | |
63 |