Jeanxyz
Impactful Individual

Incremental data refresh via dataflow and merge query

I have been struggling with this issue for days and wonder if anyone could help.

 

Background:

We have created a Power BI report using source data from Jira (via a custom Power BI connector). After publishing the report, it takes ~30 minutes to refresh the dataset, which is too long because users request 48 data refreshes per day.

 

Solution implemented:

- I tried Microsoft's built-in incremental refresh in a PPU workspace, which didn't work because the Date column used for the incremental refresh is not static.

- As an alternative, I tried the following approach:

1. Create a dataflow that is refreshed once a day (in the early morning). The dataflow imports all Jira data (fact_Jira Issue_all) and remains static for the rest of the day. I then reference the dataflow in the Power BI report, so every time the Power BI dataset is refreshed, it imports the same full fact table from the dataflow.

2. Create a filter query in Jira that returns only the issues updated/created in the past 24 hours, and connect that query to the Power BI report, so every time the Power BI dataset is refreshed, it imports the updated issues (fact_Jira_issue_new) from Jira into Power BI.

3. Create a merge query in Power BI that merges fact_Jira Issue_all with fact_Jira_issue_new on issue ID; any issue that exists in both tables is removed.

4. Append the table from step 3 with fact_Jira_issue_new to get a table (Fact_Issues) containing the issues updated/created in the past 24 hours plus the old issues (a rough M sketch of steps 3 and 4 follows this list).
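A minimal M sketch of steps 3 and 4, assuming the two source queries are named fact_Jira_Issue_all and fact_Jira_issue_new and the key column is "Issue ID" (all three names are placeholders for whatever the real queries and column are called):

    let
        // Step 3: left anti join, keeping only the rows of the full table
        // whose Issue ID does NOT appear in the 24-hour table.
        RemovedUpdated =
            Table.NestedJoin(
                fact_Jira_Issue_all, {"Issue ID"},
                fact_Jira_issue_new, {"Issue ID"},
                "New", JoinKind.LeftAnti
            ),
        OldIssuesOnly = Table.RemoveColumns(RemovedUpdated, {"New"}),

        // Step 4: append the fresh 24-hour rows to the remaining old rows.
        Fact_Issues = Table.Combine({OldIssuesOnly, fact_Jira_issue_new})
    in
        Fact_Issues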

 

Result:

This solution works in about 95% of cases and the refresh duration is significantly reduced, but occasionally I see an error in Refresh History saying:

Table 'Fact_Issues' contains a duplicate value '200369' and this is not allowed for columns on the one side of a many-to-one relationship

 

That means something went wrong in the merge/append step, causing one issue to appear twice in the final table. I also know from experience that the duplicate issue is an issue that was updated in Jira while the Power BI data refresh was taking place. So here is my guess at the root cause: the "fact_Jira_issue_new" table is referenced twice in my query, once in the merge step and once in the append step. Because of the way Power Query works, each time the table is referenced, the query that generates "fact_Jira_issue_new" is evaluated again; as a result, the "fact_Jira_issue_new" used in the merge step can differ from the one used in the append step. I have tried Table.Buffer(#"fact_Jira_issue_new"), but the duplicate error still occurs. Is there any way to prevent "fact_Jira_issue_new" from being refreshed again in the append step?
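For reference, a buffered variant of the merge/append query might look roughly like this; the query and column names are placeholders, and this is only a sketch of the idea, not a confirmed fix. Power Query does not guarantee that a query referenced from another query is evaluated only once, which may be why buffering alone does not help, but buffering the snapshot as a single step and reusing that step for both the merge and the append at least keeps one evaluation of the final query working from one copy of the data:

    let
        // One snapshot of the 24-hour query, materialised in memory for this evaluation.
        NewSnapshot = Table.Buffer(#"fact_Jira_issue_new"),

        // The anti-join and the append both reference the buffered snapshot,
        // not the underlying fact_Jira_issue_new query.
        RemovedUpdated =
            Table.NestedJoin(
                fact_Jira_Issue_all, {"Issue ID"},
                NewSnapshot, {"Issue ID"},
                "New", JoinKind.LeftAnti
            ),
        OldIssuesOnly = Table.RemoveColumns(RemovedUpdated, {"New"}),
        Fact_Issues   = Table.Combine({OldIssuesOnly, NewSnapshot})
    in
        Fact_Issues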

 

@lbendlin;@parry2k 

 

 

 

1 ACCEPTED SOLUTION
parry2k
Super User

@Jeanxyz I think you have found the root cause. Here are a few suggestions:

 

- After the append is finished, remove duplicates on the ID (or whatever column is used for the relationship); see the sketch below.

- Second, do the 24-hour data pull in a dataflow as well, and use that dataflow in Power BI for both the merge and the append.
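A rough M sketch of the first suggestion, assuming the appended table is called Fact_Issues and the key column is "Issue ID" (both placeholders):

    // Collapse any issue that slipped in twice down to a single row,
    // keyed on the column sitting on the "one" side of the relationship.
    Deduplicated = Table.Distinct(Fact_Issues, {"Issue ID"})

Keep in mind that which of the duplicate rows survives may not be deterministic unless the table is sorted (and buffered) first.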

 

 




3 REPLIES

Jeanxyz
Impactful Individual

The problem is solved after I sort by last update date and remove duplicates by Issue ID. Thanks!
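For anyone finding this later, the fix described above might look roughly like this in M (the column names "Updated" and "Issue ID" are placeholders, and the Table.Buffer is the commonly cited trick for making Remove Duplicates honour the sort order):

    let
        // Sort so the most recently updated copy of each issue comes first,
        // then buffer so Table.Distinct actually sees that order.
        Sorted = Table.Buffer(Table.Sort(Fact_Issues, {{"Updated", Order.Descending}})),

        // Keep one row per issue: the first (newest) one.
        Deduplicated = Table.Distinct(Sorted, {"Issue ID"})
    in
        Deduplicated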

lbendlin
Super User

"which didn't work because the Date column used in the incremental refresh is not static."

Please explain what you mean by that.

 

Have you considered using an incremental refresh dataset with the hot partition set to one day?  That would remove the need for the dataflow and the merge.
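For context, native incremental refresh is driven by two reserved datetime parameters, RangeStart and RangeEnd, which filter the source query; the policy (for example, refresh only a one-day hot partition and keep a few years of history) is then configured on the table in Power BI Desktop. A minimal filter step might look like the sketch below, with a placeholder datetime column "Updated" and an assumed source query name:

    // Power BI substitutes RangeStart/RangeEnd per partition at refresh time.
    Filtered = Table.SelectRows(
        #"fact_Jira_Issue_all",
        each [Updated] >= RangeStart and [Updated] < RangeEnd
    )

Whether this fits here depends on the concern raised in the original post about the date column not being static, since partitions outside the refresh window are not re-read.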
