Jeanxyz
Impactful Individual

Incremental data refresh via dataflow and merge query

I have been struggling with this issue for days and wonder if anyone could help.

 

Background:

We have created a Power BI report using source data from Jira (via a custom Power BI connector). After publishing the report, it takes ~30 minutes to refresh the dataset, which is too long because users request 48 data refreshes per day.

 

Solution implemented:

- I tried Microsoft's built-in incremental refresh in a PPU workspace, which didn't work because the Date column used for the incremental refresh is not static.

- As an alternative, I tried the following approach:

1. Create a dataflow that is refreshed once a day (in the early morning). The dataflow imports all Jira data (fact_Jira Issue_all) and remains static for the rest of the day. I then reference the dataflow in the Power BI report, so every time the Power BI dataset is refreshed, it imports the same full fact table from the dataflow.

2. Create a filter query in Jira that returns only the issues updated/created in the past 24 hours, and connect that query to the Power BI report, so every time the Power BI dataset is refreshed, it imports the updated issues (fact_Jira_issue_new) from Jira into Power BI.

3. Create a merge query in Power BI that merges fact_Jira Issue_all with fact_Jira_issue_new on issue ID; any issue that exists in both tables is removed.

4. Append the table from step 3 with fact_Jira_issue_new to get a table (Fact_Issues) containing the issues updated/created in the past 24 hours plus the old issues (a rough M sketch of steps 3 and 4 follows this list).
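A minimal M sketch of steps 3 and 4, assuming the two source queries are named fact_Jira_Issue_all and fact_Jira_issue_new and the key column is "Issue ID" (all three names are placeholders for whatever the real queries and column are called):

    let
        // Step 3: left anti join, keeping only the rows of the full table
        // whose Issue ID does NOT appear in the 24-hour table.
        RemovedUpdated =
            Table.NestedJoin(
                fact_Jira_Issue_all, {"Issue ID"},
                fact_Jira_issue_new, {"Issue ID"},
                "New", JoinKind.LeftAnti
            ),
        OldIssuesOnly = Table.RemoveColumns(RemovedUpdated, {"New"}),

        // Step 4: append the fresh 24-hour rows to the remaining old rows.
        Fact_Issues = Table.Combine({OldIssuesOnly, fact_Jira_issue_new})
    in
        Fact_Issues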

 

Result:

This solution works in about 95% of cases and the refresh duration is significantly reduced, but occasionally I see an error in Refresh History saying:

Table 'Fact_Issues' contains a duplicate value '200369' and this is not allowed for columns on the one side of a many-to-one relationship

 

That means something went wrong in the merge/append step, causing one issue to appear twice in the final table. I also know from experience that the duplicate issue is an issue that was updated in Jira while the Power BI data refresh was taking place. So here is my guess at the root cause: the "fact_Jira_issue_new" table is referenced twice in my query, once in the merge step and once in the append step. Because of the way Power Query works, each time the table is referenced, the query that generates "fact_Jira_issue_new" is evaluated again; as a result, the "fact_Jira_issue_new" used in the merge step can differ from the one used in the append step. I have tried Table.Buffer(#"fact_Jira_issue_new"), but the duplicate error still occurs. Is there any way to prevent "fact_Jira_issue_new" from being refreshed again in the append step?
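For reference, a buffered variant of the merge/append query might look roughly like this; the query and column names are placeholders, and this is only a sketch of the idea, not a confirmed fix. Power Query does not guarantee that a query referenced from another query is evaluated only once, which may be why buffering alone does not help, but buffering the snapshot as a single step and reusing that step for both the merge and the append at least keeps one evaluation of the final query working from one copy of the data:

    let
        // One snapshot of the 24-hour query, materialised in memory for this evaluation.
        NewSnapshot = Table.Buffer(#"fact_Jira_issue_new"),

        // The anti-join and the append both reference the buffered snapshot,
        // not the underlying fact_Jira_issue_new query.
        RemovedUpdated =
            Table.NestedJoin(
                fact_Jira_Issue_all, {"Issue ID"},
                NewSnapshot, {"Issue ID"},
                "New", JoinKind.LeftAnti
            ),
        OldIssuesOnly = Table.RemoveColumns(RemovedUpdated, {"New"}),
        Fact_Issues   = Table.Combine({OldIssuesOnly, NewSnapshot})
    in
        Fact_Issues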

 

@lbendlin;@parry2k 

 

 

 

1 ACCEPTED SOLUTION
parry2k
Super User

@Jeanxyz I think you have found the root cause. Here are a few suggestions:

 

- After the append is finished, remove duplicates on the ID (or whatever column is used for the relationship); see the sketch below.

- Second, do the 24-hour data pull in a dataflow as well, and use that dataflow in Power BI for both the merge and the append.
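A rough M sketch of the first suggestion, assuming the appended table is called Fact_Issues and the key column is "Issue ID" (both placeholders):

    // Collapse any issue that slipped in twice down to a single row,
    // keyed on the column sitting on the "one" side of the relationship.
    Deduplicated = Table.Distinct(Fact_Issues, {"Issue ID"})

Keep in mind that which of the duplicate rows survives may not be deterministic unless the table is sorted (and buffered) first.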

 

 




3 REPLIES

Jeanxyz
Impactful Individual

The problem is solved after I sort by last update date and remove duplicates by Issue ID. Thanks!
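For anyone finding this later, the fix described above might look roughly like this in M (the column names "Updated" and "Issue ID" are placeholders, and the Table.Buffer is the commonly cited trick for making Remove Duplicates honour the sort order):

    let
        // Sort so the most recently updated copy of each issue comes first,
        // then buffer so Table.Distinct actually sees that order.
        Sorted = Table.Buffer(Table.Sort(Fact_Issues, {{"Updated", Order.Descending}})),

        // Keep one row per issue: the first (newest) one.
        Deduplicated = Table.Distinct(Sorted, {"Issue ID"})
    in
        Deduplicated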

lbendlin
Super User

"which didn't work because the Date column used in the incremental refresh is not static."

Please explain what you mean by that.

 

Have you considered using an incremental refresh dataset with the hot partition set to one day?  That would remove the need for the dataflow and the merge.
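For context, native incremental refresh is driven by two reserved datetime parameters, RangeStart and RangeEnd, which filter the source query; the policy (for example, refresh only a one-day hot partition and keep a few years of history) is then configured on the table in Power BI Desktop. A minimal filter step might look like the sketch below, with a placeholder datetime column "Updated" and an assumed source query name:

    // Power BI substitutes RangeStart/RangeEnd per partition at refresh time.
    Filtered = Table.SelectRows(
        #"fact_Jira_Issue_all",
        each [Updated] >= RangeStart and [Updated] < RangeEnd
    )

Whether this fits here depends on the concern raised in the original post about the date column not being static, since partitions outside the refresh window are not re-read.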
