Redriver123
Frequent Visitor

Large Dataset - Refresh only YTD

Hello,

I have created Dataflow A with historical data (around 20 million rows) and disabled its refresh.

I also created Dataflow B with YTD data and incremental refresh; it takes about 2 minutes to refresh.

In the report, I disable refresh of Dataflow A and APPEND it to Dataflow B so I have one master dataset.

During refresh, both dataflows fire off to refresh. It takes about 30 minutes.

How do we efficiently set up large datasets/dataflows (to be used by multiple reports) so that we don't have to refresh the historical portion on a daily basis?

 

1 ACCEPTED SOLUTION
v-pagayam-msft
Community Support

Hi @Redriver123 ,
Thank you for reaching out to Microsoft Fabric Community Forum regarding the issue you are facing.

Sorry for the delay in response. If the issue is still not resolved, here is a workaround that might assist you:
1. Create Dataflow A for historical data and Dataflow B for YTD data. By separating historical and incremental data, you avoid refreshing large volumes of unchanged data, saving time and resources.
2. Then, in Power BI Desktop, load both the historical and incremental dataflows into the Power Query Editor.
3. Create the RangeStart and RangeEnd parameters to define the incremental refresh range, then close and apply the changes.
4. Publish the report to the Power BI service and configure incremental refresh for the dataset.
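As a sketch of steps 2 and 3, the RangeStart/RangeEnd parameters are applied as a date filter in the Power Query (M) query for the YTD entity. The workspace/dataflow IDs, entity name (Sales), and date column (OrderDate) below are placeholders; the exact navigation steps the Dataflows connector generates may differ in your tenant:

```
let
    // Connect to Power BI dataflows (navigation values below are placeholders)
    Source = PowerPlatform.Dataflows(null),
    Workspace = Source{[workspaceId = "YourWorkspaceId"]}[Data],
    Dataflow = Workspace{[dataflowId = "YourDataflowId"]}[Data],
    Sales = Dataflow{[entity = "Sales"]}[Data],
    // Filter on the date column with the RangeStart/RangeEnd parameters.
    // Use >= on one boundary and < on the other so rows on a partition
    // boundary are not loaded twice.
    Filtered = Table.SelectRows(Sales,
        each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd)
in
    Filtered
```

RangeStart and RangeEnd must be Date/Time-typed parameters for the service to recognize the incremental refresh policy when the report is published.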

Please feel free to contact us if you have any further questions. If the response has addressed your query, please accept it as a solution and give a 'Kudos' so other members can easily find it.


9 REPLIES
Poojara_D12
Super User

Hi @Redriver123 

When working with large datasets in Power BI Dataflows—especially scenarios involving a large historical dataset alongside a smaller, frequently updated incremental dataset—the goal is to optimize refresh performance and avoid unnecessary processing. In your case, you have Dataflow A holding the historical data (about 20 million rows) with refresh disabled, and Dataflow B for Year-To-Date (YTD) incremental data that refreshes quickly. However, when appending these two in a report or combined Dataflow, both dataflows end up refreshing, causing long refresh times.

To efficiently handle this, it’s best to separate the historical and incremental dataflows clearly and avoid appending them inside Power BI or Dataflow at refresh time, which can trigger both to refresh. Instead, maintain Dataflow A as a static, rarely refreshed data source. For Dataflow B, enable incremental refresh properly so only the new or changed data is processed daily. Then, in your report or dataset, combine these two sources by merging or appending the data at the query or model level without forcing a refresh of the historical dataflow. Another common approach is to materialize the combined dataset in a Power BI dataset or a data warehouse/table that aggregates historical and incremental data outside the dataflow refresh process, thereby decoupling the refresh schedules.

If using Power BI Premium or Fabric capacities, consider leveraging large dataset storage or incremental refresh policies at the dataset level rather than in dataflows. This allows the historical data to remain stable while incremental data updates occur efficiently, reducing refresh time drastically. Overall, the key is to design the refresh architecture so that the large static data is refreshed infrequently (e.g., monthly or quarterly), and only the incremental portion refreshes daily, avoiding unnecessary refresh cascades and improving overall performance for all reports relying on that master dataset.

 

Did I answer your question? Mark my post as a solution; this will help others!
If my response assisted you in any way, don't forget to drop me a "Kudos".

Kind Regards,
Poojara - Proud to be a Super User
Data Analyst | MSBI Developer | Power BI Consultant
Consider subscribing to my YouTube channel for beginner/advanced concepts: https://youtube.com/@biconcepts?si=04iw9SYI2HN80HKS

Thank you for such a thoughtful response. I am happy to see someone actually read my post.
1) "In your report or dataset, combine these two sources by merging or appending the data at the query or model level without forcing a refresh of the historical dataflow."

That's the core of the problem. When these are combined in Power Query, it kicks off the full refresh process, re-importing the historical data from the (non-refreshing) dataflow.

Unless you suggest using DAX to create a virtual table that UNIONs the historical and incremental data. Maybe this could work, but I would imagine performance issues.
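For reference, the DAX virtual-table idea above would be a calculated table along these lines (History and YTD are placeholder table names, as are the columns). Note that calculated tables are re-evaluated and materialized in the model during dataset refresh, so this duplicates storage rather than avoiding work, which is consistent with the performance concern:

```
Master =
UNION (
    // Both inputs must project the same number of columns, in the same order
    SELECTCOLUMNS ( History, "Date", History[Date], "Amount", History[Amount] ),
    SELECTCOLUMNS ( YTD, "Date", YTD[Date], "Amount", YTD[Amount] )
)
```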

 

2) "Another common approach is to materialize the combined dataset in a Power BI dataset or a data warehouse/table that aggregates historical and incremental data outside the dataflow refresh process, thereby decoupling the refresh schedules."

This would defeat the whole purpose of using the incremental refresh feature in Power BI. Also, setting this up in a data warehouse would force the import of all 20 million rows, which we do not want.

v-pagayam-msft
Community Support

Hi @Redriver123 ,
Could you please confirm if the issue has been resolved on your end? If a solution has been found, it would be greatly appreciated if you could share your insights with the community. This would be helpful for other members who may encounter similar issues.
Thank you for your understanding and assistance.

Regards,
Pallavi G.

v-pagayam-msft
Community Support

Hi @Redriver123 ,
I wanted to check in on your situation regarding the issue. Have you resolved it? If you have, please consider marking the reply that helped you as Accept as Solution or sharing your solution. It would be greatly appreciated by others in the community who may have the same question.

Thank you for being a valued member of the Microsoft Fabric Community Forum!


pallavi_r
Super User

Hi @Redriver123 ,

 

What is the size of your data in GB?

And how many years of data are you pulling?

Is there any specific reason you are not using incremental refresh on the entire dataset with an incremental range? If you set it to 1 year, it will refresh only the YTD portion, and previous years will not be refreshed.

[Screenshot: incremental refresh policy configuration]

 

If this post helps, please accept this as a solution. Appreciate your kudos.

 

Thanks,

Pallavi

The report model size is about 728 MB with 2 imported dataflows (historical and current YTD with incremental refresh). I then append the 2 dataflows into a master table in the report. The historical dataflow has 3 years of data, about 24 million records.

The report takes 20 minutes to refresh, but the YTD dataflow itself takes only 1 minute to refresh in the cloud. So I am trying to find a way not to refresh the last 3 years (the dataflow with history), but the process still seems to re-import the history even though I have disabled "refresh" at the dataflow and report level.

The reason I am splitting the incremental dataflow into 2 is that I want to avoid re-importing the last 3 years of historical data on every single refresh.

Hope this was not too confusing.

This is a similar post someone else added that matches the problem. No solution appears to exist.
