Hi All,
I have a Gen1 Dataflow containing monthly data going back to 2016-09. Each new month's data loads on the 1st of the month and contains only that month's rows. The table is very large, more than 21M records.
I want to move to incremental refresh, with settings like: store 30 years of data, refresh only the past 2 days of data (since each month's rows are all dated yyyy-MM-01).
With the above configuration, my dataflow has been refreshing for the past 13 hours. I know the first load takes time, but is this amount of time normal?
Is there something wrong with my configuration settings that is causing this?
Appreciate any advice/suggestions!
If the data source for your incremental refresh does not support query folding, you will not see any performance improvements; you will only get the "free" partition management.
NOTE: The partition size limit is 10 GB. Your amount of data is not nearly large enough for incremental refresh to make much sense (unless your data source has timeout issues).
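To make the folding point concrete, here is a minimal sketch of a folding-friendly query (the server, database, table, and column names are hypothetical). In a semantic model you declare the RangeStart/RangeEnd parameters yourself; in a Gen1 dataflow the incremental refresh dialog injects an equivalent filter on the datetime column you pick. In both cases the filter must sit against a foldable source so it can be pushed down as a WHERE clause instead of pulling all 21M rows:

```
let
    // Sql.Database is a foldable source; the filter below can be
    // translated into a SQL WHERE clause by the mashup engine
    Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
    Fact = Source{[Schema = "dbo", Item = "MonthlySales"]}[Data],
    // Filter on the incremental-refresh datetime column directly
    // against the source, before any other transformations
    Filtered = Table.SelectRows(
        Fact,
        each [LoadDate] >= RangeStart and [LoadDate] < RangeEnd
    )
in
    Filtered
```

If the source is a file, a web API, or anything else that cannot fold, the service still has to read everything on every refresh and only the partition bookkeeping is saved.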
Hi @SamD02 ,
It's expected that the initial load for incremental refresh in Power BI Dataflow Gen1 can take a significant amount of time, especially with a large dataset like yours containing over 21 million records. Since you're storing 30 years of data and applying a refresh policy for only the past 2 days, the first refresh has to load and partition the entire historical dataset, which is resource-intensive and time-consuming. A 13-hour refresh duration for the first load, while long, may still be within the realm of normal depending on the complexity of your transformations, data source performance, and dataflow design.
However, to optimize performance, ensure that your queries are folded properly (i.e., query folding is applied), avoid unnecessary transformations before filters are applied, and consider breaking down the data into smaller, more manageable tables if possible. Once the initial load is done, subsequent refreshes should be significantly faster as only recent partitions will be refreshed.
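As an illustration of the "filters before transformations" point (all names hypothetical): any step that cannot be translated to the source's query language stops folding, and every step after it runs in the mashup engine over whatever rows it receives, so row filters should come first:

```
let
    Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
    Fact = Source{[Schema = "dbo", Item = "MonthlySales"]}[Data],
    // GOOD: this filter folds to SQL, so only matching rows
    // ever leave the source
    Filtered = Table.SelectRows(
        Fact,
        each [LoadDate] >= #datetime(2016, 9, 1, 0, 0, 0)
    ),
    // Custom columns like this one typically do not fold; placing
    // them after the filter keeps the row count they process small
    WithLabel = Table.AddColumn(
        Filtered,
        "MonthLabel",
        each Date.ToText(Date.From([LoadDate]), "yyyy-MM")
    )
in
    WithLabel
```

Reversing the order (custom column first, filter second) forces the engine to materialize and transform the full table before discarding most of it.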
Passionate about leveraging data analytics to drive strategic decision-making and foster business growth.
Connect with me on LinkedIn: Rohit Kumar.
Thank you, @lbendlin and @rohit1991, for your responses.
Hi @SamD02,
We appreciate your inquiry through the Microsoft Fabric Community Forum.
In addition to the information provided by @lbendlin and @rohit1991 , please follow the steps below to optimize performance:
Adjust the historical data range. If your data begins in 2016, storing 30 years of history may create unnecessary overhead. It is advisable to reduce the "store rows" value to 10 years to mitigate partition load and streamline metadata management.
Since your data consistently loads as yyyy-MM-01, consider increasing the refresh window to the past 5 days (from the current 2 days) to provide a better buffer and avoid missed partitions due to processing delays.
If feasible, consider upgrading to Dataflow Gen2, which leverages an enhanced compute engine for faster refreshes and improved scalability.
Additionally, for better insights into what might be slowing down the refresh, please utilize Power Query diagnostics or the Dataflow refresh history in the workspace.
Kindly refer to the following link for more information:
Dataflows best practices - Power BI | Microsoft Learn
If you find our response helpful, please mark it as the accepted solution and provide kudos. This will assist other community members facing similar queries.
Thank you.
Hi SamD02,
We are following up to see if your query has been resolved. Should you have identified a solution, we kindly request you to share it with the community to assist others facing similar issues.
If our response was helpful, please mark it as the accepted solution and provide kudos, as this helps the broader community.
Thank you.