March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
Hello!
I got a case where we have multiple folders with invoices in .pdf format since 2022 year, we store them in seperate folders for each month e.g. 2022.05 / 2022.06 and so on. The issue now is that it's getting slower and slower to refresh PBI report using the way we do it now, which is:
-we store everything at SharePoint and we connect to it via PowerQuery sharepoint folder option
-then we do some cleaning, data transform, keeping columns that we need and so on
-we add new invoices to folder, all of it get's synchronized and PowerQuery takes care of updating it, all fine
Is this the correct/best approach to do something like that?
The issue is that folders with old invoices are never being changed, there is no point in refreshing data there and this is just making our query slower when in reality it only should check like newest folder for data, e.g. last 3 months.
Is there anyway to achieve something like that? What would be the best solution here?
Thank you!
Use one of the below:
1. Incremental Refresh in Power BI
Power BI offers an Incremental Refresh feature, which allows you to refresh only the data that has changed or been added in recent periods (e.g., the last 3 months), rather than reloading all data every time. This would be the most efficient approach since it avoids querying old, unchanging data.
Steps:
2. Archiving Old Invoices
If old invoices are rarely accessed and do not need to be refreshed, you can:
Archive old folders in a separate SharePoint site/library or offline storage to keep the current data set lean.
Power Query can then target only the current folders without having to read through all historical folders.
Hi @Pojzon ,
Thanks for lbendlin reply.
Here's what I need to add: first, Power BI provides the ability to incrementally refresh only the data that has changed. This can significantly reduce the time it takes to refresh a dataset. You can set up an incremental refresh policy in Power BI Desktop and then publish the report to the Power BI service. Second, consider partitioning your data by creating separate queries for different time periods. For example, you could create one query for the current year and another for historical data. This way, you only need to refresh the current year's data periodically, while the historical data remains unchanged. Of course, you can also add a filtering step in Power Query to load only the most recent data. For example, you could filter the data to include only the last three months. This will reduce the amount of data processed and speed up the refresh time.
Best regards,
Albert He
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly
Load the older years into dataflows. Refresh these only as needed. Then in your Power Query append these dataflows with the data from the current year folders.
Depends on the severity of the transforms. You will have to try out the various options and then decide which one is most efficient.
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.
Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.
User | Count |
---|---|
21 | |
16 | |
13 | |
12 | |
9 |
User | Count |
---|---|
36 | |
31 | |
20 | |
19 | |
17 |