Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Be one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now

Reply
Pojzon
Frequent Visitor

Best approach for refreshing folders with pdf files

Hello!

I got a case where we have multiple folders with invoices in .pdf format since 2022 year, we store them in seperate folders for each month e.g. 2022.05 / 2022.06 and so on. The issue now is that it's getting slower and slower to refresh PBI report using the way we do it now, which is:

-we store everything at SharePoint and we connect to it via PowerQuery sharepoint folder option

-then we do some cleaning, data transform, keeping columns that we need and so on

-we add new invoices to folder, all of it get's synchronized and PowerQuery takes care of updating it, all fine

Is this the correct/best approach to do something like that?

The issue is that folders with old invoices are never being changed, there is no point in refreshing data there and this is just making our query slower when in reality it only should check like newest folder for data, e.g. last 3 months.

Is there anyway to achieve something like that? What would be the best solution here?

Thank you!

5 REPLIES 5
Omid_Motamedise
Memorable Member
Memorable Member

Use one of the below: 

1. Incremental Refresh in Power BI
Power BI offers an Incremental Refresh feature, which allows you to refresh only the data that has changed or been added in recent periods (e.g., the last 3 months), rather than reloading all data every time. This would be the most efficient approach since it avoids querying old, unchanging data.

Steps:


2. Archiving Old Invoices
If old invoices are rarely accessed and do not need to be refreshed, you can:

Archive old folders in a separate SharePoint site/library or offline storage to keep the current data set lean.
Power Query can then target only the current folders without having to read through all historical folders.

v-heq-msft
Community Support
Community Support

Hi @Pojzon ,
Thanks for lbendlin reply.
Here's what I need to add: first, Power BI provides the ability to incrementally refresh only the data that has changed. This can significantly reduce the time it takes to refresh a dataset. You can set up an incremental refresh policy in Power BI Desktop and then publish the report to the Power BI service. Second, consider partitioning your data by creating separate queries for different time periods. For example, you could create one query for the current year and another for historical data. This way, you only need to refresh the current year's data periodically, while the historical data remains unchanged. Of course, you can also add a filtering step in Power Query to load only the most recent data. For example, you could filter the data to include only the last three months. This will reduce the amount of data processed and speed up the refresh time.

Best regards,
Albert He


If this post helps, then please consider Accept it as the solution to help the other members find it more quickly

lbendlin
Super User
Super User

Load the older years into dataflows. Refresh these only as needed.  Then in your Power Query append these dataflows with the data from the current year folders.

Thank you, I got another question. In dataflow should I only set up connect to Sharepoint folder and then continue in PowerBI desktop -> connect to dataflow there -> continue with data transformation? Or perhaps do all transformations in dataflow, connect to it in PBI and only append tables old (dataflow) + new data? What's better optimize wise?

Depends on the severity of the transforms. You will have to try out the various options and then decide which one is most efficient.

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

Dec Fabric Community Survey

We want your feedback!

Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.

ArunFabCon

Microsoft Fabric Community Conference 2025

Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.

December 2024

A Year in Review - December 2024

Find out what content was popular in the Fabric community during 2024.