Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get Fabric Certified for FREE during Fabric Data Days. Don't miss your chance! Request now

Reply
richard-powerbi
Post Patron
Post Patron

Can you use incremental refresh files with file properties?

I have a situation where users export Excel files from an engineering program. The amount of Excel files being created will be roughly 100-150 per year. I want to get this data in the data model and specifically in a Dataflow. Is it possible to design something in the Power BI environment in such a way that only new or modified Excel files will be included in the refresh? (Q1)

Let's say I combine Excel files through Power Query and I include the created/modified columns from the file properties in the resulting table, will an incremental refresh in the Dataflow settings cause only the newly created and modified files to be opened and analyzed? Or will all files be opened and read before the engine realizes only 5% of the data was new/modified? (Q2)

6 REPLIES 6
mwegener
Most Valuable Professional
Most Valuable Professional

Hi @richard-powerbi ,

 

Q1 - The incremental refresh does not detect any data changes. Only an ongoing period will be refreshed.

Q2 - I think you must have filtered the Excel files by date before combining them. That should work then.


Using incremental refresh with Power BI dataflows

https://docs.microsoft.com/en-us/power-bi/transform-model/service-dataflows-incremental-refresh

 

 

Did I answer your question?
Please mark my post as solution, this will also help others.
Please give Kudos for support.

Marcus Wegener works as Full Stack Power BI Engineer at BI or DIE.
His mission is clear: "Get the most out of data, with Power BI."
twitter - LinkedIn - YouTube - website - podcast - Power BI Tutorials


@mwegener but if I filter the Excel files before combining them I remove the old data.... how is that a solution?

mwegener
Most Valuable Professional
Most Valuable Professional

Hi @richard-powerbi ,

 

the old data should already be stored in the Power BI Service (Dataflow).

The Power BI Service controls the parameters RangeStart and RangeEnd and thus filters the observation period for new data.

 

Did I answer your question?
Please mark my post as solution, this will also help others.
Please give Kudos for support.

Marcus Wegener works as Full Stack Power BI Engineer at BI or DIE.
His mission is clear: "Get the most out of data, with Power BI."
twitter - LinkedIn - YouTube - website - podcast - Power BI Tutorials


@mwegener thanks I understand things partly. Can you confirm below reasoning and answer some of the questions within?

 

Let's say this is a query:

Step 1: Source

Step 2: Filtering source with RangeStart and RangeEnd

Step 3-8: All sorts of transformations

Step 9: Result table

 

This is the table at step 2:

Index    CreatedModified
11-1-2020    5-1-2020
22-1-20202-1-2020
33-1-20204-1-2020
44-1-20204-1-2020
55-1-20205-1-2020

 

Using RangeStart (4-1-2020) and RangeEnd (5-1-2020) only on the Created column anything after step 2 will look like this in PQ Editor at step 9:

Index    CreatedModified
44-1-2020    4-1-2020
55-1-20205-1-2020

 

At the same time it will still look like this in Dataflow (Dataflow removes the RangeStart and RangeEnd filters):

Index    CreatedModified
11-1-2020    5-1-2020
22-1-20202-1-2020
33-1-20204-1-2020
44-1-20204-1-2020
55-1-20205-1-2020

 

  • It is required to have the Created and Modified columns available in the result table (step 9), otherwise you can't select them in incremental refresh setup. Meaning that any other tables like dimensions, referencing this data, also need to have these columns. A problem that I foresee is that in these dimension tables duplicates will be removed which renders these columns useless. Please tell me there is a solution for this? Or tell me this assumption correct:
    • When there are 5 tables in a Dataflow, and 1 table is the 'mother table' (the other 4 tables reference this table) you only need to setup incremental refresh for the mother table and incremental refresh will do it's magic on the 'children tables', meaning that the children tables do not need to have the Created and Modified columns.
  • Incremental refresh for Dataflows looks at the Created and Modified columns in the result table after step 9. It does not impact at step 2, other than removing the RangeStart and RangeEnd filters.
  • For the Created column:
    • The part 'from the past' in 'Store rows from the past' and 'Refresh rows from the past' means it is relative to the last date in the Created column and not relative to the actual date the Dataflow refresh happens.
    • Setting 'Store rows from the past' to 4 days, only Indexes 2, 3, 4 and 5 will be in the result table from the Dataflow. Index 1 will be removed.
    • Setting 'Refresh rows from the past' to 2 days, only Indexes 4 and 5 will be refreshed. Meaning that Index 3 will not be updated. Conclusion: setting 'Refresh rows from the past' to anything less than 'Store rows from the past' is risky if you don't know how far changes will go back. Meaning that Index 2, 3, 4 and 5 will be in the result table and only Indexes 4 and 5 are refreshed.
  • For the Modified Column:
    • Setting 'Only refresh data if the maximum value in this field changes' means that nothing will be refreshed because the maximum value of this column is evaluated for each of the periods in the incremental range. I assume a period in the incremental refresh is equal to the setting 'Refresh rows from the past' and is not equal to the setting 'Store rows from the past'. Meaning that Index 3, even though it has changed in the last 2 days, will not be refreshed and thus have outdated data. Same conclusion as before: setting 'Refresh rows from the past' to anything less than 'Store rows from the past' is risky if you don't know how far changes will go back.
  • With these settings Index 1 will never be included even when it changes.

Anyone?

mwegener
Most Valuable Professional
Most Valuable Professional

Hi @richard-powerbi ,

 

I've been very busy this week.

Unfortunately I have no experience with the incremental refresh of data flows.

Here there seems to be differences between the incremental refresh of dataflows and data sets.

 

However, the concept is only suitable for fact tables that do not change after a certain point in time.

 

 

Did I answer your question?
Please mark my post as solution, this will also help others.
Please give Kudos for support.

Marcus Wegener works as Full Stack Power BI Engineer at BI or DIE.
His mission is clear: "Get the most out of data, with Power BI."
twitter - LinkedIn - YouTube - website - podcast - Power BI Tutorials


Helpful resources

Announcements
Fabric Data Days Carousel

Fabric Data Days

Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!

October Power BI Update Carousel

Power BI Monthly Update - October 2025

Check out the October 2025 Power BI update to learn about new features.

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.

Top Solution Authors