hansei
Helper V

What is wrong with my incremental refresh?

I have a query that reads and processes Excel files on SharePoint. (It gets the files using SharePoint.Contents.) I am trying to implement incremental refresh and, at the same time, avoid reading every file on the first refresh.

 

RangeStart & RangeEnd & Filter were auto-created by service during configuration. The resulting query looks something like this:

let
    Source = SharePoint.Contents( "https://myco.sharepoint.com/teams/mysite/", [ApiVersion = 15] ){[ Name = "mylib" ]}[Content],
    #"Actual work" = do stuff here,
    #"Header Info-44617465206D6F646966696564-autogenerated_for_incremental_refresh" = Table.SelectRows(#"Actual work", each DateTime.From([#"Date modified"]) >= RangeStart and DateTime.From([#"Date modified"]) < RangeEnd)
in
    #"Header Info-44617465206D6F646966696564-autogenerated_for_incremental_refresh"

 

After the dataflow runs, I am inspecting it in Power BI Desktop.


I have also run with the filter placed before the "Actual work" step so that it folds, but the following anomaly happens whether the filter is near the beginning or at the end of the query:
run1: set "Store rows from the past" to 12 months - 462K rows returned from 415 files
run2: set "Store rows from the past" to 18 months - 378K rows returned from 343 files
It should go without saying that the past 18 months encompasses more files and rows than the past 12.


1 ACCEPTED SOLUTION

Tbh, incremental refresh is not going to work properly (as designed) with this data source. In the past it was not possible at all; that might have changed by now.
If the amount of data on SharePoint is too big, this may not be the best way to use SharePoint as a "database". You are still reading files, which is something completely different from reading from a database, and using SharePoint like this is often a pain.

If you can, try another (more robust) approach.

Another idea...
Maybe you can split your data into different dataflows, each with 6 months of data or so. Maybe you don't even need data older than 6 months?
Merge the results afterwards.
Hope you get the idea.
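
As a rough sketch of that split (assuming the same site URL, library, and "Date modified" column as in the original query; the window boundaries and step names below are placeholders, not real values), each dataflow could hard-code its own window:

let
    Source = SharePoint.Contents("https://myco.sharepoint.com/teams/mysite/", [ApiVersion = 15]){[Name = "mylib"]}[Content],
    // Placeholder window for this dataflow, e.g. the most recent 6 months
    WindowStart = #datetime(2025, 1, 1, 0, 0, 0),
    WindowEnd = #datetime(2025, 7, 1, 0, 0, 0),
    // Keep only files last modified inside this dataflow's window before doing the heavy processing
    FilteredFiles = Table.SelectRows(Source, each DateTime.From([#"Date modified"]) >= WindowStart and DateTime.From([#"Date modified"]) < WindowEnd)
in
    FilteredFiles

A downstream query could then append the chunks, for example Combined = Table.Combine({Chunk1, Chunk2, Chunk3}), where Chunk1..Chunk3 are hypothetical names for the per-window results.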

Regards

v-aatheeque
Community Support

Hi @hansei 

Have you had a chance to look through the responses shared earlier? If anything is still unclear, we’ll be happy to provide additional support.

Hi @hansei 

Following up to confirm if the earlier responses addressed your query. If not, please share your questions and we’ll assist further.

Ahmed-Elfeel
Solution Supplier

Hi @hansei,

This issue happens when you apply the date filter after the Actual work step: Power Query still processes all files from SharePoint during the Actual work step and only filters the results afterward ☺️❤️

 

This means:

  • All 415+ files are being read and processed every time

  • The filter only reduces the output (not the workload)

  • You are paying the full performance cost regardless of your date range

So what can you do now to solve this?

You need to push the filter upstream, to the SharePoint.Contents level:

let
    Source = SharePoint.Contents("https://myco.sharepoint.com/teams/mysite/", [ApiVersion = 15]),
    mylib = Source{[Name="mylib"]}[Content],
    
    // Apply date filter IMMEDIATELY after getting the file list
    FilteredFiles = Table.SelectRows(mylib, each DateTime.From([#"Date modified"]) >= RangeStart and DateTime.From([#"Date modified"]) < RangeEnd),
    
    // Then do your actual work on only the filtered files
    #"Actual work" = ... process FilteredFiles here ...
in
    #"Actual work"

 

So to make this succeed, you should:

  • Apply date filters as early as possible in your query
  • Ensure the filter translates to native API calls (SharePoint filtering)
  • In Power BI Desktop, check View Native Query to see whether folding is occurring

If you want to test it first before going deeper (temporary testing):

  • Add a counter to see how many files are being processed:
FilteredFiles = Table.SelectRows(mylib, each DateTime.From([#"Date modified"]) >= RangeStart and DateTime.From([#"Date modified"]) < RangeEnd),
FileCount = Table.RowCount(FilteredFiles),

 This should show significantly fewer files when you use a smaller date range (a fuller runnable sketch follows this list)

  • A properly configured incremental refresh should get faster as the date range narrows
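
Wrapped into a complete query for a quick test, the counter might look like the sketch below (assuming the site URL and library name from the original query and the RangeStart/RangeEnd parameters already defined for incremental refresh; it returns only a number, so it is for temporary testing, not for the final dataflow):

let
    Source = SharePoint.Contents("https://myco.sharepoint.com/teams/mysite/", [ApiVersion = 15]){[Name = "mylib"]}[Content],
    // Filter the file list by the incremental refresh window before any file content is expanded
    FilteredFiles = Table.SelectRows(Source, each DateTime.From([#"Date modified"]) >= RangeStart and DateTime.From([#"Date modified"]) < RangeEnd),
    // Count how many files fall inside the window
    FileCount = Table.RowCount(FilteredFiles)
in
    FileCount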
If this post helps, I would appreciate a thumbs up; marking it as the solution helps other members find it more quickly.

My question is not about performance or optimization. My question is why the amount of data stored for a larger date range is smaller than the amount of data stored for a smaller date range, i.e. why is the row count reduced?

Hi @hansei,

Sorry😅❤️

So this issue could occur because of:

  • Time zone mismatch: RangeStart/RangeEnd (UTC) vs. your file dates (local time)
  • Data changing between refreshes (files being moved/deleted/modified)
  • Filter logic errors, especially with date comparisons and DateTime.From() conversions

So add this diagnostic step right before your filter to check:

// Check what is actually happening
Diagnostic = Table.AddColumn(#"Actual work", "DateCheck", each [
    RawDate = [#"Date modified"],
    ConvertedDate = DateTime.From([#"Date modified"]), 
    InRange = DateTime.From([#"Date modified"]) >= RangeStart and DateTime.From([#"Date modified"]) < RangeEnd
])

This will show whether the dates are converting and comparing correctly.
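
If the time zone mismatch from the first bullet turns out to be the cause, one possible fix (a sketch only; the +2 offset below is a placeholder for your actual local offset, and it assumes [Date modified] arrives as a plain local datetime) is to convert the column to UTC before it is compared against RangeStart/RangeEnd:

// Tag the local timestamps with a placeholder offset (+2 here), convert to UTC, then drop the zone again
NormalizedDates = Table.TransformColumns(
    #"Actual work",
    {{"Date modified", each DateTimeZone.RemoveZone(DateTimeZone.ToUtc(DateTime.AddZone(DateTime.From(_), 2))), type datetime}}
)

The existing filter step would then reference NormalizedDates instead of #"Actual work".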

 


I hope this was useful ☺️❤️

The only change is incremental refresh setting "Store rows from the past".

[Screenshot: the dataflow's incremental refresh settings, showing the "Store rows from the past" value]

 

 

Date conversion, time zones, etc. are all consistent between the two runs; i.e. whatever conversions happen, they are the same on both runs.

 

It is not possible for an item to have a modification datetime that is in the past 12 months, but not in the last 18 months.

Nor is it possible for an item's modification date to be < 12 months on run 1, but then be > 18 months on run 2.

Hi @hansei 


Based on the screenshot, the only change made is the Store rows from the past setting (from 12 months to 18 months), while everything else, including the DateTime column and conversions, remains the same.

However, when you modify the incremental refresh window in Power BI, the underlying partitions that store data aren’t automatically rebuilt. As a result, the service may continue using cached partitions from the earlier refresh policy (in your case, the 12-month window).

 

This can lead to unexpected results, such as missing or inconsistent rows between the two runs, even though the data itself hasn’t changed.

To ensure the new 18-month policy takes full effect:

  • In Power BI Desktop, turn Incremental refresh Off, apply changes, and then turn it On again with the updated 18-month window.
  • Re-publish the dataset to the service.
  • In the Power BI service, run a Full refresh once; this forces Power BI to recreate all partitions according to the new refresh policy.

After that, incremental refresh will continue normally and include the full 18-month range as expected.

 

Reference : Using incremental refresh with dataflows - Power Query | Microsoft Learn

 

Hope this helps !!

Thank You.

I am not actually using desktop to set incremental refresh. As this forum name and my description indicate, I am setting it in the Power BI service on a dataflow.

 

Regardless, I cannot "run a Full refresh once". Let me explain further what I am trying to accomplish, not just the problem I am running into:

  • there is too much data for a full refresh to take place. It will time out in the Power BI service before reading all files.
  • I plan to use incremental refresh ongoing, so that it only needs to read changes.
  • However, right now, I need to use incremental refresh to read the data in chunks. e.g. Read first 6 months, then next 6 months, then next 6 months. (Or the reverse: read last 6 months, then prior 6 months, then prior 6 months.)

However, the approach I am using has led to the anomalous results explained above.

