The documentation for Dataflow Gen2 Incremental Refresh is a bit confusing. Taking the example with the options specified here:
My understanding is the following steps:
Step 1: Fabric will first check all of the rows in the source table to see whether the maximum value of the "ModifiedDate" column has increased. If the maximum value in this column has increased (because of 1 or more changed rows), it continues; otherwise, Fabric will not move on to the next steps and won't change anything.
Step 2: Fabric will split the rows into buckets based on the "OrderDate". If there is actual data on each day of the past 2 weeks in this OrderDate column, then there will be 14 buckets, since the bucket size is 1 day and there are 14 days in 2 weeks. Fabric will then process these 14 buckets in parallel.
Step 3: Fabric will replace each of the rows in the destination table that has an OrderDate value in the past 2 weeks with the rows from the buckets in the previous step. If a row has an OrderDate value older than 2 weeks, Fabric will not replace it.
Please let me know if this understanding is correct, and if not please explain which part is incorrect.
Thanks!
Your high-level steps are mostly correct, but here are a few clarifications to ensure there’s no confusion:
Step 1: Detecting Changes via ModifiedDate
When you check “Only extract new data when the maximum value in this column changes” and point it at ModifiedDate, Dataflow Gen2 will query the full source table to see if any row’s ModifiedDate is greater than the maximum it recorded the last time it ran.
If nothing has changed, the incremental refresh stops immediately, and no partitions are touched.
If at least one row’s ModifiedDate has increased, Fabric proceeds to re-evaluate the defined partition range based on OrderDate.
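To make that detection step concrete, here is a minimal sketch in Python (the in-memory rows and the cached watermark are illustrative assumptions; in reality Fabric folds this check into a query against the source, and none of these names are Fabric internals):

```python
# Hedged sketch of Step 1: compare the source's maximum ModifiedDate against a
# cached watermark. Every name here is hypothetical, not a Fabric internal.
from datetime import datetime
from typing import Optional

def should_refresh(source_rows: list[dict], cached_max: Optional[datetime]) -> bool:
    """Return True only if some row's ModifiedDate exceeds the cached maximum."""
    current_max = max(row["ModifiedDate"] for row in source_rows)
    if cached_max is not None and current_max <= cached_max:
        return False  # nothing changed: stop; no partitions are touched
    return True       # change detected: re-evaluate the OrderDate window
```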
Step 2: Partitioning by OrderDate
Once a change is detected, Fabric builds partitions (buckets) by splitting the OrderDate column into daily slices (because you chose “2 Weeks” / “Bucket size: Day”).
In your example, that creates 14 separate partitions, one for each day in the last two weeks (Day 0 through Day 13).
Each partition is processed in parallel, so Fabric can read and transform just those 14 days instead of scanning the entire table.
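If it helps to see the arithmetic, a short Python sketch of the bucketing (the dates and helper name are made up for illustration):

```python
from datetime import date, timedelta

def daily_buckets(today: date, weeks: int = 2) -> list[tuple[date, date]]:
    """Split the trailing window into half-open one-day slices, Day 0..Day 13."""
    return [(today - timedelta(days=d), today - timedelta(days=d - 1))
            for d in range(weeks * 7)]

buckets = daily_buckets(date(2024, 6, 14))  # any reference date works
print(len(buckets))  # -> 14, one partition per OrderDate day
```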
Step 3: Replacing Only the 2-Week Window in the Destination
For each of those 14 daily partitions, Fabric issues a “write” (upsert/replace) against the destination entity, deleting and re-inserting all rows whose OrderDate falls within that one-day slice.
Rows whose OrderDate is older than two weeks are left untouched—Fabric never rewrites those partitions.
In other words, only rows in the “past two weeks” window (as of today) are re-loaded; everything older remains as-is in the destination.
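As a rough model of that write pattern, here is a hedged Python sketch of one partition's flush-and-fill (the list-based `destination` and the row shape are assumptions; Fabric's real write path to the destination entity is not exposed like this):

```python
from datetime import date

def replace_partition(destination: list[dict], start: date, end: date,
                      fresh_rows: list[dict]) -> None:
    """Flush and fill one day-slice: delete the old rows in [start, end),
    then insert the freshly loaded rows for that same slice."""
    destination[:] = [r for r in destination
                      if not (start <= r["OrderDate"] < end)]
    destination.extend(r for r in fresh_rows
                       if start <= r["OrderDate"] < end)
```

Rows outside [start, end) are never touched, which is what keeps partitions older than two weeks intact.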
If the above information helps you, please give us Kudos and mark the reply as the accepted solution.
Best Regards,
Community Support Team _ C Srikanth.
We haven't heard from you since the last response and just wanted to check whether the provided solution has worked for you. If yes, please mark it as the accepted solution to help others in the community.
Thank you.
If the above information is helpful, please give us Kudos and mark the response as the accepted solution.
Best Regards,
Community Support Team _ C Srikanth.
I wanted to follow up since I haven't heard from you in a while. Have you had a chance to try the suggested solutions?
If your issue is resolved, please consider marking the post as solved. However, if you're still facing challenges, feel free to share the details, and we'll be happy to assist you further.
Looking forward to your response!
Best Regards,
Community Support Team _ C Srikanth.
Step 1: Fabric will run a query that, for each OrderDate in the last 14 days, checks whether at least one row has a "ModifiedDate" higher than the cached value, or whether no cached value exists for that date.
Step 2: For those OrderDates, Fabric will retrieve all rows and replace (flush and fill) the data in the affected partitions/buckets/files (pick your own terminology...).
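Putting those two refined steps together, a hedged end-to-end sketch in Python (every name is illustrative; this only models the observable behavior described above, not Fabric's implementation):

```python
from datetime import date, datetime

def incremental_refresh(source_rows: list[dict], destination: list[dict],
                        buckets: list[tuple[date, date]],
                        watermarks: dict[date, datetime]) -> None:
    """Per bucket: refresh only if some row's ModifiedDate beats the cached
    watermark for that bucket, or if no watermark exists yet."""
    for start, end in buckets:
        fresh = [r for r in source_rows if start <= r["OrderDate"] < end]
        mark = watermarks.get(start)  # None means this bucket was never loaded
        if mark is None or any(r["ModifiedDate"] > mark for r in fresh):
            # flush and fill the affected bucket only
            destination[:] = [r for r in destination
                              if not (start <= r["OrderDate"] < end)]
            destination.extend(fresh)
            if fresh:
                watermarks[start] = max(r["ModifiedDate"] for r in fresh)
```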