What is the most efficient way to do incremental refreshes on tables that contain joins and logic? The data is too big to truncate and reload each time.
When I try to set it up, it errors out.
Are there any workarounds that only use Gen2 Dataflows? I don't want to use SQL or a notebook.
Hi @adamlob,
If you have the ability to detect deleted records from the source, then my approach would remain the same.
Incrementally load each table into a lakehouse. For deleted records, instead of deleting them outright from the lakehouse, set a boolean flag column indicating that the record was deleted.
Be sure to include a watermark column to identify which records were modified by this ETL run.
Next, do all your joins and filter to only the records that were modified in this ETL run.
Lastly, you can insert these new records into the final table, and delete records from the final table where the delete flag is set.
This is a pretty high-level approach; to get more specific I would need more details about your data.
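To make the steps concrete, here is a minimal PySpark/Delta sketch of the same pattern. It is illustrative only, since you asked for Gen2 Dataflows, and the table and column names (stg_orders, stg_customers, fact_orders, order_id, customer_id, is_deleted, load_time) are made up for the example:

```python
# Illustrative sketch of: incremental staging with a delete flag + watermark,
# join filtered to this run, then upsert/delete into the final table.
from datetime import datetime, timezone

from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
run_ts = datetime.now(timezone.utc)  # watermark for this ETL run

# 1) Stage: append the rows changed since the last run, carrying a delete flag
#    from the source's change detection and this run's load_time watermark.
incoming = (
    spark.read.table("source_orders_changes")  # hypothetical: changed rows + is_deleted
         .withColumn("load_time", F.lit(run_ts))
)
incoming.write.mode("append").saveAsTable("stg_orders")

# 2) Join the staged tables, but only for rows touched by this run.
changed = (
    spark.read.table("stg_orders")
         .filter(F.col("load_time") == F.lit(run_ts))
         .join(
             spark.read.table("stg_customers").select("customer_id", "customer_name"),
             "customer_id",
             "left",
         )
)

# 3) Merge into the final table: delete rows flagged as deleted at the source,
#    update the rest, insert new ones. Assumes the joined result carries the
#    same columns as fact_orders; adjust the column lists otherwise.
final_tbl = DeltaTable.forName(spark, "fact_orders")
(
    final_tbl.alias("t")
    .merge(changed.alias("s"), "t.order_id = s.order_id")
    .whenMatchedDelete(condition="s.is_deleted = true")
    .whenMatchedUpdateAll(condition="s.is_deleted = false")
    .whenNotMatchedInsertAll(condition="s.is_deleted = false")
    .execute()
)
```

The same staging, watermark filter, and flag-driven delete logic can be expressed with Dataflow Gen2 queries writing to lakehouse tables; the sketch is only meant to show the shape of the steps.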
If you found this helpful, consider giving some Kudos. If I answered your question or solved your problem, mark this post as the solution.
Hi @tayloramy
Our use case involves deleting records we no longer need.
E.g. we have a staging table of customers that are active,
and then we only import the child tables that contain those customer IDs.
So over time new customers come in, which we want, and some become inactive and we don't want them in our workspace anymore.
Hi @adamlob,
This very much depends on your data.
Do all of your source tables have a datetime column such as a last-modified date, so each one can easily be loaded incrementally on its own?
I'd recommend loading all the source tables incrementally into Fabric first, and then doing the join in Fabric.
As you load data into Fabric, you can add a new column that records a load time. Once the data is all in Fabric, you can use that load time to identify new records after the join, and then use some sort of business key to identify which of the new records are already in your target tables.
From there you can update the existing records, and insert new records.
This sort of approach doesn't handle data being deleted from the source at all; for that you would need to reload the data.
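As a rough illustration of that load-time plus business-key upsert (a PySpark sketch, not Dataflow Gen2; joined_changes, dim_target, business_key and load_time are hypothetical names):

```python
# Sketch of the loadtime + business-key upsert described above.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Rows produced by joining the incrementally loaded source tables, each tagged
# with the load_time of the run that landed them in Fabric.
joined = spark.read.table("joined_changes")
current_load = joined.agg(F.max("load_time")).collect()[0][0]
new_rows = joined.filter(F.col("load_time") == F.lit(current_load))

# Use the business key to update records that already exist and insert the rest.
target = DeltaTable.forName(spark, "dim_target")
(
    target.alias("t")
    .merge(new_rows.alias("s"), "t.business_key = s.business_key")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
# Note: as mentioned above, this does not handle rows deleted at the source.
```

The merge keys on the business key, so existing records are updated in place and genuinely new records are inserted.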
Here's Microsoft's docs on incremental refreshes with dataflows: https://learn.microsoft.com/en-us/fabric/data-factory/dataflow-gen2-incremental-refresh
If you found this helpful, consider giving some Kudos. If I answered your question or solved your problem, mark this post as the solution.