
AnnOminous
Microsoft Employee

Dataflow Gen2 with incremental refresh won't allow appending data in SQL DB or Lakehouse

I've configured a set of API calls in a Dataflow Gen2 (not CI/CD enabled) to use incremental refresh. The incremental refresh is working for the dataflow, but I need to pass the results through to a data store. When I try to set this up in Fabric, using either a SQL DB or a Lakehouse as the destination, I don't have the option to append new data. The Data Destination settings are pre-assigned to replace data, and I can't change them.

Is there a way for me to create a dataflow that refreshes incrementally and then pushes the refreshed data to a Fabric data store using append?

Or am I misunderstanding how the "replace" functionality works? Does it mean that DFG2 only extracts the incremental data, but that the push to the data store writes all of the data in the DFG2 tables as the replacement data?

 

Happy to provide any additional information that will help answer my question.

1 ACCEPTED SOLUTION
Vinodh247
Resolver IV

TL;DR: Dataflow Gen2 supports incremental data extraction, but it does not support appending data to destinations such as a Lakehouse or SQL DB; the destination output table is always replaced.

 

Clarification

  • The "replace" functionality in incremental refresh means the dataflow outputs only the incremental data, but the destination table or partition is replaced with that data during the refresh.

  • You cannot directly configure Dataflow Gen2 incremental refresh to append new data to SQL DB or Lakehouse destinations in Fabric; the system enforces "replace" mode for these destinations.

  • To achieve append-like behavior, you need a custom pattern: stage the incremental data, then merge/append it into the destination via additional steps outside the dataflow's incremental refresh itself.

A couple of workaround suggestions:

Since this is currently a platform limitation, here are some alternatives:

1. Use Dataflow Gen2 to Stage the Data

  • Create an output table in the Lakehouse with the incremental extract (even though it is replaced each time).

  • Then run a Spark Notebook or Data Pipeline to append this staged data into a final table.

This gives you control:

  • Staging table (api_incremental_temp) gets replaced.

  • Final table (api_cumulative) gets updated via an INSERT INTO ... SELECT or merge logic (see the sketch below).
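
For illustration, here is a minimal sketch of that append step as it might look in a Fabric notebook, assuming the two table names from the bullets above (everything else is plain PySpark):

```python
# Minimal sketch of the staging-to-final append step, run from a Fabric notebook.
# The table names come from the bullets above; adjust them to your workspace.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already available as `spark` in a Fabric notebook

# The staging table is overwritten by the dataflow on every refresh.
staged = spark.table("api_incremental_temp")

# Append the newly staged rows onto the cumulative history table.
staged.write.mode("append").saveAsTable("api_cumulative")

# Equivalent SQL form:
# spark.sql("INSERT INTO api_cumulative SELECT * FROM api_incremental_temp")
```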

2. Move to Pipelines with Notebooks or Spark Jobs

  • If your use case is complex (deduplication, watermark tracking, etc.), use Fabric pipelines with notebooks that implement the full delta logic (a sketch follows this list):

    • Call API directly from notebook.

    • Use checkpointing/watermark logic in Spark.

    • Append to Lakehouse as needed.
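
As a rough sketch of what that delta logic could look like, the notebook below tracks a watermark and appends only new rows. The API URL, the modifiedAfter parameter, the ModifiedDate column, and the table name are hypothetical placeholders, not anything prescribed by Fabric:

```python
# Sketch of watermark-based incremental loading in a Fabric notebook.
# The API endpoint, query parameter, watermark column, and table name below are
# hypothetical; substitute your own source and schema.
import requests
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # already available as `spark` in a Fabric notebook

FINAL_TABLE = "api_cumulative"
WATERMARK_COL = "ModifiedDate"

# 1. Find the high-water mark already loaded (fall back to a floor value on the first run).
if spark.catalog.tableExists(FINAL_TABLE):
    last_loaded = spark.table(FINAL_TABLE).agg(F.max(WATERMARK_COL)).first()[0]
else:
    last_loaded = "1900-01-01T00:00:00"

# 2. Call the source API for records changed after the watermark.
resp = requests.get(
    "https://example.com/api/records",           # hypothetical endpoint
    params={"modifiedAfter": str(last_loaded)},  # hypothetical parameter
    timeout=60,
)
resp.raise_for_status()
rows = resp.json()

# 3. Append only the newly retrieved rows to the Lakehouse table.
if rows:
    spark.createDataFrame(rows).write.mode("append").saveAsTable(FINAL_TABLE)
```

If scanning the final table for the maximum watermark becomes expensive as it grows, the watermark could instead be kept in a small control table that the notebook updates after each successful load.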

Recommendation

Since you are already using incremental refresh in DFG2, the best path today is:

  1. Use DFG2 to pull incremental data and write it to a staging Lakehouse table.

  2. Run a Fabric Notebook or a Pipeline activity to:

    • Read from the staging table.

    • Append/merge into the final destination table (a merge sketch follows below).

You get:

  • API delta load benefits.

  • Historical accumulation in your Lakehouse.
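
If the API can re-deliver rows you have already loaded, step 2 is better done as a merge than a blind append. A sketch using the Delta Lake Python API available in Fabric Spark might look like this; the Id join key is an assumption, so substitute your own business key:

```python
# Sketch of the merge/upsert variant of step 2, using the Delta Lake Python API.
# The join key `Id` is an assumed business key; adjust it to your schema.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already available as `spark` in a Fabric notebook

staged = spark.table("api_incremental_temp")            # output of the DFG2 staging step
final_tbl = DeltaTable.forName(spark, "api_cumulative")

(
    final_tbl.alias("t")
    .merge(staged.alias("s"), "t.Id = s.Id")
    .whenMatchedUpdateAll()        # refresh rows the API re-delivered
    .whenNotMatchedInsertAll()     # append genuinely new rows
    .execute()
)
```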

 

Please give Kudos and 'Accept as Solution' if the reply was helpful. This will benefit other community members who face the same issue.


5 REPLIES
V-yubandi-msft
Community Support

Hi @AnnOminous ,

Thank you for your question. @Vinodh247 has already provided a helpful explanation. To add a bit more: Dataflow Gen2 is capable of pulling data incrementally, but it doesn't currently allow that data to be appended directly to a Lakehouse or SQL DB. Instead, it replaces the portion of the table that matches the incremental range with the newly retrieved data during each refresh.


If you want to keep historical data, a typical approach is to first load the new data into a temporary staging table using Dataflow Gen2. Then, you can use a Spark Notebook or Data Pipeline to append that data to your main table rather than replacing the existing data.

 

For more details, you can refer to Microsoft's official documentation:
Incremental refresh in Dataflow Gen2 - Microsoft Fabric | Microsoft Learn

 

 

Thanks & Regards,

Community Support Team.

lbendlin
Super User

This might be a good time to question the business value of the dataflow. Could you implement your process without it? 

@lbendlin, thanks for the response. The answer is: possibly, but as I noted in my response to @Vinodh247 below, I'm still pretty new to the Fabric environment and have only really ever worked in SQL and PBI up until now, so if there are alternative options for me, I don't know what they are yet.

 

Regardless, thanks again - it definitely gives me something to research in the Fabric environment.


@Vinodh247, thank you for this response. I'm marking this as the solution, even if I'm not yet able to implement it. I'm still pretty new to the Fabric environment and have only really ever worked in SQL and PBI up until now. That being said, your response makes a lot of sense, and I'll start digging in to see if I can get it to work.

Thanks again! 
