
AnnOminous
Microsoft Employee

Dataflow Gen2 with incremental refresh won't allow appending data in SQL DB or Lakehouse

I've configured a set of API calls in a Dataflow Gen2 (not CI/CD enabled) to use incremental refresh. The incremental refresh is working for the dataflow, but I need to pass the results through to a data store. When I try to set this up in Fabric, using either a SQL DB or a Lakehouse as the destination, I don't have the option to append new data. The Data Destination settings are pre-assigned to replace data, and I can't change them.

Is there a way for me to create a dataflow that refreshes incrementally and then pushes the refreshed data to a Fabric data store using append?

Or am I misunderstanding how the "replace" functionality works? Does it mean that DFG2 only extracts the incremental data, but that the push to the data store writes all of the data in the DFG2 tables as the replacement data?

 

Happy to provide any additional information that will help answer my question.

1 ACCEPTED SOLUTION
Vinodh247
Resolver IV

TL;DR: Dataflow Gen2 supports incremental data extraction, but it does not support appending data to destinations such as a Lakehouse or SQL DB; the destination output table is always replaced.

 

Clarification

  • The "replace" functionality in incremental refresh means the dataflow outputs only the incremental data, but the destination table or partition is replaced with that data during the refresh.

  • You cannot directly configure Dataflow Gen2 incremental refresh to append new data to SQL DB or Lakehouse destinations in Fabric; the system enforces "replace" mode for these destinations.

  • To achieve append-like behavior, you need a custom pattern: stage the incremental data, then merge/append it into the destination via additional steps outside the dataflow's incremental refresh itself.

A couple of workaround suggestions:

Since this is currently a platform limitation, here are some alternatives:

1. Use Dataflow Gen2 to Stage the Data

  • Create an output table in the Lakehouse with the incremental extract (even though it is replaced each time).

  • Then run a Spark Notebook or Data Pipeline to append this staged data into a final table.

This gives you control:

  • Staging table (api_incremental_temp) gets replaced.

  • Final table (api_cumulative) gets updated via an INSERT INTO ... SELECT or merge logic (see the sketch below).
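
For illustration, here is a minimal sketch of that append step as it might look in a Fabric notebook, assuming the two table names from the bullets above (everything else is plain PySpark):

```python
# Minimal sketch of the staging-to-final append step, run from a Fabric notebook.
# The table names come from the bullets above; adjust them to your workspace.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already available as `spark` in a Fabric notebook

# The staging table is overwritten by the dataflow on every refresh.
staged = spark.table("api_incremental_temp")

# Append the newly staged rows onto the cumulative history table.
staged.write.mode("append").saveAsTable("api_cumulative")

# Equivalent SQL form:
# spark.sql("INSERT INTO api_cumulative SELECT * FROM api_incremental_temp")
```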

2. Move to Pipelines with Notebooks or Spark Jobs

  • If your use case is complex (deduplication, watermark tracking, etc.), use Fabric pipelines with notebooks that implement the full delta logic (a sketch follows this list):

    • Call API directly from notebook.

    • Use checkpointing/watermark logic in Spark.

    • Append to Lakehouse as needed.
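
As a rough sketch of what that delta logic could look like, the notebook below tracks a watermark and appends only new rows. The API URL, the modifiedAfter parameter, the ModifiedDate column, and the table name are hypothetical placeholders, not anything prescribed by Fabric:

```python
# Sketch of watermark-based incremental loading in a Fabric notebook.
# The API endpoint, query parameter, watermark column, and table name below are
# hypothetical; substitute your own source and schema.
import requests
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # already available as `spark` in a Fabric notebook

FINAL_TABLE = "api_cumulative"
WATERMARK_COL = "ModifiedDate"

# 1. Find the high-water mark already loaded (fall back to a floor value on the first run).
if spark.catalog.tableExists(FINAL_TABLE):
    last_loaded = spark.table(FINAL_TABLE).agg(F.max(WATERMARK_COL)).first()[0]
else:
    last_loaded = "1900-01-01T00:00:00"

# 2. Call the source API for records changed after the watermark.
resp = requests.get(
    "https://example.com/api/records",           # hypothetical endpoint
    params={"modifiedAfter": str(last_loaded)},  # hypothetical parameter
    timeout=60,
)
resp.raise_for_status()
rows = resp.json()

# 3. Append only the newly retrieved rows to the Lakehouse table.
if rows:
    spark.createDataFrame(rows).write.mode("append").saveAsTable(FINAL_TABLE)
```

If scanning the final table for the maximum watermark becomes expensive as it grows, the watermark could instead be kept in a small control table that the notebook updates after each successful load.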

Recommendation

Since you are already using incremental refresh in DFG2, the best path today is:

  1. Use DFG2 to pull incremental data and write it to a staging Lakehouse table.

  2. Run a Fabric Notebook or a Pipeline activity to:

    • Read from the staging table.

    • Append/merge into the final destination table (a merge sketch follows below).

You get:

  • API delta load benefits.

  • Historical accumulation in your Lakehouse.
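
If the API can re-deliver rows you have already loaded, step 2 is better done as a merge than a blind append. A sketch using the Delta Lake Python API available in Fabric Spark might look like this; the Id join key is an assumption, so substitute your own business key:

```python
# Sketch of the merge/upsert variant of step 2, using the Delta Lake Python API.
# The join key `Id` is an assumed business key; adjust it to your schema.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already available as `spark` in a Fabric notebook

staged = spark.table("api_incremental_temp")            # output of the DFG2 staging step
final_tbl = DeltaTable.forName(spark, "api_cumulative")

(
    final_tbl.alias("t")
    .merge(staged.alias("s"), "t.Id = s.Id")
    .whenMatchedUpdateAll()        # refresh rows the API re-delivered
    .whenNotMatchedInsertAll()     # append genuinely new rows
    .execute()
)
```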

 

Please give Kudos and 'Accept as Solution' if the reply was helpful. This will benefit other community members who face the same issue.


5 REPLIES
V-yubandi-msft
Community Support

Hi @AnnOminous ,

Thank you for your question. @Vinodh247 has already provided a helpful explanation. To add a bit more: Dataflow Gen2 is capable of pulling data incrementally, but it doesn't currently allow that data to be appended directly to a Lakehouse or SQL DB. Instead, it replaces the portion of the table that matches the incremental range with the newly retrieved data during each refresh.


If you want to keep historical data, a typical approach is to first load the new data into a temporary staging table using Dataflow Gen2. Then, you can use a Spark Notebook or Data Pipeline to append that data to your main table rather than replacing the existing data.

 

For more details, you can refer to Microsoft's official documentation:
Incremental refresh in Dataflow Gen2 - Microsoft Fabric | Microsoft Learn

 

 

Thanks & Regards,

Community Support Team.

lbendlin
Super User

This might be a good time to question the business value of the dataflow. Could you implement your process without it? 

@lbendlin, thanks for the response. The answer is: possibly, but as I noted in my response to @Vinodh247 below, I'm still pretty new to the Fabric environment and have only really ever worked in SQL and PBI up until now, so if there are alternative options for me, I don't know what they are yet.

 

Regardless, thanks again - it definitely gives me something to research in the Fabric environment.


@Vinodh247, thank you for this response. I'm marking this as the solution, even if I'm not yet able to implement it. I'm still pretty new to the Fabric environment and have only really ever worked in SQL and PBI up until now. That being said, your response makes a lot of sense, and I'll start digging in to see if I can get it to work.

Thanks again! 
