
Thorns
Regular Visitor

Dataflow Incremental Refresh Causing Duplicate Records

I created a Gen2 Dataflow that sources its data from a Salesforce object and loads it to my warehouse.  I then set up incremental refresh to pull everything created in the past 3 years, with a bucket size of Month, using the last modified date to determine what data to extract.

[Screenshot: incremental refresh settings]

I noticed a problem today: there are duplicate records in my table (only limited data shown for anonymity).  There were two records with the same Id, the same CreatedDate, and the same LastModifiedDate.  For some reason the incremental refresh is causing duplicate records to be created, and this has happened with hundreds of records.

[Screenshot: duplicate records sharing the same Id, CreatedDate, and LastModifiedDate]

What is happening with the incremental refresh that is causing duplicate records to be created in the table?


5 REPLIES
v-ssriganesh
Community Support

Hello @Thorns,

Hope everything’s going great with you. Just checking in: has the issue been resolved, or are you still running into problems? Sharing an update can really help others facing the same thing.

Thank you.

v-ssriganesh
Community Support

Hello @Thorns,

We hope you're doing well. Could you please confirm whether your issue has been resolved or if you're still facing challenges? Your update will be valuable to the community and may assist others with similar concerns.

Thank you.

v-ssriganesh
Community Support

Hello @Thorns,
Thank you for reaching out to the Microsoft Fabric community forum.

The duplicate records in your warehouse table are likely due to the incremental refresh feature in Dataflow Gen2 not supporting query folding for Salesforce objects. Without folding, the full dataset is pulled on each bucket refresh, so buckets are not replaced cleanly and duplicates appear, especially for records that were modified.

To fix this:

  1. Disable incremental refresh to stop further duplicates.
  2. Manually implement an incremental load (see the sketch after this list):
  • Add a reference query that gets the max LastModifiedDate from your warehouse table.
  • In your main query, filter the Salesforce data to rows where LastModifiedDate > that value.
  • Use "upsert" (with Id as the primary key) or "append" as the destination update method.
  3. If you need to handle deletes, include the IsDeleted field and add logic to remove the corresponding rows.
  4. Test with a manual refresh, then schedule it.
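
To make step 2 concrete, here is a minimal Power Query M sketch of the filter logic. The query names WarehouseAccounts (the existing warehouse table) and SalesforceAccounts (the Salesforce object) are hypothetical; adjust names and column types to your setup.

let
    // Highest LastModifiedDate already loaded to the warehouse.
    // WarehouseAccounts is a hypothetical reference query over the existing table.
    MaxLoadedDate = List.Max(WarehouseAccounts[LastModifiedDate]),
    // SalesforceAccounts is a hypothetical query over the Salesforce object.
    Source = SalesforceAccounts,
    // Keep only rows changed since the last load; the null check covers the
    // very first run, when the warehouse table is still empty.
    NewOrChanged = Table.SelectRows(
        Source,
        each MaxLoadedDate = null or [LastModifiedDate] > MaxLoadedDate
    )
in
    NewOrChanged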

For the existing duplicates, run a one-time deduplication before applying the new setup (a sketch follows below). This should resolve the issue.
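
One way to do that one-time deduplication inside a dataflow, again as a sketch using the hypothetical WarehouseAccounts query and a destination temporarily set to replace: sort so the newest version of each record comes first, then keep one row per Id.

let
    Source = WarehouseAccounts,
    // Newest version of each record first; Table.Buffer pins the sort order
    // so the following distinct reliably keeps the newest row.
    Sorted = Table.Buffer(Table.Sort(Source, {{"LastModifiedDate", Order.Descending}})),
    // Table.Distinct keeps the first row it encounters for each Id.
    Deduped = Table.Distinct(Sorted, {"Id"})
in
    Deduped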


Best regards,
Ganesh Singamshetty.

Thanks for the information!  I will stop using incremental refresh.

Where in the Gen2 Dataflow does it allow for upsert?  I do not see that as an option.

Hello @Thorns,

Thanks for the update; glad you're pausing incremental refresh to avoid more duplicates.

I apologize for the confusion in my earlier suggestion. Upon double-checking the Dataflow Gen2 documentation, "upsert" isn't directly available as an update method for any destination, including Warehouse. For Warehouse specifically, the only supported option is replace, with no append or upsert in the UI.

To achieve an incremental upsert-like behavior (adding new records and updating changed ones without duplicates):

  • Switch your destination to a Fabric Lakehouse table (which supports both append and replace). This allows more flexibility, and the sketch below shows the resulting upsert-like pattern.
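
For an upsert-like load on top of append/replace, one hedged M pattern (all query names hypothetical, building on the earlier sketches): combine the rows already in the Lakehouse table with the new or changed rows, keep the newest row per Id, and write the result back with the replace update method.

let
    // Rows already in the Lakehouse table (hypothetical reference query).
    Existing = LakehouseAccounts,
    // New or changed rows from the filtered Salesforce query sketched earlier.
    Incoming = NewOrChanged,
    Combined = Table.Combine({Existing, Incoming}),
    // Newest version of each record wins.
    Sorted = Table.Buffer(Table.Sort(Combined, {{"LastModifiedDate", Order.Descending}})),
    Upserted = Table.Distinct(Sorted, {"Id"})
in
    Upserted

Note this rewrites the whole table on each refresh; a plain append of only the filtered rows, followed by a periodic dedup, avoids that at the cost of tolerating duplicates between dedup runs.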
