tomperro
Helper V

Duplicate rows in dataflow

I have created a dataflow to pull data from a salesforce object into my warehouse but when I do a refresh, it appends to the existing warehouse table. I have added an index column but this did not solve the issue.

How can I refresh the data and not have duplicates?

1 ACCEPTED SOLUTION
v-csrikanth
Community Support

Hi @tomperro 
Thanks for reaching out to Fabric Community.

Your dataflow is currently appending new rows each time, which is why you end up with duplicates.

Here are the best approaches; pick the one that best fits your scenario. Choose the method that aligns with your performance and audit requirements.


If this answer solves your issue, please give us a Kudos and mark it as Accepted Solution.
Kind regards,
Community Support Team _ C Srikanth.


5 REPLIES
v-csrikanth
Community Support

Hi @tomperro 
I wanted to follow up since I haven't heard from you in a while. Have you had a chance to try the suggested solutions?
If your issue is resolved, please consider marking the post as solved. However, if you're still facing challenges, feel free to share the details, and we'll be happy to assist you further.
Looking forward to your response!


Best Regards,
Community Support Team _ C Srikanth.


rohit1991
Super User

Hi @tomperro ,
It sounds like the issue you're encountering is due to the dataflow performing an append operation rather than a full refresh or upsert into your warehouse table. Adding an index column alone won't prevent duplicates unless you're using it as part of a deduplication step. To avoid duplicates, you’ll need to implement logic in your dataflow that either deletes existing data before each refresh or identifies and removes duplicates based on a unique key, such as a Salesforce record ID.

 

Another approach is to stage the incoming data in a temporary table and then use a transformation (like a merge or deduplicate step) before writing to the final destination. If you're using Microsoft Fabric or a similar platform with dataflows, you might also consider enabling incremental refresh or configuring the destination settings to overwrite the table on refresh, if that option is available.
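To make the staging-plus-deduplication idea concrete, here is a minimal plain-Python sketch that keeps only the latest row per Salesforce record Id. The field names (Id, LastModifiedDate, Name) are standard Salesforce fields, but the staging shape and the sample rows are illustrative:

```python
# Sketch of the "stage, then deduplicate" idea: keep only the most
# recently modified row per Salesforce record Id before writing to
# the final destination. The dicts stand in for staged rows.
def dedupe_by_id(rows):
    """Return one row per Id, keeping the latest LastModifiedDate."""
    latest = {}
    for row in rows:
        rid = row["Id"]
        if rid not in latest or row["LastModifiedDate"] > latest[rid]["LastModifiedDate"]:
            latest[rid] = row
    return list(latest.values())

staged = [
    {"Id": "001A", "Name": "Acme", "LastModifiedDate": "2024-01-01"},
    {"Id": "001A", "Name": "Acme Corp", "LastModifiedDate": "2024-03-01"},
    {"Id": "001B", "Name": "Globex", "LastModifiedDate": "2024-02-01"},
]
deduped = dedupe_by_id(staged)
# Two rows remain: the newer "Acme Corp" record and "Globex".
```

In a dataflow the equivalent would be a sort-by-modified-date followed by a remove-duplicates step (Table.Distinct in Power Query M) on the key column.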


Did it work? ✔ Give a Kudo • Mark as Solution – help others too!

OK, those make sense, but how do I do that? 😊

Hi @tomp This usually happens when the refresh runs in append mode, so every refresh just inserts the same records again. Adding an index or incremental column by itself won't prevent duplicates if the destination table isn't using merge logic.

The usual way to fix this is:

  • Define a real unique key (for example the Salesforce Id).

  • Load the data using an UPSERT / MERGE strategy:

    • If the record already exists → update it

    • If it doesn’t exist → insert it

How you implement this depends a lot on the warehouse you’re loading into (BigQuery, Snowflake, Postgres, SQL Server, etc.), since each one handles MERGE a bit differently.
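As a rough illustration of the UPSERT pattern, here is a sketch using SQLite's INSERT ... ON CONFLICT DO UPDATE as a stand-in for a warehouse MERGE. The table and column names are made up; each real warehouse has its own MERGE syntax, but the matching-on-a-unique-key behavior is the same:

```python
import sqlite3

# In-memory table keyed on the Salesforce Id, so a refresh can
# update existing records instead of inserting duplicates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (sf_id TEXT PRIMARY KEY, name TEXT)")

def refresh(rows):
    """Upsert: insert new Ids, update rows whose Id already exists."""
    conn.executemany(
        "INSERT INTO accounts (sf_id, name) VALUES (?, ?) "
        "ON CONFLICT(sf_id) DO UPDATE SET name = excluded.name",
        rows,
    )

refresh([("001A", "Acme"), ("001B", "Globex")])
refresh([("001A", "Acme Corp"), ("001C", "Initech")])  # second refresh
count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
# count is 3: 001A was updated in place, not duplicated.
```

Running the refresh twice leaves three rows, not four, because the second batch matched on the key and updated the existing record.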

If you’d rather not manage this manually, some data sync tools already handle it. For example, with Windsor.ai, when syncing Salesforce to SQL-based warehouses, you can define “columns to match” and the refresh runs as an UPSERT, so records are updated instead of duplicated.

The key takeaway is that as long as the process is a pure append, duplicates are expected. You'll need a proper unique key and a MERGE/UPSERT-based refresh to avoid them.

Hope this helps
