I have created a dataflow to pull data from a Salesforce object into my warehouse, but when I do a refresh, it appends to the existing warehouse table. I added an index column, but this did not solve the issue.
How can I refresh the data and not have duplicates?
Hi @tomperro
Thanks for reaching out to Fabric Community.
Your dataflow is currently appending new rows each time, which is why you end up with duplicates.
Here is the approach I would recommend for your scenario:
In your Dataflow destination settings (Gen2), disable the automatic settings and under Update methods choose Replace. This will truncate and reload the target table on every refresh, eliminating duplicates completely. Reference link: https://learn.microsoft.com/en-us/fabric/data-factory/dataflow-gen2-data-destinations-and-managed-se...
This method trades a full reload on every refresh for guaranteed deduplication, so weigh it against your performance and audit requirements.
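For context, Replace behaves like a truncate-and-reload: everything in the target table is dropped and the incoming rows are written in full, so nothing can accumulate across refreshes. Below is a minimal sketch of the equivalent behavior in Python with pyodbc; the dbo.SalesforceAccounts table, its columns, and the connection string are all hypothetical placeholders, not your actual schema.

```python
import pyodbc

# Hypothetical connection string for the warehouse's SQL endpoint;
# substitute your own server, database, and authentication details.
CONN_STR = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-warehouse-sql-endpoint>;"
    "Database=<your-warehouse>;"
    "Authentication=ActiveDirectoryInteractive;"
)

def full_reload(rows):
    """Emulate the Replace update method: empty the target table,
    then reload every incoming row, so duplicates cannot accumulate."""
    with pyodbc.connect(CONN_STR) as conn:
        cur = conn.cursor()
        # Replace truncates and reloads; DELETE reaches the same end state.
        cur.execute("DELETE FROM dbo.SalesforceAccounts;")
        cur.executemany(
            "INSERT INTO dbo.SalesforceAccounts (Id, Name) VALUES (?, ?);",
            rows,
        )
        conn.commit()

# Example: reload the table with the latest Salesforce extract.
full_reload([("001XX0001", "Acme"), ("001XX0002", "Globex")])
```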
If this answer solves your issue, please give us a Kudos and mark it as Accepted Solution.
Kind regards,
Community Support Team _ C Srikanth.
Hi @tomperro
I wanted to follow up since I haven't heard from you in a while. Have you had a chance to try the suggested solutions?
If your issue is resolved, please consider marking the post as solved. However, if you're still facing challenges, feel free to share the details, and we'll be happy to assist you further.
Looking forward to your response!
Best Regards,
Community Support Team _ C Srikanth.
Hi @tomperro ,
It sounds like the issue you're encountering is due to the dataflow performing an append operation rather than a full refresh or upsert into your warehouse table. Adding an index column alone won't prevent duplicates unless you're using it as part of a deduplication step. To avoid duplicates, you’ll need to implement logic in your dataflow that either deletes existing data before each refresh or identifies and removes duplicates based on a unique key, such as a Salesforce record ID.
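To make the key-based option concrete, here is a minimal sketch in Python with pyodbc, assuming the Salesforce Id column uniquely identifies a record; the table, columns, and connection string are hypothetical. It deletes any stored rows whose Id appears in the incoming batch and then inserts the batch, a delete-then-insert upsert that does not depend on MERGE support in the warehouse:

```python
import pyodbc

# Same hypothetical connection string as in the earlier sketch.
CONN_STR = "<your-warehouse-connection-string>"

def upsert_by_id(rows):
    """Key-based upsert: drop any stored rows that share a Salesforce Id
    with the incoming batch, then insert the batch, so each Id appears
    at most once after every refresh."""
    with pyodbc.connect(CONN_STR) as conn:
        cur = conn.cursor()
        # Remove the old versions of the incoming records by key.
        cur.executemany(
            "DELETE FROM dbo.SalesforceAccounts WHERE Id = ?;",
            [(r[0],) for r in rows],
        )
        # Insert the fresh copies pulled from Salesforce.
        cur.executemany(
            "INSERT INTO dbo.SalesforceAccounts (Id, Name) VALUES (?, ?);",
            rows,
        )
        conn.commit()
```

Unlike a full Replace, this only touches the records that arrived in the current refresh, which can matter when the table is large.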
Another approach is to stage the incoming data in a temporary table and then use a transformation (like a merge or deduplicate step) before writing to the final destination. If you're using Microsoft Fabric or a similar platform with dataflows, you might also consider enabling incremental refresh or configuring the destination settings to overwrite the table on refresh, if that option is available.
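And here is a sketch of the staging variant under the same assumptions, with a hypothetical dbo.SalesforceAccounts_Staging table: land the incoming rows in staging, then copy across only the Ids the final table has not seen.

```python
import pyodbc

# Same hypothetical connection string as in the earlier sketches.
CONN_STR = "<your-warehouse-connection-string>"

def load_via_staging(rows):
    """Stage the incoming rows, then move only previously unseen Ids
    into the final table, deduplicating on the way in."""
    with pyodbc.connect(CONN_STR) as conn:
        cur = conn.cursor()
        # Start each refresh with an empty staging table.
        cur.execute("DELETE FROM dbo.SalesforceAccounts_Staging;")
        cur.executemany(
            "INSERT INTO dbo.SalesforceAccounts_Staging (Id, Name) VALUES (?, ?);",
            rows,
        )
        # Copy across only rows whose Id is not already in the final table.
        cur.execute(
            "INSERT INTO dbo.SalesforceAccounts (Id, Name) "
            "SELECT s.Id, s.Name "
            "FROM dbo.SalesforceAccounts_Staging AS s "
            "WHERE NOT EXISTS ("
            "    SELECT 1 FROM dbo.SalesforceAccounts AS f WHERE f.Id = s.Id"
            ");"
        )
        conn.commit()
```

Note this variant only inserts new records; if existing records can change in Salesforce, combine it with the delete step from the previous sketch so updated rows are refreshed too.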
Ok, those make sense, but how do I do that 😊