We have pipelines that have been running for months without issue. Overnight, some Copy Data activities from lakehouse to lakehouse started appending rather than overwriting data. The Copy Data activities haven't been edited; they are still set to Overwrite, yet data is being appended.
Has anyone else experienced this?
I have just retested this and overwrite is working as you'd expect; they have fixed it.
Yes, I discovered this today. Can confirm that changing the JSON from OverwriteSchema to Overwrite works.
For me, it applies only to lakehouse-to-lakehouse Copy Data activities.
I have lots of Copy Data pipelines connected to an on-premises DB via a gateway, and I've just tested one: the overwrite action works correctly despite the JSON being "OverwriteSchema".
However, I've just set up a new Copy Data activity to copy a table from one lakehouse to another, i.e. totally internal to Fabric, and this appends rather than overwrites, as per the OP's post. I changed "OverwriteSchema" to "Overwrite" in the JSON and it works correctly again.
It's like there's a disconnect between the UI and the JSON. This is a dreadful bug.
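For anyone hunting for the setting, the relevant part of the Copy Data activity definition looks roughly like the snippet below. This is a sketch from memory rather than the definitive schema; the exact sink type and property name (tableActionOption) may differ slightly in your workspace:

```json
"typeProperties": {
    "sink": {
        "type": "LakehouseTableSink",
        "tableActionOption": "Overwrite"
    }
}
```

If the value reads "OverwriteSchema" here while the UI shows Overwrite, changing it back to "Overwrite" is the fix described above.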
I heard the same. Overwrite has been changed to OverwriteSchema.
Also, is anyone else facing the issue where, once you load the data, the SQL endpoint still shows an older version of the data?
This is frustrating.
Ref. old data in SQL Analytics Endpoint:
We have also noted some odd behaviour with the Copy Data activity.
For us it appears to have started from the 15th of August: data is not being overwritten even though the Overwrite option is set.
We managed to solve this issue with help from Microsoft Support. The Overwrite option, when selected in a Copy Data activity, has been updated to carry out OverwriteSchema.
"OverwriteSchema" action does not delete any data. Instead, it creates a new version of the Delta table with the updated schema, allowing users to leverage the time travel feature to view all previous versions of the table.
To perform an Overwrite, where the current data is completely replaced with the newly copied data, select 'Add dynamic content' for the Table Action and enter 'Overwrite'.
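For anyone whose tables ended up with appended duplicates, the Delta time travel mentioned above can be used from a notebook to inspect or roll back the table. A minimal PySpark sketch, assuming a hypothetical lakehouse table named my_lakehouse.dbo_sales and that an earlier good version still exists in the history:

```python
# `spark` is predefined in Fabric notebooks; otherwise build a SparkSession first.

# Inspect the Delta history to find the last good version of the table.
spark.sql("DESCRIBE HISTORY my_lakehouse.dbo_sales").show(truncate=False)

# Time travel: read the table as it was at an earlier version (version 0 assumed here).
old = spark.read.option("versionAsOf", 0).table("my_lakehouse.dbo_sales")
old.show()

# Roll the table back to that version (supported on recent Delta Lake runtimes).
spark.sql("RESTORE TABLE my_lakehouse.dbo_sales TO VERSION AS OF 0")
```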
After spending two hours trying to understand why my ID column has duplicates, I found out that records are being duplicated by my lakehouse copy pipeline, which I generated with the Copy Assistant. I then tried both Append and Overwrite to see if that would fix the issue, and it didn't.
Thank you for providing this workaround for what is clearly a bug! I wonder how anyone would be able to know this!
I have tried it, and I am not sure why, but after adding Overwrite to the dynamic content, it threw an error.
I ended up creating a notebook to purge tables. Probably that is the way it was intended to happen (which still leaves the question of why the Overwrite option exists in the data pipeline copy process).
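For what it's worth, the purge notebook can be very small. A minimal sketch, assuming a hypothetical lakehouse table named staging_orders:

```python
# `spark` is predefined in Fabric notebooks.
# Delete all rows but keep the Delta table (and its history) in place,
# so a subsequent Copy Data activity effectively starts from empty.
spark.sql("DELETE FROM staging_orders")

# Alternatively, drop the table entirely and let the pipeline recreate it:
# spark.sql("DROP TABLE IF EXISTS staging_orders")
```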
During the MS Fabric Conference (it was great, and if you haven't joined one yet, please make sure you do!), I tested this with an MS Data Factory person, and the overwrite worked well! It could be something temporary that happened in August; now in September it worked as expected, overwriting and not appending.
This message prompted me to retest this quickly, and it does appear that the bug has been fixed.
Thanks for the update. Do you know if there is any documentation on the change? I've been looking here https://learn.microsoft.com/en-us/fabric/data-factory/copy-data-activity but couldn't find anything.
Hi,
Same issue on this side: overwrite now clearly appends instead of performing its intended function. Not a user-side issue.
Must say, working with Fabric has been challenging at times...
Hi alynch,
Same issue here. Our overwrite pipelines suddenly started to append. It started around last Thursday.
Hi @alynch ,
As I understand your query, the issue you are facing is that the data is getting overwritten after every iteration of the Copy activity inside a ForEach. Please let me know if my understanding is incorrect.
If you would like to write all data to a single file from multiple sources, then you will have to copy the files for each iteration to an intermediate folder, appending a datetime to the filename (using a parameterized dataset at the sink). Once all the iterations are completed, add another Copy activity after your ForEach activity: point its source to the intermediate folder, point its sink to your desired file store, and set copyBehavior to MergeFiles in the Copy activity settings.
You can learn more about it in this document: Copy data from/to a file system - Azure Data Factory & Azure Synapse | Microsoft Learn
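A rough sketch of the sink settings for that final Copy activity, based on the linked documentation; the sink type and exact property nesting vary by connector, so treat this as illustrative only:

```json
"sink": {
    "type": "DelimitedTextSink",
    "storeSettings": {
        "type": "FileServerWriteSettings",
        "copyBehavior": "MergeFiles"
    }
}
```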
Best Regards
Yilong Zhou
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.
I think the issue is that the data is not getting overwritten where it should be, but appended instead.