Element115
Super User

QUESTION::PIPELINE::COPY DATA::DUPLICATE DATA COPY

ISSUE:

As shown in the screenshot, one set of rows appears 8 times, with the values duplicated.

For each hour of the day, we should only have 4 rows.

 

fabric_forum_Screenshot 2025-04-04 190319.jpg

 

This is not the first time this has happened.  It occurs at different times of day, seemingly at random, and happens once or twice if I let the pipeline run for a week. This means the pipeline architecture itself offers no clue as to why it occurs.  The algorithm and sequencing of activities is--well, I write 'is', but perhaps I should write 'should be', as who knows what really goes on under the Fabric hood--deterministic. Nothing complicated: a straightforward incremental data copy based on ordered, monotonically increasing unique IDs.
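The incremental pattern described above can be sketched as follows. This is a minimal illustration, not the actual pipeline code; the table and column names (`dbo.events`, `event_id`) are hypothetical placeholders.

```python
def build_incremental_query(table: str, id_column: str, last_id: int) -> str:
    """Build the T-SQL used to fetch only rows newer than the watermark.

    The watermark is the highest ID already ingested; because the ID column
    is unique and monotonically increasing, a strict '>' comparison should
    yield each source row exactly once across runs.
    """
    return (
        f"SELECT * FROM {table} "
        f"WHERE {id_column} > {last_id} "
        f"ORDER BY {id_column} ASC"
    )

def next_watermark(rows: list[dict], id_column: str, last_id: int) -> int:
    """Advance the watermark to the highest ID seen, or keep it unchanged."""
    return max((r[id_column] for r in rows), default=last_id)
```

With this scheme the filter itself cannot re-select already-ingested rows, so duplicates can only appear if the same batch is physically written more than once.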

 

QUESTION:

Has anyone else come across something similar?  If so, what did you determine the root cause to be (assuming it was not user error)? In other words, does anyone else have an open support ticket for a similar issue, and what was the result?

 

I opened a ticket but am trying to accelerate the discovery of key info pertaining to this issue in case it's out there.

1 ACCEPTED SOLUTION

@v-pnaroju-msft 

Here is a workaround originally shared by Mark Pryce-Maher, the Microsoft PM responsible for the synchronisation between the lakehouse and the SQL endpoint. As he says:  "the following solution is unofficial, unsupported, undocumented, and frankly unwise!"

 

However, I would not call this a solution, since it is unofficial, unsupported, undocumented, and unwise.  Anything unofficial, unsupported, and undocumented is precisely unwise because it could change at any time without notice, leaving the user bewildered when things suddenly break for no apparent reason.

 

Mark's original Python code:  Workaround for delays in the automatically generated schema in the SQL analytics endpoint of the Lak...

 

An improved version of this code by Andre Fomin:  Fix SQL Analytics Endpoint Sync Issues in Microsoft Fabric – Data Not Showing? Here's the Solution! ...

 

For now, this Python script seems to work and the problem I had due to the sync issue has not manifested itself... yet.


12 REPLIES
jcantwell
Frequent Visitor

Was there ever a satisfactory resolution to this problem with Microsoft? I have the same scenario, except my entire data set is occasionally written twice. I run a pre-script to delete all data from the destination and then simply load everything from a lakehouse view into a warehouse table. Sometimes the loads are fine; sometimes they are completely doubled. (A CreatedOn field shows that for each duplicated record there is usually a couple of seconds' difference between the load times.)


Define 'satisfactory': way back when, I posted the workaround put forth by the Microsoft PM himself.  This year, Microsoft finally released a lakehouse REST API in GA.  I haven't used it yet, but my guess is you drop a Web activity into the pipeline and make a REST call to the lakehouse to force a sync, the same as is done in the Python notebook.
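A minimal sketch of what such a call might look like, assuming the Fabric REST operation for refreshing a SQL analytics endpoint's metadata. The exact route, its GA status, and required permissions should be verified against the current Fabric REST API reference; the IDs and token handling below are placeholders, and this is not code from the thread's workaround notebook.

```python
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def build_refresh_url(workspace_id: str, sql_endpoint_id: str) -> str:
    """Assemble the assumed metadata-refresh route for a SQL analytics endpoint."""
    return (
        f"{FABRIC_API}/workspaces/{workspace_id}"
        f"/sqlEndpoints/{sql_endpoint_id}/refreshMetadata"
    )

def force_sync(workspace_id: str, sql_endpoint_id: str, token: str) -> int:
    """POST the refresh request and return the HTTP status code.

    Requires a valid Microsoft Entra access token; performs a real
    network call, so it is not exercised here.
    """
    req = urllib.request.Request(
        build_refresh_url(workspace_id, sql_endpoint_id),
        method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        data=b"{}",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

In a pipeline, the same POST would be issued from a Web activity placed before the Copy Data activity, so the endpoint metadata is synced before the copy reads from it.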

v-pnaroju-msft
Community Support

Hi Element115,

Thank you for your update.

We sincerely appreciate your efforts in sharing this workaround information with the community. To improve the visibility of this discussion within the forum, we kindly request you to mark your response as the accepted solution. This will help other members facing similar queries to easily locate the relevant information.

Thank you.

v-pnaroju-msft
Community Support

Hi Element115,

Thank you for the update.

As you have already raised a support ticket, we kindly request you to continue collaborating closely with the support team. 

We would appreciate it if you could keep us informed about the resolution of the support ticket, as it may prove beneficial for the wider Fabric community.

Thank you.

@v-pnaroju-msft 

Nothing suggested by the support team proved useful.  I had to find the write-up and code posted by a Microsoft Product Manager myself to finally get a workaround that, after a couple of days of non-stop testing, still seems to work.  See my previous post.

v-pnaroju-msft
Community Support

Hi Element115,

Thank you for the update.

If a solution is provided by the support team, please share it with the community. This will help others facing similar challenges and benefit the broader community.

Thank you.


v-pnaroju-msft
Community Support

Hi Element115,

We are following up to check whether you have raised the support ticket. If you have already done so, we kindly request you to share your feedback on the issue raised.
If a solution has been provided, we would appreciate it if you could share it with the community and mark it as the accepted solution. This will help others facing similar challenges and benefit the broader community.

Thank you.

 

@v-pnaroju-msft 

Yes, and they are still working on it. I am waiting for them to get back to me.

v-pnaroju-msft
Community Support

Hi Element115,

Thank you for your follow-up. I apologise for any confusion caused by the initial response.

Based on my understanding, your configuration appears to be correct as per the details shared. The issue seems to arise from unexpected behaviour within the Copy Data activity, rather than any user-related error.

Considering the irregular duplication (such as 8 rows being copied instead of the expected 4) without any consistent pattern, as shown in your recent screenshot, it appears to be platform-related behaviour. As you have already raised a support ticket, we kindly request you to continue collaborating closely with the support team. Please share detailed logs and observations with them to aid a comprehensive investigation.

We would appreciate it if you could keep us informed about the resolution of the support ticket, as it may prove beneficial for the wider Fabric community.

If you find our response helpful, we request you to mark it as the accepted solution and consider giving kudos. This will help other community members who may encounter similar queries.

Thank you.

v-pnaroju-msft
Community Support

Hi @Element115,

Thank you for reaching out through the Microsoft Fabric Community Forum.

We sincerely appreciate you reporting this issue and sharing your detailed observations. We understand how frustrating it can be to encounter unexpected duplicate records in your pipeline, especially when it is designed to be deterministic. Your expectation of consistent results from incremental loads is absolutely valid.

Based on our understanding, this behaviour could be occurring due to factors such as:
• Parallel writes in the Copy Data activity,
• Improper use of the Append mode without handling duplicates,
• Inconsistent incremental load filters (for example, missing watermarks), or
• Absence of constraints (such as primary keys) on the destination.
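The last two bullets amount to deduplicating on the destination side. Since lakehouse delta tables do not enforce primary keys, that has to be done explicitly after (or during) each load. A minimal sketch of the idea, with a hypothetical key column name; the real pipeline would express the same logic as a T-SQL ROW_NUMBER() cleanup or a notebook step:

```python
def dedupe_by_key(rows: list[dict], key: str) -> list[dict]:
    """Drop rows whose key has already been seen, keeping the first occurrence.

    Mirrors the T-SQL pattern of keeping only
    ROW_NUMBER() OVER (PARTITION BY key ORDER BY key) = 1
    when run against the destination after a load.
    """
    seen = set()
    out = []
    for r in rows:
        if r[key] not in seen:
            seen.add(r[key])
            out.append(r)
    return out
```

This does not explain why duplicates appear, but it makes the load idempotent with respect to repeated writes of the same batch.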

As you have already raised a support ticket, the engineering team will analyse the backend logs and provide specific insights related to your account and possible resolutions.

Kindly keep us informed of any findings, as they may prove helpful to other members of the community.

If you find our response helpful, we request you to mark it as the accepted solution and consider giving kudos. This will assist other community members who may have similar queries.

Thank you once again.

@v-pnaroju-msft 

 

1. What parallel writes?  I didn't change the default setting, which is Auto, as shown below in the screenshot.

 

But are you saying that, with the default Auto setting, Copy Data is free to start writing to the lakehouse table with more than one thread AND mistakenly does NOT keep track of the rows another thread has already written, or is in the process of writing, to the table?

 

Screenshot 2025-04-07 113049.jpg

 

2. What do you mean by "Improper use of the Append mode without handling duplicates"?  How could there be improper use of the Append or Overwrite modes when the UI only lets you select one or the other?  My pipeline logic depends on a T-SQL script that uses a column of unique IDs to SELECT rows coming after the last ingested ID (a greater-than comparison), so the result set should be a set of sequential rows ordered by the ID identity column imported from the source.

 

3. What does "Inconsistent incremental load filters (for example, missing watermarks)" even mean?  Watermarks?  What watermarks?  Again: the incremental filter is based on an identity column of unique IDs.  Nothing inconsistent there.

 

4. There are no primary key constraints on a lakehouse delta table.  Or did I miss something, and this is also a feature, as in a SQL database?

 

Furthermore, I have let the pipeline run on an hourly schedule since last Friday, and today I checked the destination table for duplicate rows.  Here is the screenshot showing both the T-SQL script used and the result.  Note that the count is 8 but should be 4.  Also note the times and the intervals between occurrences: there is no fixed pattern.

 

Screenshot 2025-04-07 121500.jpg
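A duplicate check like the one in the screenshot boils down to grouping by the ID and keeping counts above one (in T-SQL: GROUP BY id HAVING COUNT(*) > 1). The same logic in miniature, with a hypothetical column name:

```python
from collections import Counter

def find_duplicates(rows: list[dict], key: str) -> dict:
    """Return {key_value: count} for every key appearing more than once.

    Equivalent to:
      SELECT <key>, COUNT(*) FROM dest GROUP BY <key> HAVING COUNT(*) > 1
    """
    counts = Counter(r[key] for r in rows)
    return {k: c for k, c in counts.items() if c > 1}
```

Running this (or its T-SQL form) after each scheduled load is a cheap way to catch the 8-instead-of-4 case as soon as it happens rather than days later.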
