Vogels
Frequent Visitor

Updating a table with SQL MERGE gives a duplicates error: help needed combining SQL deduplication & update

I have a really large table that I'd like to update with only the new or changed entries.

I have two tables (one with all the data and one with the new data to be merged in) and basic code similar to the example in Upsert into a Delta Lake table using merge - Azure Databricks | Microsoft Learn, which ran fine for a few days. But then it started to give the error: Cannot perform MERGE as multiple source rows matched and attempted to update the same target row in the Delta table.

 

  1. I checked the tables and there really are no duplicates in either of them (a check query is sketched below).
  2. I completely deleted the table with the new data from the lakehouse, copied the data back into the lakehouse, and then it worked again.
    Therefore I think it has something to do with duplicated log records, because I found this: "A common ETL use case is to collect logs into a Delta table by appending them to a table. However, often the sources can generate duplicate log records and downstream deduplication steps are needed to take care of them." (Data deduplication when writing into delta table - Azure Databricks | Microsoft Learn)
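
A check along these lines (using the column names from the simplified code below) shows whether the source table contains duplicate IDs:

-- Any rows returned here mean MERGE can match the same target row more than once
SELECT ID, COUNT(*) AS cnt
FROM Large_Table_Previous_Days
GROUP BY ID
HAVING COUNT(*) > 1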

So I was thinking to combine our code for updating the large table with the code for deduplicating, and then I should be good. However, I'm not so skilled in SQL yet and can't figure out how to combine them. Is someone able to help? Below are our simplified code and the deduplicate example.

 

Simplified Code: 

MERGE INTO Large_Table AS target
USING Large_Table_Previous_Days AS source
ON (target.ID = source.ID)

WHEN MATCHED THEN
  UPDATE SET
    target.LOAD_DATE = source.LOAD_DATE,
    target.ID = source.ID,
    target.Content = source.Content
 
WHEN NOT MATCHED THEN
  INSERT (LOAD_DATE, ID, Content)
    VALUES (source.LOAD_DATE, source.ID, source.Content)
 
Deduplicate example from Microsoft Learn:
MERGE INTO logs
USING newDedupedLogs
ON logs.uniqueId = newDedupedLogs.uniqueId
WHEN NOT MATCHED
  THEN INSERT *
1 ACCEPTED SOLUTION
v-nikhilan-msft
Community Support

Hi @Vogels 
Thanks for using Fabric Community.
You are on the right track.

Instead of deleting and re-copying the table, perform deduplication on Large_Table_Previous_Days before merging. This ensures unique rows are fed into the MERGE statement. You can achieve this using the ROW_NUMBER function to assign a unique sequence number to the rows for each ID, based on a specific order:

 

-- Deduplicate Large_Table_Previous_Days:
-- keep only the first row (latest LOAD_DATE) for each ID
SELECT LOAD_DATE, ID, Content
FROM (
  SELECT *,
    ROW_NUMBER() OVER (PARTITION BY ID ORDER BY LOAD_DATE DESC) AS row_num
  FROM Large_Table_Previous_Days
) AS ranked
WHERE row_num = 1

 

 
Modify your MERGE statement to use the deduplicated Large_Table_Previous_Days as the source:

 

-- Merge the deduplicated data into Large_Table
MERGE INTO Large_Table AS target
USING (
  -- Deduplicated result from the previous step
  SELECT LOAD_DATE, ID, Content
  FROM (
    SELECT *,
      ROW_NUMBER() OVER (PARTITION BY ID ORDER BY LOAD_DATE DESC) AS row_num
    FROM Large_Table_Previous_Days
  ) AS ranked
  WHERE row_num = 1
) AS source
ON (target.ID = source.ID)

WHEN MATCHED THEN
  UPDATE SET
    target.LOAD_DATE = source.LOAD_DATE,
    target.Content = source.Content

WHEN NOT MATCHED THEN
  INSERT (LOAD_DATE, ID, Content)
  VALUES (source.LOAD_DATE, source.ID, source.Content)

 

Replace Large_Table, Large_Table_Previous_Days, and the column names (ID, LOAD_DATE, Content) with your actual table and column names.

This code performs the following steps:

  1. Deduplicates Large_Table_Previous_Days:
    • It assigns a unique row_num to each row within each group of the same ID, ordered by descending LOAD_DATE.
    • Only the first row (latest LOAD_DATE) for each ID is kept, ensuring unique data.
  2. Merges deduplicated data:
    • The deduplicated result is used as the source in the MERGE statement.
    • The ON clause matches rows on ID between the two tables.
    • The WHEN MATCHED clause updates existing rows in Large_Table.
    • The WHEN NOT MATCHED clause inserts new rows from the source.

This combined code should prevent the duplicate-row error and ensure only new or changed entries end up in your large table (see the optional refinement below for updating only rows whose values actually changed).
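
As an optional refinement, Delta Lake MERGE also accepts a condition on the WHEN MATCHED clause, so matched rows are only rewritten when a value has actually changed. A minimal sketch, assuming Content and LOAD_DATE are the only columns that can change (nullable columns would need extra handling, since <> does not match NULLs):

-- Only update matched rows whose values differ from the source
WHEN MATCHED AND (target.Content <> source.Content
               OR target.LOAD_DATE <> source.LOAD_DATE) THEN
  UPDATE SET
    target.LOAD_DATE = source.LOAD_DATE,
    target.Content = source.Content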

While this is one approach, if you want to perform an upsert (update or insert) operation on a table, you can also use Dataflow Gen2 here. Incrementally accumulating data in a data destination requires a technique to load only new or updated data into that destination. This can be done by using a query that filters the data based on what is already in the destination. The source would be Large_Table_Previous_Days and the target table would be Large_Table. For more information refer to this document:
https://learn.microsoft.com/en-us/fabric/data-factory/tutorial-setup-incremental-refresh-with-datafl...
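
Expressed in the SQL used above, the same incremental idea is roughly a filter on the source against what the target already holds (a sketch, assuming LOAD_DATE is a reliable high-water mark):

-- Only rows newer than anything already loaded into the target
SELECT LOAD_DATE, ID, Content
FROM Large_Table_Previous_Days
WHERE LOAD_DATE > (SELECT MAX(LOAD_DATE) FROM Large_Table)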


Hope this helps. Please let us know if you have any further queries.

 


Hi @Vogels 
We haven't heard from you since the last response and wanted to check whether you have found a resolution yet. If not, please reply with more details and we will try to help.
Thanks

Hi @v-nikhilan-msft,

 

Thank you very much for the detailed answer. It works perfectly now. 
