I have a really large table that I'd like to update with only the new or changed entries.
I have two tables (one with all the data and one with the new data to be updated) and basic code similar to the example in Upsert into a Delta Lake table using merge - Azure Databricks | Microsoft Learn, which ran fine for a few days. But then it started to give the error: Cannot perform MERGE as multiple source rows matched and attempted to update the same target row in the Delta table.
So I was thinking of merging our code for updating the large table with the code for deduplicating, and then I should be good. However, I'm not very skilled in SQL yet and can't figure out how to combine them. Is someone able to help? Below are our simplified code and the deduplicate example.
Simplified Code:
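A minimal sketch of the kind of basic MERGE described above, modeled on the linked Azure Databricks upsert example (the table and column names Large_Table, Large_Table_Previous_Days, ID, LOAD_DATE, and Content are assumed), might look like this:
-- Assumed simplified MERGE (names are illustrative): update existing rows and
-- insert new ones from the daily table
MERGE INTO Large_Table AS target
USING Large_Table_Previous_Days AS source
ON target.ID = source.ID
WHEN MATCHED THEN
    UPDATE SET
        target.LOAD_DATE = source.LOAD_DATE,
        target.Content = source.Content
WHEN NOT MATCHED THEN
    INSERT (LOAD_DATE, ID, Content)
    VALUES (source.LOAD_DATE, source.ID, source.Content)
A statement like this fails with the error above as soon as Large_Table_Previous_Days contains more than one row with the same ID, because MERGE cannot decide which duplicate source row should update the single matching target row.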
Hi @Vogels
Thanks for using Fabric Community.
You are on the right track.
Instead of copying and deleting the table, perform deduplication on Large_Table_Previous_Days before merging. This ensures unique rows are fed into the MERGE statement. You can achieve this using the ROW_NUMBER function to assign a unique sequence number to each row based on a specific order:
-- Deduplicate Large_Table_Previous_Days: keep only the latest row per ID
SELECT *
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY LOAD_DATE DESC) AS row_num
    FROM Large_Table_Previous_Days
) AS ranked
-- Use only the first row (latest LOAD_DATE) for each ID in the merge
WHERE row_num = 1
Modify your MERGE statement to use the deduplicated Large_Table_Previous_Days as the source:
-- Merge the deduplicated data into Large_Table
MERGE INTO Large_Table AS target
USING (
    -- Deduplicated Large_Table_Previous_Days: latest LOAD_DATE per ID
    SELECT ID, LOAD_DATE, Content
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY LOAD_DATE DESC) AS row_num
        FROM Large_Table_Previous_Days
    ) AS ranked
    WHERE row_num = 1
) AS source
ON (target.ID = source.ID)
WHEN MATCHED THEN
    UPDATE SET
        target.LOAD_DATE = source.LOAD_DATE,
        target.Content = source.Content
WHEN NOT MATCHED THEN
    INSERT (LOAD_DATE, ID, Content)
    VALUES (source.LOAD_DATE, source.ID, source.Content)
Replace Large_Table, Large_Table_Previous_Days, and the column names (ID, LOAD_DATE, Content) with your actual table and column names.
This code performs the following steps:
1. Deduplicate Large_Table_Previous_Days: ROW_NUMBER assigns a row_num to each row within each group of the same ID, ordered by descending LOAD_DATE. Only the first row (the latest LOAD_DATE) for each ID is kept, ensuring unique data, and this result is used as the source in the MERGE statement.
2. The ON clause matches rows based on the same ID in both tables.
3. WHEN MATCHED updates existing rows in Large_Table.
4. WHEN NOT MATCHED inserts new rows from the source.
This combined code should prevent duplicate row errors and ensure only new or changed entries are updated in your large table.
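If you want to confirm where the original error came from, a quick check like this (assuming the same table and column names) lists the IDs that appear more than once in Large_Table_Previous_Days; those duplicate source rows are what made the original MERGE fail:
-- Optional check: find IDs with more than one row in the daily table
SELECT ID, COUNT(*) AS duplicate_rows
FROM Large_Table_Previous_Days
GROUP BY ID
HAVING COUNT(*) > 1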
While this is one approach, if you want to perform an upsert (update or insert) operation on a table, you can also use Dataflow Gen2. Incrementally accumulating data in a data destination requires loading only new or updated data, which you can do by using a query that filters the source based on what is already in the destination. Here the source would be Large_Table_Previous_Days and the target table would be Large_Table. For more information, refer to this document:
https://learn.microsoft.com/en-us/fabric/data-factory/tutorial-setup-incremental-refresh-with-datafl...
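As a rough illustration of that filtering idea (a sketch only, not the Dataflow Gen2 implementation itself; it assumes LOAD_DATE is a timestamp), the incremental selection could look like this:
-- Sketch: pull only rows from the daily table that are newer than anything
-- already loaded into the destination
SELECT *
FROM Large_Table_Previous_Days
WHERE LOAD_DATE > (SELECT COALESCE(MAX(LOAD_DATE), TIMESTAMP '1900-01-01')
                   FROM Large_Table)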
Hope this helps. Please let us know if you have any further queries.
Hi @Vogels
We haven't heard from you since the last response and were just checking back to see if you have a resolution yet. If not, please reply with more details and we will try to help.
Thanks