<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Update table with SQL MERGE issue duplicates: help needed with combining SQL deduplicate &amp;amp; u in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Update-table-with-SQL-MERGE-issue-duplicates-help-needed-with/m-p/3699347#M1426</link>
    <description>&lt;P&gt;&lt;SPAN&gt;Hi&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/267775" target="_blank"&gt;@Vogels&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet. Otherwise, will respond back with the more details and we will try to help.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Thanks&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 14 Feb 2024 16:52:11 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2024-02-14T16:52:11Z</dc:date>
    <item>
      <title>Update table with SQL MERGE issue duplicates: help needed with combining SQL deduplicate &amp; update</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Update-table-with-SQL-MERGE-issue-duplicates-help-needed-with/m-p/3688207#M1423</link>
      <description>&lt;P&gt;I have a really large table that I like to update with only the new or changed entries.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have two tables (one with all data &amp;amp; one with the new to be updated data) and a basic code similar to (&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/delta/merge" target="_blank" rel="noopener"&gt;Upsert into a Delta Lake table using merge - Azure Databricks | Microsoft Learn)&amp;nbsp;&lt;/A&gt;which ran fine for a few days.&amp;nbsp;But then started to give the error:&amp;nbsp;Cannot perform MERGE as multiple source rows matched and attempted to update the same target row in the Delta table.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;I checked the tables and there are really no duplicates in either of them.&amp;nbsp;&lt;/LI&gt;&lt;LI&gt;I completely deleted the table with the new to be updated data from the lakehouse, copied the data back into the lakehouse and then it worked again.&lt;BR /&gt;Therefore I think it has something to with duplicated log records, because I found this:&amp;nbsp;&lt;SPAN&gt;A common ETL use case is to collect logs into Delta table by appending them to a table. However, often the sources can generate duplicate log records and downstream deduplication steps are needed to take care of them.&amp;nbsp;&lt;/SPAN&gt;(&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/delta/merge#---data-deduplication-when-writing-into-delta-tables" target="_self"&gt;Data deduplication when writing into delta table - Azure Databricks | Microsoft Learn&lt;/A&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;So I was thinking to merge our code for updating the large table with the code for deduplicating and then I should be good. However Im not so skilled in SQL yet and can't figure out how to combine them. Is someone able to help? Below our simplified code and the deduplicate example.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Simplified Code:&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;MERGE&lt;/SPAN&gt; &lt;SPAN&gt;INTO&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;Large_Table&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;AS&lt;/SPAN&gt;&lt;SPAN&gt; target&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;USING&lt;/SPAN&gt;&lt;SPAN&gt; Large_Table_Previous_Days &lt;/SPAN&gt;&lt;SPAN&gt;AS&lt;/SPAN&gt;&lt;SPAN&gt; source&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;ON&lt;/SPAN&gt;&lt;SPAN&gt; (target.ID&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; source.ID)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;WHEN&lt;/SPAN&gt; &lt;SPAN&gt;MATCHED&lt;/SPAN&gt; &lt;SPAN&gt;THEN&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;UPDATE&lt;/SPAN&gt; &lt;SPAN&gt;SET&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; target.LOAD_DATE &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; source.LOAD_DATE,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; target.ID&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; source.ID,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; target.Content&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; source.Content&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;WHEN&lt;/SPAN&gt; &lt;SPAN&gt;NOT&lt;/SPAN&gt; &lt;SPAN&gt;MATCHED&lt;/SPAN&gt; &lt;SPAN&gt;THEN&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;INSERT&lt;/SPAN&gt;&lt;SPAN&gt; (LOAD_DATE, ID, Content)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;VALUES&lt;/SPAN&gt;&lt;SPAN&gt; (source.LOAD_DATE, source.ID, source.Content)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Deduplicate example from microsoft learn:&amp;nbsp;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN class=""&gt;MERGE&lt;/SPAN&gt; &lt;SPAN class=""&gt;INTO&lt;/SPAN&gt; &lt;SPAN class=""&gt;logs&lt;/SPAN&gt; &lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN class=""&gt;USING&lt;/SPAN&gt; newDedupedLogs &lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN class=""&gt;ON&lt;/SPAN&gt; logs.uniqueId = newDedupedLogs.uniqueId &lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN class=""&gt;WHEN&lt;/SPAN&gt; &lt;SPAN class=""&gt;NOT&lt;/SPAN&gt; &lt;SPAN class=""&gt;MATCHED&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &lt;SPAN class=""&gt;THEN&lt;/SPAN&gt; &lt;SPAN class=""&gt;INSERT&lt;/SPAN&gt; *&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 08 Feb 2024 11:42:52 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Update-table-with-SQL-MERGE-issue-duplicates-help-needed-with/m-p/3688207#M1423</guid>
      <dc:creator>Vogels</dc:creator>
      <dc:date>2024-02-08T11:42:52Z</dc:date>
    </item>
    <item>
      <title>Re: Update table with SQL MERGE issue duplicates: help needed with combining SQL deduplicate &amp; u</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Update-table-with-SQL-MERGE-issue-duplicates-help-needed-with/m-p/3690598#M1424</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/267775"&gt;@Vogels&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;Thanks for using Fabric Community.&lt;BR /&gt;You are doing right.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Instead of copying and deleting the table, &lt;/SPAN&gt;perform deduplication on&lt;STRONG&gt;&lt;CODE&gt;Large_Table_Previous_Days&lt;/CODE&gt;&lt;/STRONG&gt;before merging&lt;SPAN&gt;. This ensures unique rows are fed into the &lt;/SPAN&gt;&lt;CODE&gt;MERGE&lt;/CODE&gt;&lt;SPAN&gt; statement. You can achieve this using the &lt;/SPAN&gt;&lt;CODE&gt;ROW_NUMBER&lt;/CODE&gt;&lt;SPAN&gt; function to assign a unique sequence number based on a specific order:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;-- Deduplicate Large_Table_Previous_Days
SELECT *,
  ROW_NUMBER() OVER (PARTITION BY ID ORDER BY LOAD_DATE DESC) AS row_num
FROM Large_Table_Previous_Days

-- Use only the first row (latest LOAD_DATE) for each ID in the merge
WHERE row_num = 1
&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;BR /&gt;&lt;SPAN&gt;Modify your &lt;/SPAN&gt;&lt;CODE&gt;MERGE&lt;/CODE&gt;&lt;SPAN&gt; statement to use the deduplicated &lt;/SPAN&gt;&lt;CODE&gt;Large_Table_Previous_Days&lt;/CODE&gt;&lt;SPAN&gt; as the source:&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;-- Deduplicate Large_Table_Previous_Days
SELECT *,
  ROW_NUMBER() OVER (PARTITION BY ID ORDER BY LOAD_DATE DESC) AS row_num
FROM Large_Table_Previous_Days
WHERE row_num = 1

-- Merge deduplicated data into Large_Table
MERGE INTO Large_Table AS target
USING (
  -- Deduplicated result from the previous step
) AS source
ON (target.ID = source.ID)

WHEN MATCHED THEN
  UPDATE SET
    target.LOAD_DATE = source.LOAD_DATE,
    target.ID = source.ID,
    target.Content = source.Content

WHEN NOT MATCHED THEN
  INSERT (LOAD_DATE, ID, Content)
  VALUES (source.LOAD_DATE, source.ID, source.Content)
&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Replace &lt;CODE class=""&gt;Large_Table&lt;/CODE&gt;, &lt;CODE class=""&gt;Large_Table_Previous_Days&lt;/CODE&gt;, and the column names (&lt;CODE class=""&gt;ID&lt;/CODE&gt;, &lt;CODE class=""&gt;LOAD_DATE&lt;/CODE&gt;, &lt;CODE class=""&gt;Content&lt;/CODE&gt;) with your actual table and column names.&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P data-sourcepos="30:1-30:39"&gt;&lt;SPAN&gt;This code performs the following steps:&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL data-sourcepos="32:1-32:45"&gt;
&lt;LI data-sourcepos="32:1-32:45"&gt;&lt;STRONG&gt;Deduplicates&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;Large_Table_Previous_Days&lt;/CODE&gt;&lt;/STRONG&gt;&lt;SPAN&gt;:&lt;/SPAN&gt;
&lt;UL data-sourcepos="33:5-34:90"&gt;
&lt;LI data-sourcepos="33:5-33:118"&gt;&lt;SPAN&gt;It assigns a unique&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class=""&gt;row_num&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;to each row within each group of the same&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class=""&gt;ID&lt;/CODE&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;ordered by descending&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class=""&gt;LOAD_DATE&lt;/CODE&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI data-sourcepos="34:5-34:90"&gt;&lt;SPAN&gt;Only the first row (latest&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class=""&gt;LOAD_DATE&lt;/CODE&gt;&lt;SPAN&gt;) for each&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class=""&gt;ID&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;is kept,&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;ensuring unique data.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI data-sourcepos="35:1-40:0"&gt;&lt;STRONG&gt;Merges deduplicated data&lt;/STRONG&gt;&lt;SPAN&gt;:&lt;/SPAN&gt;
&lt;UL data-sourcepos="36:5-40:0"&gt;
&lt;LI data-sourcepos="36:5-36:79"&gt;&lt;SPAN&gt;The deduplicated result is used as the&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class=""&gt;source&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;in the&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class=""&gt;MERGE&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;statement.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI data-sourcepos="37:5-37:69"&gt;&lt;CODE class=""&gt;ON&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;clause matches rows based on the same&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class=""&gt;ID&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;in both tables.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI data-sourcepos="38:5-38:60"&gt;&lt;CODE class=""&gt;WHEN MATCHED&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;updates existing rows in&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class=""&gt;Large_Table&lt;/CODE&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI data-sourcepos="39:5-40:0"&gt;&lt;CODE class=""&gt;WHEN NOT MATCHED&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;inserts new rows from the&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class=""&gt;source&lt;/CODE&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P data-sourcepos="41:1-41:126"&gt;&lt;SPAN&gt;This combined code should prevent duplicate row errors and ensure only new or changed entries are updated in your large table.&lt;BR /&gt;&lt;BR /&gt;While this is one approach , if you want to&amp;nbsp;perform an&amp;nbsp;&lt;STRONG&gt;upsert (update or insert)&lt;/STRONG&gt;&amp;nbsp;operation on a table, you can use Dataflow Gen2 here. Incrementally amassing data in a data destination requires a technique to load only new or updated data into your data destination. This technique can be done by using a query to filter the data based on the data destination. The source will be the&amp;nbsp;Large_Table_Previous_Days and the target table will be&amp;nbsp;Large_Table. For more information refer to this document:&lt;BR /&gt;&lt;A href="https://learn.microsoft.com/en-us/fabric/data-factory/tutorial-setup-incremental-refresh-with-dataflows-gen2" target="_blank"&gt;https://learn.microsoft.com/en-us/fabric/data-factory/tutorial-setup-incremental-refresh-with-dataflows-gen2&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;Hope this helps. Please let us know if you have any further queries.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 09 Feb 2024 10:51:33 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Update-table-with-SQL-MERGE-issue-duplicates-help-needed-with/m-p/3690598#M1424</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-02-09T10:51:33Z</dc:date>
    </item>
    <item>
      <title>Re: Update table with SQL MERGE issue duplicates: help needed with combining SQL deduplicate &amp; u</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Update-table-with-SQL-MERGE-issue-duplicates-help-needed-with/m-p/3695367#M1425</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/267775"&gt;@Vogels&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet. Otherwise, will respond back with the more details and we will try to help. &lt;BR /&gt;Thanks&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 13 Feb 2024 04:18:22 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Update-table-with-SQL-MERGE-issue-duplicates-help-needed-with/m-p/3695367#M1425</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-02-13T04:18:22Z</dc:date>
    </item>
    <item>
      <title>Re: Update table with SQL MERGE issue duplicates: help needed with combining SQL deduplicate &amp; u</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Update-table-with-SQL-MERGE-issue-duplicates-help-needed-with/m-p/3699347#M1426</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Hi&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/267775" target="_blank"&gt;@Vogels&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet. Otherwise, will respond back with the more details and we will try to help.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Thanks&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 14 Feb 2024 16:52:11 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Update-table-with-SQL-MERGE-issue-duplicates-help-needed-with/m-p/3699347#M1426</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-02-14T16:52:11Z</dc:date>
    </item>
    <item>
      <title>Re: Update table with SQL MERGE issue duplicates: help needed with combining SQL deduplicate &amp; u</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Update-table-with-SQL-MERGE-issue-duplicates-help-needed-with/m-p/3707725#M1427</link>
      <description>&lt;P&gt;Hi&amp;nbsp;@Anonymous,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you very much for the detailed answer. It works perfectly now.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Feb 2024 10:31:56 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Update-table-with-SQL-MERGE-issue-duplicates-help-needed-with/m-p/3707725#M1427</guid>
      <dc:creator>Vogels</dc:creator>
      <dc:date>2024-02-19T10:31:56Z</dc:date>
    </item>
  </channel>
</rss>

