<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to perform a MERGE using Spark SQL without listing every column or using &amp;quot;blind updates in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-perform-a-MERGE-using-Spark-SQL-without-listing-every/m-p/5147223#M15804</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/486347"&gt;@Lodha_Jaydeep&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for the detailed info!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I had used the PySpark method on a different project and it worked.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I was specifically checking whether there is anything in Spark SQL itself, as most of our notebooks are in Spark SQL, and there is field renaming and business logic applied during the MERGE that I prefer to keep in Spark SQL.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you again!&lt;/P&gt;</description>
    <pubDate>Sun, 12 Apr 2026 14:13:26 GMT</pubDate>
    <dc:creator>lavginqo3</dc:creator>
    <dc:date>2026-04-12T14:13:26Z</dc:date>
    <item>
      <title>How to perform a MERGE using Spark SQL without listing every column or using "blind updates</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-perform-a-MERGE-using-Spark-SQL-without-listing-every/m-p/5146197#M15779</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I’m trying to sync a Bronze table to a Silver table in a Lakehouse. My source is a full overwrite, so I need to handle inserts, updates, and deletes (specifically records that "vanish" from the source).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;The Setup:&lt;/STRONG&gt; I tried the following standard MERGE logic:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;SQL&lt;/P&gt;&lt;P&gt;MERGE INTO silver AS t&lt;BR /&gt;USING bronze AS s&lt;BR /&gt;ON t.id = s.id&lt;BR /&gt;WHEN MATCHED AND &amp;lt;condition_to_prevent_blind_update&amp;gt; THEN&lt;BR /&gt;UPDATE SET *&lt;BR /&gt;WHEN NOT MATCHED BY TARGET THEN&lt;BR /&gt;INSERT *&lt;BR /&gt;WHEN NOT MATCHED BY SOURCE THEN&lt;BR /&gt;DELETE&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;The Goal:&lt;/STRONG&gt; I want to avoid blind updates (updating rows where the data hasn't actually changed) to save on compute and time. This is the whole purpose of my optimization, as the table is quite wide.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;The Problem:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I do not want to manually list 50+ columns in the WHEN MATCHED condition, for maintenance reasons.&lt;/P&gt;&lt;P&gt;I tried hash(t.*) != hash(s.*), but Spark SQL does not allow the * wildcard expansion for the target table (t) inside the MERGE condition. It throws a [DELTA_MERGE_UNRESOLVED_EXPRESSION] error.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Question:&lt;/STRONG&gt; Is there a straightforward way in Spark SQL to compare the entire row of the target (t) and source (s) without explicitly naming every column? Or is the only way to save on compute/time to use a PySpark wrapper that dynamically generates the hash/column string?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Thu, 09 Apr 2026 22:05:33 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-perform-a-MERGE-using-Spark-SQL-without-listing-every/m-p/5146197#M15779</guid>
      <dc:creator>lavginqo3</dc:creator>
      <dc:date>2026-04-09T22:05:33Z</dc:date>
    </item>
    <item>
      <title>Re: How to perform a MERGE using Spark SQL without listing every column or using "blind updates</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-perform-a-MERGE-using-Spark-SQL-without-listing-every/m-p/5146569#M15793</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1386867"&gt;@lavginqo3&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;Thanks for reaching out to the Fabric community. I'm happy to help with the issue you're facing.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is a Spark SQL parse-time limitation. When Spark compiles a MERGE statement, it processes the&amp;nbsp;WHEN MATCHED&amp;nbsp;condition&amp;nbsp;&lt;STRONG&gt;before&lt;/STRONG&gt;&amp;nbsp;it resolves the schemas of the tables involved. At that point,&amp;nbsp;t.*&amp;nbsp;is just a raw token: Spark doesn't yet know what columns&amp;nbsp;t&amp;nbsp;contains, so it cannot expand the wildcard. The engine throws the unresolved-expression error and aborts.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Note:&amp;nbsp;&lt;/STRONG&gt;hash(s.*)&amp;nbsp;works fine inside a subquery or a regular SELECT; the restriction is specific to the MERGE condition clause.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;The fix: a PySpark dynamic hash (no column listing required)&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;The solution is to let Python expand the columns before the SQL string is ever sent to Spark. The condition string is generated dynamically, so by the time Spark sees the query every column is named explicitly: no wildcards, no error, and no manual column listing. This stays maintenance-free even as the schema evolves.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Example:&lt;/STRONG&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Step 1: fetch column list from schema — auto-updates if columns change
cols = [c for c in spark.table("bronze").columns if c != "id"]

# Step 2: build hash expressions for both sides
source_hash = f"hash({', '.join(f's.{c}' for c in cols)})"
target_hash = f"hash({', '.join(f't.{c}' for c in cols)})"

# Step 3: generate and run the MERGE
merge_sql = f"""
MERGE INTO silver AS t
USING bronze AS s
ON t.id = s.id
WHEN MATCHED AND {source_hash} != {target_hash}
  THEN UPDATE SET *
WHEN NOT MATCHED BY TARGET THEN INSERT *
WHEN NOT MATCHED BY SOURCE THEN DELETE
"""

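# Optional: print the generated statement first to double-check the
# expanded column list before executing (handy when the table is wide)
print(merge_sql)
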
spark.sql(merge_sql)&lt;/LI-CODE&gt;&lt;P&gt;What Spark actually receives is already fully explicit, e.g.&amp;nbsp;&lt;STRONG&gt;hash(s.name, s.age, s.city, ...) != hash(t.name, t.age, t.city, ...)&lt;/STRONG&gt;, so there are no wildcards to resolve and no error is thrown.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is the cleanest approach: you get full SQL MERGE semantics with zero column maintenance, and the hash covers all non-key columns automatically.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Why this is maintenance-free&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;spark.table("bronze").columns&amp;nbsp;always reflects the live schema. If you add, remove, or rename columns in your Bronze table, the hash condition updates automatically on the next run; you never touch this script.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;If you find this response helpful, kindly consider marking it as the accepted solution and giving it a kudos. This helps others facing similar issues and is greatly appreciated.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 10 Apr 2026 10:48:10 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-perform-a-MERGE-using-Spark-SQL-without-listing-every/m-p/5146569#M15793</guid>
      <dc:creator>Lodha_Jaydeep</dc:creator>
      <dc:date>2026-04-10T10:48:10Z</dc:date>
    </item>
    <item>
      <title>Re: How to perform a MERGE using Spark SQL without listing every column or using "blind updates</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-perform-a-MERGE-using-Spark-SQL-without-listing-every/m-p/5146752#M15798</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1386867"&gt;@lavginqo3&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Below is the Fabric best-practice approach:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;1. Compute a deterministic row hash during Bronze ingestion:&lt;/STRONG&gt;&lt;/P&gt;&lt;LI-CODE lang="sql"&gt;CREATE OR REPLACE TABLE bronze_hashed
AS
SELECT
    *,
    xxhash64(to_json(named_struct(
        'col1', col1,
        'col2', col2,
        'col3', col3
        -- include all business columns EXCEPT metadata
    ))) AS row_hash
FROM bronze;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;2. Add a persisted hash column to Silver:&lt;/STRONG&gt;&lt;/P&gt;&lt;LI-CODE lang="sql"&gt;ALTER TABLE silver ADD COLUMN IF NOT EXISTS row_hash BIGINT;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;3. Use a hash-based MERGE:&lt;/STRONG&gt;&lt;/P&gt;&lt;LI-CODE lang="sql"&gt;MERGE INTO silver AS t
USING bronze_hashed AS s
ON t.id = s.id
WHEN MATCHED AND t.row_hash &amp;lt;&amp;gt; s.row_hash THEN
  UPDATE SET *
WHEN NOT MATCHED BY TARGET THEN
  INSERT *
WHEN NOT MATCHED BY SOURCE THEN
  DELETE;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Ati Puri&lt;/P&gt;</description>
      <pubDate>Fri, 10 Apr 2026 14:30:25 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-perform-a-MERGE-using-Spark-SQL-without-listing-every/m-p/5146752#M15798</guid>
      <dc:creator>ati_puri</dc:creator>
      <dc:date>2026-04-10T14:30:25Z</dc:date>
    </item>
    <item>
      <title>Re: How to perform a MERGE using Spark SQL without listing every column or using "blind updates</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-perform-a-MERGE-using-Spark-SQL-without-listing-every/m-p/5147223#M15804</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/486347"&gt;@Lodha_Jaydeep&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for the detailed info!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I had used the PySpark method on a different project and it worked.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I was specifically checking whether there is anything in Spark SQL itself, as most of our notebooks are in Spark SQL, and there is field renaming and business logic applied during the MERGE that I prefer to keep in Spark SQL.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you again!&lt;/P&gt;</description>
      <pubDate>Sun, 12 Apr 2026 14:13:26 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-perform-a-MERGE-using-Spark-SQL-without-listing-every/m-p/5147223#M15804</guid>
      <dc:creator>lavginqo3</dc:creator>
      <dc:date>2026-04-12T14:13:26Z</dc:date>
    </item>
  </channel>
</rss>

