Hello,
I’m trying to sync a Bronze table to a Silver table in a Lakehouse. My source is a full overwrite, so I need to handle inserts, updates, and deletes (specifically records that "vanish" from the source).
The Setup: I tried the following standard MERGE logic:
SQL
MERGE INTO silver AS t
USING bronze AS s
ON t.id = s.id
WHEN MATCHED AND <condition_to_prevent_blind_update> THEN
UPDATE SET *
WHEN NOT MATCHED BY TARGET THEN
INSERT *
WHEN NOT MATCHED BY SOURCE THEN
DELETE
The Goal: I want to avoid blind updates (updating rows where data hasn't actually changed) to save on compute and time. This is the whole purpose of my optimization, as the table is quite wide.
The Problem:
I do not want to manually list 50+ columns in the WHEN MATCHED condition for maintenance reasons.
I tried hash(t.*) != hash(s.*), but Spark SQL does not allow the * wildcard expansion for the target table (t) inside the MERGE condition. It throws a [DELTA_MERGE_UNRESOLVED_EXPRESSION] error.
Question: Is there a straightforward way in Spark SQL to compare the entire row of the target (t) and source (s) without explicitly naming every column? Or is the only way to save on compute/time to use a PySpark wrapper that dynamically generates the hash/column string?
Thanks!
Hi @lavginqo3 ,
Thanks for reaching out to the Fabric community. I'm happy to help with the issue you are facing.
This is a Spark SQL parse-time limitation. When Spark compiles a MERGE statement, it processes the WHEN MATCHED condition before it resolves the schemas of the tables involved. At that point, t.* is just a raw token. Spark doesn't yet know what columns t contains, so it cannot expand the wildcard. The engine throws the unresolved expression error and aborts.
Note: hash(s.*) works fine inside a subquery or a regular SELECT; the restriction is specific to the MERGE condition clause.
The fix: a PySpark dynamic hash (no column listing required)
This generates the condition string dynamically, so you never manually list columns; it stays maintenance-free even as the schema evolves.
The idea is to let Python expand the columns before the SQL string is ever sent to Spark. By the time Spark sees the query, every column is named explicitly: no wildcards, no error, and no column list to maintain by hand.
Example:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
# Step 1: fetch column list from schema — auto-updates if columns change
cols = [c for c in spark.table("bronze").columns if c != "id"]
# Step 2: build hash expressions for both sides
# (wrap each name in backticks, e.g. f"s.`{c}`", if column names may contain spaces)
source_hash = f"hash({', '.join(f's.{c}' for c in cols)})"
target_hash = f"hash({', '.join(f't.{c}' for c in cols)})"
# Step 3: generate and run the MERGE
merge_sql = f"""
MERGE INTO silver AS t
USING bronze AS s
ON t.id = s.id
WHEN MATCHED AND {source_hash} != {target_hash}
THEN UPDATE SET *
WHEN NOT MATCHED BY TARGET THEN INSERT *
WHEN NOT MATCHED BY SOURCE THEN DELETE
"""
spark.sql(merge_sql)
What Spark actually receives is already fully explicit, e.g. hash(s.name, s.age, s.city, ...) != hash(t.name, t.age, t.city, ...), so there are no wildcards to resolve and no error is thrown.
This is the cleanest approach: you get full SQL MERGE semantics with zero column maintenance, and the hash covers all non-key columns automatically. (One hedge: Spark's hash() returns a 32-bit value, so collisions are theoretically possible on very wide or very large tables; xxhash64 is a 64-bit alternative if that concerns you.)
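The string-building step can be checked without a Spark session at all. For a hypothetical three-column schema (the column names here are stand-ins; the real code reads them from spark.table("bronze").columns), the generated condition looks like this:

```python
# Stand-in for the real column list read from spark.table("bronze").columns
cols = ["name", "age", "city"]

# Same f-string construction as in the MERGE script above
source_hash = f"hash({', '.join(f's.{c}' for c in cols)})"
target_hash = f"hash({', '.join(f't.{c}' for c in cols)})"
condition = f"{source_hash} != {target_hash}"

print(condition)
# hash(s.name, s.age, s.city) != hash(t.name, t.age, t.city)
```

Printing the generated condition (or the full merge_sql string) before running it is a cheap way to debug the pattern, since any mistake shows up as plain SQL text rather than an analyzer error.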
Why this is maintenance-free
spark.table("bronze").columns always reflects the live schema. If you add, remove, or rename columns in your Bronze table, the hash condition updates automatically on the next run; you never touch this script.
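One caveat worth adding, since schemas rarely evolve on both sides at the same moment: if Bronze gains a column before Silver does, the generated hash will reference a column the target does not have, and the MERGE will fail to resolve. A minimal sketch of a guard is to hash only the columns present on both sides (the literal lists below are stand-ins for the real spark.table(...).columns calls):

```python
# Stand-ins for spark.table("bronze").columns / spark.table("silver").columns
bronze_cols = ["id", "name", "age", "city"]
silver_cols = ["id", "name", "age"]

# Hash only columns that exist on BOTH sides, excluding the join key,
# so the MERGE condition never references a column Silver lacks
shared = [c for c in bronze_cols if c in silver_cols and c != "id"]

source_hash = f"hash({', '.join(f's.{c}' for c in shared)})"
target_hash = f"hash({', '.join(f't.{c}' for c in shared)})"

print(shared)       # ['name', 'age']
print(source_hash)  # hash(s.name, s.age)
```

The trade-off is that a brand-new Bronze column does not trigger updates until Silver's schema catches up, which is usually the behavior you want during a staged rollout.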
If you find this response helpful, kindly consider marking it as the accepted solution and giving it kudos. This helps others facing similar issues and is greatly appreciated.
Hi,
Below is the Fabric best-practice approach:
Thanks,
Ati Puri
Hi @Lodha_Jaydeep ,
Thanks for the detailed info!
I had used the PySpark method for a different project and it worked.
I was specifically checking whether there is anything in pure Spark SQL, since most of our notebooks are in Spark SQL. There is also field renaming and business logic applied during the MERGE, which I prefer to keep in Spark SQL.
Thank you again!
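For anyone landing here who, like the asker, wants to stay in pure Spark SQL: one pattern sometimes used is to persist a row_hash column in the Silver table and compute the source-side hash inline in the USING subquery. Wildcard expansion inside struct() is legal in a SELECT (unlike in the MERGE condition), so no column list is needed. This is an unverified sketch, not taken from the thread: it assumes Silver already carries a row_hash column populated with the same expression, and note that to_json follows schema column order, so reordering columns changes the hash.

```sql
-- Sketch: assumes silver has a persisted row_hash column, populated at insert
-- time by the same xxhash64(to_json(struct(*))) expression used here.
-- struct(*) is resolvable inside the SELECT, so no columns are listed.
MERGE INTO silver AS t
USING (
  SELECT *, xxhash64(to_json(struct(*))) AS row_hash FROM bronze
) AS s
ON t.id = s.id
WHEN MATCHED AND t.row_hash != s.row_hash THEN
  UPDATE SET *
WHEN NOT MATCHED BY TARGET THEN
  INSERT *
WHEN NOT MATCHED BY SOURCE THEN
  DELETE
```

Because the subquery's row_hash column is part of s, UPDATE SET * and INSERT * keep t.row_hash in sync automatically on every run.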