Update delta table only with changed fields

BW_RFA · ‎03-16-2024

Hi,

I have a delta table that needs updating with new data every 10 mins. The data that comes in has many fields but maybe only 1 of them is actually different from the target table.

As I need to track these changes, I don't want to update the whole row when most of the fields are the same.

I am using the SQL command below to do this, but getting the following error:

Subqueries are not supported in the UPDATE condition of MERGE operation (condition = exists(t.label, s.label)). org.apache.spark.sql.delta.DeltaErrorsBase.subqueryNotSupportedException(DeltaErrors.scala:1127)

Is this a PySpark/Fabric problem? Is there an alternative way to do this?

The following code reproduces the problem (output path and target table need changing):

In this example, only the 1st row would be updated and only the "label" column would be.

df1 = spark.createDataFrame(

[

(1, "foo", "static1"),

(2, "bar", "static2"),

],

["id", "label", "staticCol"]

)

df2 = spark.createDataFrame(

[

(1, "update", "static1"),

(2, "bar", "static2"),

],

["id", "label", "staticCol"]

)

df2.createOrReplaceTempView("streamView")

output_path = 'Tables/TestUpdate'

df1.write.mode('overwrite').format('delta').save(output_path)

%%sql

MERGE INTO Lakehouse_Dev.TestUpdate t

USING streamView s

ON t.label = s.label

WHEN MATCHED AND EXISTS

(

SELECT t.label, t.staticCol

EXCEPT

(SELECT s.label, s.staticCol)

)

THEN

UPDATE SET

t.label = s.label

t.staticCol= s.staticCol

WHEN NOT MATCHED BY TARGET

THEN

INSERT

(label, staticCol)

VALUES

(s.label, s.staticCol)

BW_RFA · ‎03-18-2024

I have realised that "EXCEPT" is not supported in Delta yet. How can I solve this problem?

I want to only update the columns that have changed, not all of the columns.

Anonymous · ‎03-20-2024

Hi @BW_RFA ,

I can suggest these document to refer as it might help you -
Upsert into a Delta Lake table using merge | Databricks on AWS
Why Upsert in Databricks using Pyspark is a bad idea, or is it ? | by Debayan Kar | Medium

Anonymous · ‎03-21-2024

Hello @BW_RFA ,

We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet .
In case if you have any resolution please do share that same with the community as it can be helpful to others .
Otherwise, will respond back with the more details and we will try to help .

Anonymous · ‎03-24-2024

Hi @BW_RFA ,

We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet .
In case if you have any resolution please do share that same with the community as it can be helpful to others .
Otherwise, will respond back with the more details and we will try to help .

BW_RFA · ‎03-18-2024

I have attached code at the bottom of the topic to reproduce the issue.

df1 is to be used to build the initial table to replicate the table that i have, and df2 is to replicate the update file that would come in.

The sample output should be an updated table that has only updated row 1 label and NOTHING else. Not updating the whole of row 1 with the data that's in df2 row 1 as the "static" field hasn't changed.

Anonymous · ‎03-18-2024

Hi @BW_RFA ,

Thanks for using Fabric Community.

Can you please share few more details of your task? like sample_input and sample_output and on what conditions we need to perform a merge!

I can guide you better if you can share these details!

Update delta table only with changed fields

Helpful resources

Fabric Monthly Update - September 2025

Fabric Community Update - August 2025

FabCon is coming to Atlanta

Update delta table only with changed fields

Helpful resources

Fabric Monthly Update - September 2025

Fabric Community Update - August 2025