The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredAsk the Fabric Databases & App Development teams anything! Live on Reddit on August 26th. Learn more.
Hi,
I have a delta table that needs updating with new data every 10 mins. The data that comes in has many fields but maybe only 1 of them is actually different from the target table.
As I need to track these changes, I don't want to update the whole row when most of the fields are the same.
I am using the SQL command below to do this, but getting the following error:
Subqueries are not supported in the UPDATE condition of MERGE operation (condition = exists(t.label, s.label)). org.apache.spark.sql.delta.DeltaErrorsBase.subqueryNotSupportedException(DeltaErrors.scala:1127)
Is this a PySpark/Fabric problem? Is there an alternative way to do this?
The following code reproduces the problem (output path and target table need changing):
In this example, only the 1st row would be updated and only the "label" column would be.
I have realised that "EXCEPT" is not supported in Delta yet. How can I solve this problem?
I want to only update the columns that have changed, not all of the columns.
Hi @BW_RFA ,
I can suggest these document to refer as it might help you -
Upsert into a Delta Lake table using merge | Databricks on AWS
Why Upsert in Databricks using Pyspark is a bad idea, or is it ? | by Debayan Kar | Medium
Hello @BW_RFA ,
We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet .
In case if you have any resolution please do share that same with the community as it can be helpful to others .
Otherwise, will respond back with the more details and we will try to help .
Hi @BW_RFA ,
We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet .
In case if you have any resolution please do share that same with the community as it can be helpful to others .
Otherwise, will respond back with the more details and we will try to help .
I have attached code at the bottom of the topic to reproduce the issue.
df1 is to be used to build the initial table to replicate the table that i have, and df2 is to replicate the update file that would come in.
The sample output should be an updated table that has only updated row 1 label and NOTHING else. Not updating the whole of row 1 with the data that's in df2 row 1 as the "static" field hasn't changed.
Hi @BW_RFA ,
Thanks for using Fabric Community.
Can you please share few more details of your task? like sample_input and sample_output and on what conditions we need to perform a merge!
I can guide you better if you can share these details!
User | Count |
---|---|
16 | |
10 | |
8 | |
4 | |
3 |
User | Count |
---|---|
53 | |
20 | |
20 | |
17 | |
12 |