
SCD Type 2 Using MERGE in Delta

What's the best practice when I want to both update AND insert a record into a target table in one operation, ensuring atomicity?

 

newRecord

ID, Product, Color

123, A, Blue

 

myTable

ID, Product, Color, is_current

123, A, Yellow, TRUE

 

I basically want to do an atomic operation that gives the following result in myTable:

 

ID, Product, Color, is_current

123, A, Yellow, FALSE

123, A, Blue, TRUE

 

I basically want to do a .whenMatchedUpdate AND a .whenMatchedInsert in a single MERGE statement, but I know that's not supported.

 

 

1 ACCEPTED SOLUTION

Hello @AndersASorensen 

 

Give this a try:

 

from pyspark.sql.functions import lit, current_date
from delta.tables import DeltaTable

# Assuming 'target_table' is your existing Delta table
target_table = DeltaTable.forName(spark, "target_table")

# 'source_df' is your new data
merge_condition = "target.natural_key = source.natural_key"

(target_table.alias("target")
    .merge(source_df.alias("source"), merge_condition)
    .whenMatchedUpdate(set={
        "attribute1": "source.attribute1",
        "attribute2": "source.attribute2",
        "is_current": lit(True),
        "update_date": current_date()
    })
    .whenNotMatchedInsert(values={
        "natural_key": "source.natural_key",
        "attribute1": "source.attribute1",
        "attribute2": "source.attribute2",
        "is_current": lit(True),
        "update_date": current_date()
    })
    .execute())

 

 

https://iterationinsights.com/article/implementing-a-hybrid-type-1-and-2-slowly-changing-dimension-i...

 

Hope this helps.

 

Please accept the answer if this works. 

Thanks

 

 


4 REPLIES
nilendraFabric
Community Champion

Hello @AndersASorensen 

The `MERGE` statement is not currently supported natively in Microsoft Fabric's Data Warehouse, so you need to rely on PySpark or SQL workarounds for similar functionality.

 

from delta.tables import DeltaTable

# Define source and target
target_table = DeltaTable.forName(spark, "Lakehouse.TargetTable")
source_df = spark.read.table("Lakehouse.SourceTable")

# Perform the merge
(target_table.alias("target")
    .merge(source=source_df.alias("source"), condition="target.key = source.key")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())

 

https://aventius.co.uk/2024/03/28/microsoft-fabric-using-pyspark-to-dynamically-merge-data-into-many...

 

Currently, the `MERGE` statement is not supported in Fabric's Data Warehouses. As a workaround:
• Use a combination of `ROW_NUMBER()` for deduplication and `INSERT`/`UPDATE` statements to achieve similar functionality:

WITH DeduplicatedSource AS (
    SELECT * FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY LOAD_DATE DESC) AS row_num
        FROM SourceTable
    ) AS ranked
    WHERE row_num = 1
)
-- Merge logic
MERGE INTO TargetTable AS target
USING DeduplicatedSource AS source
    ON target.ID = source.ID
WHEN MATCHED THEN
    UPDATE SET target.Column = source.Column
WHEN NOT MATCHED THEN
    INSERT (Column) VALUES (source.Column);

Hope this helps 

Thanks

I am happy to use PySpark, and I want to do it on a Lakehouse, not a Data Warehouse.

 

However, your .whenMatchedUpdateAll().whenNotMatchedInsertAll().execute() suggestion will not work, as it will only update the matched row.

 

Yes, I would like it to update the is_current column when matched, but it also needs to insert the record as a new row, and that merge will not do both.


Hi again 🙂

 

The code provided won't work, but the article you linked to had a working solution 🙂

Thanks!
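[Editorial note] The thread ends without showing the working approach, so here is a minimal sketch of the standard SCD Type 2 "staged merge" trick that a single Delta MERGE can implement. This is pure Python (no Spark required); the table and column names mirror the thread's example and are illustrative, not taken from the linked article. Each changed source row is staged twice: once with `merge_key = ID` (matches the current row and expires it) and once with `merge_key = None` (matches nothing, so it is inserted as the new current version).

```python
def scd2_merge(target_rows, source_rows):
    """Sketch of SCD2 semantics: expire the current row, insert the new one."""
    current = {r["ID"]: r for r in target_rows if r["is_current"]}
    staged = []
    for s in source_rows:
        if s["ID"] in current and current[s["ID"]]["Color"] != s["Color"]:
            staged.append(dict(s, merge_key=s["ID"]))   # will expire the old row
            staged.append(dict(s, merge_key=None))      # will become the new row
        elif s["ID"] not in current:
            staged.append(dict(s, merge_key=None))      # brand-new key
    result = [dict(r) for r in target_rows]             # copy, don't mutate input
    for s in staged:
        if s["merge_key"] is not None:                  # WHEN MATCHED UPDATE
            for r in result:
                if r["ID"] == s["merge_key"] and r["is_current"]:
                    r["is_current"] = False
        else:                                           # WHEN NOT MATCHED INSERT
            row = {k: v for k, v in s.items() if k != "merge_key"}
            row["is_current"] = True
            result.append(row)
    return result

target = [{"ID": 123, "Product": "A", "Color": "Yellow", "is_current": True}]
source = [{"ID": 123, "Product": "A", "Color": "Blue"}]
print(scd2_merge(target, source))
```

In PySpark the same effect is achieved by unioning the source with itself (one copy carrying the natural key as the merge key, one carrying NULL) and running a single MERGE against that staged DataFrame, which keeps the expire-and-insert atomic.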
