Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified as a Fabric Data Engineer: Check your eligibility for a 50% exam voucher offer and join us for free live learning sessions to get prepared for Exam DP-700. Get started

Reply
ebjim
Helper IV
Helper IV

Deleting from lakehouse table based on values in another lakehous table

I have 2 tables in the same lakehouse and one of them (existingData) has rows that need to be deleted, based on what's in the other table (newData). Whatever IDs found in newData are the rows that need to be deleted from existingData. The two tables have identical structure, but the IDs in newData are not continuous, so I cannot delete based on range. Also, subqueries are not supported in Spark SQL. Delete queries using inner join didn't work. I ended up using this PySpark loop but it's slow:

 

LakehouseName = "Lakehouse"
TableOri = "existingData"
TableDest = "newData"
ColName = "ID"
ToDelete = spark.sql("SELECT {0} FROM {1}.{2}".format(ColName,LakehouseName,TableDest)).collect()
for row in ToDelete:
    spark.sql("DELETE FROM {0}.{1} WHERE {2} = {3}".format(LakehouseName, TableOri, ColName, row[0]))
 
Any ideas for something quicker? Thanks in advance!
1 ACCEPTED SOLUTION

Hi @ebjim 
I tried to repro the same scenario from my side. I have 2 tables with same schema in my lakehouse :

1) List with 40 rows
2) Listings_3 with 45 rows.
I tried to delete the 40 rows in Listings_3 with the below code:

vnikhilanmsft_2-1708375429698.png

 



The rows got successfully deleted. SQL MERGE is faster when compared to other DML operations. By using the MERGE statement, you can replace the individual DML statements with a single statement. This can improve query performance because the operations are performed within a single statement, therefore, minimizing the number of times the data in the source and target tables are processed.

Please try this approach and do let me know if you have any further questions. Hope this helps.

View solution in original post

2 REPLIES 2
v-nikhilan-msft
Community Support
Community Support

Hi @ebjim 
Thanks for using Fabric Community.
At this time, we are reaching out to the internal team to get some help on this. We will update you once we hear back from them.
Thanks

Hi @ebjim 
I tried to repro the same scenario from my side. I have 2 tables with same schema in my lakehouse :

1) List with 40 rows
2) Listings_3 with 45 rows.
I tried to delete the 40 rows in Listings_3 with the below code:

vnikhilanmsft_2-1708375429698.png

 



The rows got successfully deleted. SQL MERGE is faster when compared to other DML operations. By using the MERGE statement, you can replace the individual DML statements with a single statement. This can improve query performance because the operations are performed within a single statement, therefore, minimizing the number of times the data in the source and target tables are processed.

Please try this approach and do let me know if you have any further questions. Hope this helps.

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

Feb2025 Sticker Challenge

Join our Community Sticker Challenge 2025

If you love stickers, then you will definitely want to check out our Community Sticker Challenge!

JanFabricDE_carousel

Fabric Monthly Update - January 2025

Explore the power of Python Notebooks in Fabric!

Feb2025 NL Carousel

Fabric Community Update - February 2025

Find out what's new and trending in the Fabric community.