arpost
Kudo Collector

How do you delete specific rows from a Lakehouse table?

Greetings, all. I have some files loaded into a Lakehouse that I've also loaded into a Lakehouse table. I want to delete some of the rows in the Lakehouse table, but I can't seem to find a way to do that. How does one go about doing this?

 

 

Any help appreciated.

1 ACCEPTED SOLUTION
v-nikhilan-msft
Community Support

Hi @arpost ,
Thanks for using Fabric Community.

It is possible to delete rows in a Lakehouse table using Spark SQL in a notebook. I have created a repro of the scenario and attached screenshots for reference.

Trying to delete the row which has a value of 46134:
[screenshot]

Query succeeded:
[screenshot]

After deletion of the row:
[screenshot]


You can refer to this link for more information: Link1
Hope this helps. Do let us know if you have any further issues. Glad to help.
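
For reference, the pattern in the screenshots presumably boils down to a single statement; here is a minimal sketch run from a PySpark cell, where MyLakehouse.MyTable and Id are placeholder names rather than the ones shown above:

%%pyspark
# Lakehouse tables are Delta tables, so DELETE removes the matching rows
# and writes a new table version.
spark.sql("DELETE FROM MyLakehouse.MyTable WHERE Id = 46134")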

View solution in original post

14 REPLIES
YOUNG99
Frequent Visitor

@CalebCarter927 

Give the below a try. I won't take credit for it; it was given to me by an MSFT trainer when I posed the question.

 

----------------------------------------------------------

%%pyspark

# Get the most recent DataDate in the table as a plain Python value.
df = spark.sql("SELECT MAX(DataDate) AS MDate FROM Lakehouse.Table")
MDate = df.collect()[0]['MDate']
# print(MDate)

# Build the DELETE with the value inlined as a literal, since subqueries
# are not supported in DELETE.
sqlStr = "DELETE FROM Lakehouse.Table WHERE DataDate < '" + str(MDate) + "'"
# print(sqlStr)

# Optional sanity check of the remaining dates:
# display(spark.sql("SELECT DISTINCT DataDate FROM Lakehouse.Table"))

dml = spark.sql(sqlStr)

Right, "<", ">" and "=" type of comparisons work.

Problem is only with the word "IN".

TimoRiikonen
Helper V

It doesn't work in a programmatic manner.

If you have a pre-defined condition, you can write the SQL command directly.
But if you, for example, create a table of keys and try to delete the matching rows, that is not possible at the moment.

 

A couple of options:
- Delete everything and recreate the table without the rows you wanted to delete (see the sketch below).

- Use Power Query filtering in a dataflow and have the dataflow overwrite the data.
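
A rough PySpark sketch of the first option, assuming the keys to delete sit in their own table; every table and column name here (MyLakehouse.MyTable, MyLakehouse.KeysToDelete, Id) is a placeholder:

%%pyspark
# Read the current table and the set of keys whose rows should be removed.
df = spark.read.table("MyLakehouse.MyTable")
keys = spark.read.table("MyLakehouse.KeysToDelete")

# Keep only the rows whose Id does not appear in the keys table.
remaining = df.join(keys, on="Id", how="left_anti")

# Overwrite the table with the filtered rows. Delta reads from a snapshot,
# so overwriting the table being read normally works; if in doubt, write to
# a staging table first and swap afterwards.
remaining.write.mode("overwrite").format("delta").saveAsTable("MyLakehouse.MyTable")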

TimoRiikonen
Helper V

I have the same problem: subqueries are not supported in delete.

I tried to create a temporary table and make a join, but that failed as well.

Since I have only two values, I can hard code my case for now.

Original deletion query:

DELETE FROM DE_LH_200_SILVER_Default.ServiceRequest WHERE updated NOT IN (select distinct DATE(updated) from DE_LH_100_BRONZE_Default.ServiceRequest)

Failed attempt with temp table and join:

CREATE TABLE deletecontent 
(
    todelete DATE
);
INSERT INTO deletecontent
    select distinct DATE(updated) from DE_LH_200_SILVER_Default.ServiceRequest WHERE updated NOT IN (select distinct DATE(updated) from DE_LH_100_BRONZE_Default.ServiceRequest);
SELECT * FROM deletecontent;
SELECT DISTINCT DATE(updated) FROM DE_LH_200_SILVER_Default.ServiceRequest INNER JOIN deletecontent ON deletecontent.todelete = DE_LH_200_SILVER_Default.ServiceRequest.updated;
DELETE FROM DE_LH_200_SILVER_Default.ServiceRequest SR INNER JOIN deletecontent ON deletecontent.todelete = DE_LH_200_SILVER_Default.ServiceRequest.updated;
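
One alternative that Delta tables do support, sketched against the tables above, is MERGE with WHEN MATCHED THEN DELETE instead of a DELETE with a join; whether the join should be on DATE(updated) or on the raw timestamp is an assumption based on the queries above:

%%pyspark
# Delete every silver row whose update date appears in deletecontent,
# using MERGE ... WHEN MATCHED THEN DELETE instead of DELETE + JOIN.
spark.sql("""
    MERGE INTO DE_LH_200_SILVER_Default.ServiceRequest AS SR
    USING deletecontent AS d
      ON DATE(SR.updated) = d.todelete
    WHEN MATCHED THEN DELETE
""")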

 

 

 

arpost
Kudo Collector

Appreciate it, @v-nikhilan-msft! I was hoping there was a way to do it through a UI, but that works.

Hi @arpost ,
Glad that your query got resolved. Please continue using Fabric Community for help on your queries.


I went exactly the same way but it doesn't work for me. Are there any bugs at the moment?
Everything is fine without the WHERE clause (deleting all the data), but not with a WHERE clause, regardless of whether it targets a text or a date column.

Any suggestions?

It seems to be something like a timeout: when I delete just 150 rows, everything works fine!

DELETE FROM `Table`
WHERE attribute = 'ABC';

But when I change the attribute to 'BCA', which matches many thousands of rows in the table, it crashes with the following message:
INCONSISTENT_BEHAVIOR_CROSS_VERSION.READ_ANCIENT_DATETIME
I don't see any connection to datetime in this?

Happy for any ideas...
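
Not a definitive answer, but READ_ANCIENT_DATETIME is usually Spark 3 refusing to read dates before 1582-10-15 or timestamps before 1900-01-01 that were written with the legacy calendar; the DELETE only has to read and rewrite the Parquet files that actually contain matching rows, which would explain why a small delete succeeds and a large one fails. A possible workaround is to set the rebase mode before running the DELETE; whether this resolves the Fabric case here is an assumption:

%%pyspark
# Tell Spark how to read ancient dates/timestamps instead of raising
# INCONSISTENT_BEHAVIOR_CROSS_VERSION: use "LEGACY" if the files were written
# by Spark 2.x/Hive, or "CORRECTED" to read the stored values as-is.
spark.conf.set("spark.sql.parquet.datetimeRebaseModeInRead", "CORRECTED")
spark.conf.set("spark.sql.parquet.int96RebaseModeInRead", "CORRECTED")

spark.sql("DELETE FROM `Table` WHERE attribute = 'BCA'")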

These discussions should still be valid for Lakehouses (and probably Warehouses) without schemas.

I heard that a Lakehouse with schemas has more deletion issues.

Is this the case?

No, I'm using a Lakehouse without a schema.

How are you trying to delete? Have you tried the PySpark method above?

I'm new to Spark SQL. Would anyone happen to know how to achieve the following SQL statement? I get the error: "Subqueries are not supported in the DELETE"

 

%%SQL

DELETE FROM <Table>
WHERE [ImportDate] < (SELECT MAX([ImportDate]) FROM <Table>)

 

Thanks in advance
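
Since the subquery is rejected, one workaround (the same idea as the PySpark snippet earlier in the thread, shown here with the Delta Lake Python API; the table name MyLakehouse.MyTable is a placeholder) is to fetch the maximum into a Python variable first and then delete with a plain predicate:

%%pyspark
from delta.tables import DeltaTable

# Fetch the latest ImportDate as a plain Python value.
max_date = spark.sql(
    "SELECT MAX(ImportDate) AS m FROM MyLakehouse.MyTable"
).collect()[0]["m"]

# Delete everything older than the latest import; no subquery needed.
DeltaTable.forName(spark, "MyLakehouse.MyTable").delete(f"ImportDate < '{max_date}'")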

I'm trying to do something similar too. Did you find a solution? Thanks
