JayJay11
Resolver II

How to delete a delta file in Lakehouse?

Hi all,

 

I have a simple use-case: a Gen2 Dataflow prepares a table and loads it to a delta table in a Lakehouse using the append mechanism, i.e. each time the dataflow runs, a new parquet file is created.

 

Now let's say I need to perform some housekeeping, e.g. one of the append loads has a problem and needs to be deleted. How do I do that without re-creating the entire delta table? I know I can delete files under "Files", but how do I delete one or more of the parquet files that a delta table is based on?

 


 

 

9 REPLIES
Anonymous
Not applicable

Hi @JayJay11 ,

Thanks for using Fabric Community.
Do you want to delete the latest changes/records that were added to the delta table when the dataflow ran?
If that is your ask, you can refer to this - Work with Delta Lake table history - Azure Databricks | Microsoft Learn

In case your ask is different, please share more details about your issue so that I can guide you.

Thank you for your quick reply. Though I am not sure I understand the answer, as the link does not tell me how to actually delete one of those files. Is this even possible? If yes, do I need to run code, or how do I do it?

Anonymous
Not applicable

Hi @JayJay11 ,

In order to delete the latest changes made to the delta table, or to restore it to an older version, you can use Fabric Notebooks and the Delta table time travel feature:

To list the available versions:

 

DESCRIBE HISTORY gopi_test_lakehouse.sales__1_test

 





To preview the records of a previous version:

 

SELECT count(*) FROM gopi_test_lakehouse.sales__1_test VERSION AS OF 6

 





To restore to a previous version (restoring the table makes the changes permanent):

 

RESTORE TABLE gopi_test_lakehouse.sales__1_test TO VERSION AS OF 6

 



To delete the older, unused data files of the delta table you can use VACUUM - Remove unused data files with vacuum | Databricks on AWS
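
For example, a minimal sketch reusing the table name from the examples above (168 hours is the default 7-day retention window; DRY RUN only lists the files that would be removed):

VACUUM gopi_test_lakehouse.sales__1_test RETAIN 168 HOURS DRY RUN

VACUUM gopi_test_lakehouse.sales__1_test RETAIN 168 HOURS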

Please feel free to refer to the documents below:
Delta Lake Time Travel | Delta Lake
apache spark - How to roll back delta table to previous version - Stack Overflow
Table utility commands — Delta Lake Documentation

Hope this is helpful. Please let me know in case of further queries.

Thank you very much, it is becoming clearer to me now.

 

So in order to restore an older version, I use RESTORE and reference the version I want to go back to. Then I must use VACUUM to delete the files that are now unused due to the RESTORE.

 

When I try to use VACUUM though, I get an error:

 

%%sql
VACUUM bi_buildings

 

Error

Error: Job aborted due to stage failure: Task 23 in stage 159.0 failed 4 times, most recent failure: Lost task 23.3 in stage 159.0 (TID 3133) (vm-12a14818 executor 2): java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:208)
    at org.apache.spark.sql.delta.commands.VacuumCommandImpl.getDeletionVectorRelativePath(VacuumCommand.scala:537)
    at org.apache.spark.sql.delta.commands.VacuumCommandImpl.getDeletionVectorRelativePath$(VacuumCommand.scala:522)
    at org.apache.spark.sql.delta.commands.VacuumCommand$.getDeletionVectorRelativePath(VacuumCommand.scala:51)
    at org.apache.spark.sql.delta.commands.VacuumCommandImpl.getValidRelativePathsAndSubdirs(VacuumCommand.scala:513)
    at org.apache.spark.sql.delta.commands.VacuumCommandImpl.getValidRelativePathsAndSubdirs$(VacuumCommand.scala:500)
    at org.apache.spark.sql.delta.commands.VacuumCommand$.getValidRelativePathsAndSubdirs(VacuumCommand.scala:51)
    at org.apache.spark.sql.delta.commands.VacuumCommand$.$anonfun$gc$9(VacuumCommand.scala:153)
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:764)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:139)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Driver stacktrace:
    org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2799)
    org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2735)
    org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2734)
    scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2734)
    org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1218)
    org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1218)
    scala.Option.foreach(Option.scala:407)
    org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1218)
    org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2998)
    org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2937)
    org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2926)
    org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)

Anonymous
Not applicable

Hi @JayJay11 ,

This might be due to Delta Lake table corruption: the table's metadata or files might be corrupted, causing issues with the VACUUM operation.

Try to create a new table and run the VACUUM command against it.
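
Something along these lines could be used to copy the data into a fresh table (just a sketch; the new table name is a placeholder):

CREATE TABLE bi_buildings_copy AS
SELECT * FROM bi_buildings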

I didn't encounter any issues while working with VACUUM.



Hope this is helpful.

 

Anonymous
Not applicable

Hi @JayJay11 ,

We haven't heard from you on the last response and were just checking back to see if you have a resolution yet.
In case you have a resolution, please do share it with the community as it can be helpful to others.
Otherwise, please respond back with more details and we will try to help.

Hello @Anonymous 

 

No, I could not resolve it; as soon as I run VACUUM I get the error noted above. I am not sure what it means that the delta files are corrupted.

 

Also, when I run DESCRIBE HISTORY again after the RESTORE, I still see the old versions. Is this because the VACUUM was not successful?

 

It is a bit concerning that such a random error occurs, and why would the files suddenly be corrupted?

Anonymous
Not applicable

Hi @JayJay11 ,

I am not sure what could be the reason for this behavior.
I can suggest this - Use table maintenance feature to manage delta tables in Fabric - Microsoft Fabric | Microsoft Learn

All your old data beyond the 7-day minimum retention will get cleaned up.
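
If you prefer to trigger the same kind of maintenance from a notebook, something like the following should be roughly equivalent (a sketch using your bi_buildings table; OPTIMIZE compacts small files, and VACUUM then removes unreferenced files older than the retention window):

OPTIMIZE bi_buildings

VACUUM bi_buildings RETAIN 168 HOURS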

In case you are facing some challenge with this delta table, try to create a new delta table and copy the data to it so that you will not face this issue.

Hope this is helpful. Please let me know in case of further queries.

Anonymous
Not applicable

Hello @JayJay11 ,

We haven't heard from you on the last response and were just checking back to see if you have a resolution yet.
In case you have a resolution, please do share it with the community as it can be helpful to others.
If you have any question relating to the current thread, please do let us know and we will try our best to help you.
In case you have any other question on a different issue, we request you to open a new thread.
