Hi all,
I have a simple use case: a Gen2 Dataflow prepares a table and loads it to a delta table in a Lakehouse using the append mechanism, i.e. each time the dataflow runs, a new parquet file is created.
Now let's say I need to perform some housekeeping, e.g. one of the append loads has a problem and needs to be deleted. How do I do that without re-creating the entire delta table? I know I can delete files under "Files", but how do I delete one or many of the parquet files on which a delta table is based?
Hi @JayJay11 ,
Thanks for using Fabric Community.
Do you want to delete the latest changes/records that were added to the delta table when the dataflow ran?
If so, you can refer to this: Work with Delta Lake table history - Azure Databricks | Microsoft Learn
In case your ask is different, please share more details about your issue so that I can guide you.
Thank you for your quick reply. Though I am not sure I understand the answer, as the link does not tell me how to actually delete one of those files. Is this even possible? If so, do I need to run code, or how do I do it?
Hi @JayJay11 ,
To delete the latest changes made to a delta table, or to restore an older version, you can use Fabric Notebooks and the Delta table time travel feature:
To list the available versions:
DESCRIBE HISTORY gopi_test_lakehouse.sales__1_test
To preview the records of a previous version:
SELECT count(*) FROM gopi_test_lakehouse.sales__1_test VERSION AS OF 6
To restore to a previous version (restoring the table makes the changes permanent):
RESTORE TABLE gopi_test_lakehouse.sales__1_test TO VERSION AS OF 6
To delete older, unused versions of the delta table, you can use VACUUM - Remove unused data files with vacuum | Databricks on AWS
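For example, a minimal sketch in a Fabric Notebook cell (the table name follows the example above; the 168-hour retention window equals the default 7 days and is only an assumption to adjust):
%%sql
-- remove data files that no table version within the retention window still references
VACUUM gopi_test_lakehouse.sales__1_test RETAIN 168 HOURS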
Please feel free to refer to the documents below:
Delta Lake Time Travel | Delta Lake
apache spark - How to roll back delta table to previous version - Stack Overflow
Table utility commands — Delta Lake Documentation
Hope this is helpful. Please let me know in case of further queries.
Thank you very much, it is becoming clearer to me now.
So in order to restore an older version, I use RESTORE and reference the version I want to go back to. Then I must use VACUUM to delete the files that are now unused due to the RESTORE.
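In other words, something like this (just a sketch on my side; the version number is only an example):
%%sql
-- roll the table back to a known-good version, then clean up the now-unreferenced files
RESTORE TABLE bi_buildings TO VERSION AS OF 6;
VACUUM bi_buildings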
When I try to use VACUUM though, I get an error:
%%sql
VACUUM bi_buildings
Error: Job aborted due to stage failure: Task 23 in stage 159.0 failed 4 times, most recent failure: Lost task 23.3 in stage 159.0 (TID 3133) (vm-12a14818 executor 2): java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:208)
at org.apache.spark.sql.delta.commands.VacuumCommandImpl.getDeletionVectorRelativePath(VacuumCommand.scala:537)
at org.apache.spark.sql.delta.commands.VacuumCommandImpl.getDeletionVectorRelativePath$(VacuumCommand.scala:522)
at org.apache.spark.sql.delta.commands.VacuumCommand$.getDeletionVectorRelativePath(VacuumCommand.scala:51)
at org.apache.spark.sql.delta.commands.VacuumCommandImpl.getValidRelativePathsAndSubdirs(VacuumCommand.scala:513)
at org.apache.spark.sql.delta.commands.VacuumCommandImpl.getValidRelativePathsAndSubdirs$(VacuumCommand.scala:500)
at org.apache.spark.sql.delta.commands.VacuumCommand$.getValidRelativePathsAndSubdirs(VacuumCommand.scala:51)
at org.apache.spark.sql.delta.commands.VacuumCommand$.$anonfun$gc$9(VacuumCommand.scala:153)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:764)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Driver stacktrace:
org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2799)
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2735)
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2734)
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2734)
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1218)
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1218)
scala.Option.foreach(Option.scala:407)
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1218)
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2998)
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2937)
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2926)
org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
Hi @JayJay11 ,
This might be due to Delta Lake table corruption: the table's metadata or files might be corrupted, causing issues with the vacuum operation.
Try creating a new table and running the VACUUM command on it.
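For example, a minimal sketch (the new table name is just a placeholder; CREATE TABLE ... AS SELECT copies the current snapshot into a fresh delta table with a clean history):
%%sql
-- copy the current snapshot into a new delta table, then vacuum that one
CREATE TABLE bi_buildings_copy AS SELECT * FROM bi_buildings;
VACUUM bi_buildings_copy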
I didn't encounter any issues while working with VACUUM.
Hope this is helpful.
Hi @JayJay11 ,
We haven't heard from you since the last response and were just checking back to see if you have a resolution yet.
If you have found a resolution, please do share it with the community, as it can be helpful to others.
Otherwise, please respond back with more details and we will try to help.
Hello @Anonymous
No, I could not resolve it; as soon as I run VACUUM, I get the error noted above. I am not sure what it means that the delta files are corrupted.
Also, when I run DESCRIBE HISTORY again after the RESTORE, I still see the old versions. Is this because the VACUUM was not successful?
It is a bit concerning that there is such a random error. Why would the files suddenly be corrupted?
Hi @JayJay11 ,
I am not sure what could be the reason for this behavior.
I can suggest this: Use table maintenance feature to manage delta tables in Fabric - Microsoft Fabric | Microsoft Learn
All your old data older than 7 days (the minimum retention) will get cleaned.
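If you need to remove files younger than the default 7-day retention, something like the sketch below may work; this is only an assumption for your setup, and lowering the retention disables a safety check and breaks time travel to the vacuumed versions:
%%sql
-- assumption: losing time travel to older versions is acceptable;
-- this setting normally blocks retention windows shorter than 7 days
SET spark.databricks.delta.retentionDurationCheck.enabled = false;
VACUUM bi_buildings RETAIN 24 HOURS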
In case you are facing some challenge with this delta table, try to create a new delta table and copy the data into it (e.g. with the CREATE TABLE ... AS SELECT sketch above) so that you will not face this issue.
Hope this is helpful. Please let me know in case of further queries.
Hello @JayJay11 ,
We haven't heard from you since the last response and were just checking back to see if you have a resolution yet.
If you have found a resolution, please do share it with the community, as it can be helpful to others.
If you have any question relating to the current thread, please do let us know and we will try our best to help you.
If you have any other question on a different issue, we request you to open a new thread.