datakulture
Frequent Visitor

Issue with Old Parquet Files Persisting After INSERT OVERWRITE in MS Fabric Warehouse

Hi team,
I hope you are doing well.
I am encountering an issue in MS Fabric Warehouse where old Parquet files are not deleted after I run an INSERT OVERWRITE (delete and insert) operation. Instead of replacing the existing data, the new files are created alongside the old ones, which results in redundant storage.
Could you please investigate this issue and clarify:
  1. Is this the expected behavior for INSERT OVERWRITE in the Warehouse?
  2. If not, how can we ensure that old Parquet files are cleaned up automatically after an overwrite?
  3. Are there any recommended workarounds or configurations to address this issue?
Let me know if you need further details or logs to assist with the investigation.
12 replies
datakulture
Frequent Visitor

Hello, everyone,

Any updates on this? My OneLake storage is growing enormously every day, and we need a solution as soon as possible. I look forward to hearing from you.

Thanks!

v-jingzhan-msft
Community Support

Thanks to @govindarajan_d for the valuable insights and the documentation link.

Here is my understanding:

This is expected behavior for Delta tables and Parquet files. Because Parquet files are immutable, every operation on the data, including inserts, deletes, and updates, generates new Parquet files instead of modifying existing ones. Meanwhile, because of the time travel feature, the old Parquet files cannot be deleted right away; they are kept so that prior versions of the data can still be queried.
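
To make the mechanism above concrete, here is a minimal Python sketch of a toy Delta-style table. It is a hypothetical model: the class `ToyDeltaTable`, its methods, and the file names are illustrative and are not a Fabric or Delta Lake API. An overwrite never touches existing Parquet files; it only marks them as removed and writes new ones, so physical storage keeps growing until some cleanup process deletes the removed files.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeltaTable:
    """Hypothetical, simplified model of a Delta-style table.

    Parquet files are never modified in place: every write adds new files,
    and an overwrite only marks the previous files as removed in the log.
    """
    live_files: set[str] = field(default_factory=set)     # referenced by the latest version
    removed_files: set[str] = field(default_factory=set)  # unreferenced, but still on storage
    version: int = 0
    _next_id: int = 0

    def _new_file(self) -> str:
        # Illustrative file naming only.
        name = f"part-{self._next_id:05d}.parquet"
        self._next_id += 1
        return name

    def overwrite(self, n_files: int) -> None:
        # Truncate-and-load / delete-and-insert: the old files become logically
        # removed but stay on storage, so earlier versions remain queryable.
        self.removed_files |= self.live_files
        self.live_files = {self._new_file() for _ in range(n_files)}
        self.version += 1

    def files_on_storage(self) -> int:
        # Nothing is physically deleted in this model until a cleanup runs.
        return len(self.live_files) + len(self.removed_files)


table = ToyDeltaTable()
for _day in range(3):  # three daily truncate-and-load runs
    table.overwrite(n_files=2)
print(len(table.live_files), table.files_on_storage())  # 2 6
```

After three daily truncate-and-load runs of two files each, only two files are live, but six are still on storage.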


Currently, the data retention period for time travel queries in Fabric Warehouses is thirty calendar days, and Fabric does not allow you to modify the retention period for Warehouses.

Best Regards,
Jing
Community Support Team

Hi Jing,

We have data older than 30 days, which is causing the enormous growth in OneLake. See the screenshot below for reference.

[Screenshot: datakulture_0-1733917082101.png]

@datakulture How do you know this particular folder is from August? The folders you see are most likely partitions. Also, a file created in August is not necessarily up for deletion. Suppose you deleted some records or ran OPTIMIZE on a Delta table, reducing the number of files: if you then run VACUUM, the files that are no longer referenced by the table and are older than the retention period will be deleted.
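
The distinction between a file's age and its eligibility for cleanup can be sketched as a simple rule. This is a hypothetical model loosely following open-source Delta Lake VACUUM semantics: files still referenced by the latest version are always kept, and unreferenced files are kept until the retention window has passed. The names `DataFile` and `eligible_for_cleanup` are illustrative, and whether Fabric Warehouse's automatic cleanup decides exactly this way is an assumption, not something stated in this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Time-travel retention for Fabric Warehouse, as described in this thread.
RETENTION = timedelta(days=30)


@dataclass(frozen=True)
class DataFile:
    name: str
    created_at: datetime
    # None means the file is still referenced by the latest table version.
    removed_at: Optional[datetime]


def eligible_for_cleanup(f: DataFile, now: datetime,
                         retention: timedelta = RETENTION) -> bool:
    """Hypothetical cleanup rule: note that created_at plays no role at all."""
    # A file referenced by the current version is never deleted, however old it is.
    if f.removed_at is None:
        return False
    # An unreferenced file must survive until it has been unreferenced for longer
    # than the retention window, so time travel within that window keeps working.
    return f.removed_at < now - retention
```

With `now` set to 11 December 2024, a file created in August but still live is kept, and so is an August file that was only unreferenced on 1 December; only an August file unreferenced on 1 September is eligible.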


But if you still believe your Delta table is not working properly, I would recommend opening a Microsoft support ticket.

@govindarajan_d

First, since this is a Fabric Warehouse, VACUUM is completely ruled out.

I am performing a truncate-and-load operation on this table, and the folder belongs to that specific table. Files older than 30 days should be deleted automatically, but that is not currently happening. I will reach out to Microsoft for a resolution. Thanks.

@datakulture Yes, I understand that VACUUM cannot be run manually; what I meant was the automatic cleanup process. If it is a truncate-and-load operation, then previous-version files older than 30 days should definitely have been cleaned up by that process.

Thanks for your valuable insights.

The data has persisted in the Delta tables for more than 30 days: the August 2024 data still exists in OneLake. See the screenshot below for reference.

[Screenshot: datakulture_0-1733465752838.png]

@govindarajan_d Regarding your question about how I know this particular folder is from August, please refer to the screenshot above.

govindarajan_d
Solution Supplier

Hi @datakulture,


Delta tables have a versioning mechanism, and that is how the Warehouse supports time travel (see "Time travel in Warehouse within Microsoft Fabric" on Microsoft Learn).

Using the time travel feature, you can query and clone data as it existed at any point up to 30 days in the past, so the retention period is 30 days. In a Lakehouse you can run VACUUM manually to clean up old files with whatever retention period you need, but in a Warehouse this is currently not possible.
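
Why old files must be kept for the whole window can be illustrated by resolving a point-in-time query to a table version. The sketch below is hypothetical: `version_as_of` is not a Fabric API (in Warehouse T-SQL, time travel is requested with the `FOR TIMESTAMP AS OF` query hint). Any version committed within the last 30 days may be requested, so the files it references have to stay on storage until they fall outside that window.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Time-travel limit described in this thread.
RETENTION = timedelta(days=30)


def version_as_of(commit_times: list[datetime], as_of: datetime, now: datetime,
                  retention: timedelta = RETENTION) -> int:
    """Return the index of the latest version committed at or before `as_of`.

    Hypothetical helper: `commit_times` holds one commit timestamp per table
    version, sorted ascending. It only illustrates the retention window.
    """
    if as_of < now - retention:
        raise ValueError("requested point in time is outside the retention window")
    if as_of > now:
        raise ValueError("requested point in time is in the future")
    # bisect_right finds the first commit strictly after `as_of`.
    idx = bisect_right(commit_times, as_of) - 1
    if idx < 0:
        raise ValueError("table did not exist at the requested point in time")
    return idx
```

With one truncate-and-load commit per day starting 1 November 2024 and `now` at noon on 11 December, a query as of 1 December 06:00 resolves to version 30, while a query as of 5 November is rejected.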

Kanna123
Regular Visitor

Hi @lbendlin 

This is a Warehouse-only problem. I'm having the same storage problem in my current project.

lbendlin
Super User

Are you sure this is a Warehouse issue?

Microsoft Fabric Lakehouse OPTIMIZE and VACUUM for Table Maintenance

Yes, it's a Microsoft Fabric Warehouse issue.
