I have been working in my Fabric capacity for 2 months now, on a fairly small (10GB) dataset that is overwritten every night (due to limitations of the source database).
I ran a maintenance script yesterday that goes through my delta tables, optimises them, and vacuums files older than 7 days.
Today I looked at the current and billable storage metrics in the Metrics App, and they haven't gone down, despite ~100-150GB of data being cleared yesterday.
Any thoughts on why? I'd like to understand the Metrics App better, along with best practices for keeping storage costs down.
Best,
BW
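For context, a nightly maintenance pass like the one described might be sketched as below. The table names are hypothetical, and in a Fabric notebook each generated statement would actually be executed with `spark.sql(...)`; here the statements are only built and printed.

```python
# Sketch of the kind of nightly maintenance pass described above.
# Table names are placeholders; inside a Fabric notebook each
# statement would be executed via spark.sql(sql).

def maintenance_statements(tables, retain_hours=168):
    """Build OPTIMIZE + VACUUM statements for each Delta table.

    retain_hours=168 matches the "older than 7 days" retention
    mentioned in the post.
    """
    stmts = []
    for t in tables:
        stmts.append(f"OPTIMIZE {t}")
        stmts.append(f"VACUUM {t} RETAIN {retain_hours} HOURS")
    return stmts

for sql in maintenance_statements(["sales.orders", "sales.customers"]):
    print(sql)
    # spark.sql(sql)  # uncomment when running in a Fabric notebook
```

Note that VACUUM only removes files on the OneLake side; as discussed below, the billable storage shown in the Metrics App can lag behind the deletion.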
Perhaps it's the soft delete.
Check again after 7 days, perhaps the storage has dropped then.
Hi @bw_chec
Here are some potential reasons:
- OneLake soft delete: files removed by VACUUM are retained for a soft-delete period (7 days by default) and continue to count toward billable storage until that period expires.
- VACUUM retention: VACUUM only removes files older than the retention threshold (7 days in your script), so recently rewritten files are kept for time travel.
Best practices for storage cost management:
- Run OPTIMIZE and VACUUM regularly, and prefer incremental loads over full-table overwrites where the source allows it.
Reference: OneLake capacity consumption example - Microsoft Fabric | Microsoft Learn
Hope this helps!
Best Regards,
Jing
If this post helps, please Accept it as Solution to help other members find it. Appreciate your Kudos!
Hi @v-jingzhan-msft ,
Thanks for responding.
It has been 2 days and the storage still hasn't gone down in the metrics app, even though I can see a load of parquet files have been deleted.
How can I find which background processes are holding onto the data?
When I look at the Utilisation graph, I can see the CU % is constantly sitting at >50% background.
When I explore these timepoints, the processes listed are notebook/pipeline runs that had finished hours before?
Thanks,
Ben
Hi @bw_chec
It has been more than 7 days now; has the storage gone down?
For the CU % usage line, it's probably because Fabric smooths the CU usage of background operations that have long runtimes and consume heavy CU loads. You can learn more from the documentation: Understand your Fabric capacity throttling - Microsoft Fabric | Microsoft Learn
Balance between performance and reliability
Fabric is designed to deliver lightning-fast performance to its customers by allowing operations to access more capacity unit (CU) resources than are allocated to the capacity. Tasks that might take several minutes to complete on other platforms can be finished in mere seconds on Fabric. To avoid penalizing users when operational loads surge, Fabric smooths or averages the CU usage of an operation over a minimum of five minutes, and even longer for high CU usage but short runtime requests. This behavior ensures you can enjoy consistently fast performance without experiencing throttling.
For background operations that have long runtimes and consume heavy CU loads, Fabric smooths their CU usage over a 24-hour period. Smoothing eliminates the need for data scientists and database administrators to spend time creating job schedules to spread CU load across the day to prevent accounts from freezing. With 24-hour CU smoothing, scheduled jobs can all run simultaneously without causing any spikes at any time during the day, and you can enjoy consistently fast performance without wasting time managing job schedules.
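The 24-hour smoothing described above is why a notebook run from hours ago can still show as steady background CU %. A toy model (the F64 capacity size and CU numbers are assumptions for illustration only):

```python
# Toy model of 24-hour background smoothing: a heavy notebook run's
# CU usage is averaged over the following 24 hours, so the Metrics App
# shows a steady background CU % long after the job itself finished.

CAPACITY_CU_SECONDS_PER_HOUR = 3600 * 64  # e.g. an F64 capacity (assumption)

def smoothed_background_pct(job_cu_seconds: float, window_hours: int = 24) -> float:
    """Percent of hourly capacity consumed once usage is smoothed over the window."""
    per_hour = job_cu_seconds / window_hours
    return 100.0 * per_hour / CAPACITY_CU_SECONDS_PER_HOUR

# A job that consumed 3,000,000 CU-seconds shows ~54% background for 24 hours:
print(round(smoothed_background_pct(3_000_000), 1))  # -> 54.3
```

So a sustained >50% background line does not mean something is still running; it can simply be yesterday's heavy jobs being amortised.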
Best Regards,
Jing
Hi,
Yes the storage went down after around 7 days. Thanks for your help!
The current default is 28 days, but starting May 2024 we are transitioning to a 7-day default retention period.
Gives a whole new meaning to "current"...