frithjof_v
Continued Contributor

Time travel - Not recommended for long-term backup solution

Hi,

 

This article https://learn.microsoft.com/en-us/azure/databricks/delta/history says that

 

"Databricks does not recommend using Delta Lake table history as a long-term backup solution for data archival."

 

What are the reasons why it is not recommended to use table history as a long-term backup?

 

Time travel seems like a really convenient functionality 😀 I would like to learn more about the reasons why it is not recommended as a long-term backup solution for data archival.

 

Thank you! 😀

4 REPLIES
v-gchenna-msft
Community Support

Hi @frithjof_v ,

Thanks for using Fabric Community.
As per my understanding, these are the two main reasons why Databricks recommends a shorter retention period (7 days by default) for Delta Lake table history and advises against using it for long-term backups:

  1. Storage Costs: Every version of your data created through modifications to the Delta table is stored in the table history. This can quickly become expensive as data accumulates over time. Long-term backups would require storing a significant amount of historical data, leading to high storage costs.
  2. Performance Impact: Maintaining a large history can impact the performance of Delta Lake operations. Here's how:
    • VACUUM Operation: VACUUM is a process that cleans up old versions of files in Delta Lake to reclaim storage space. With a large history, the VACUUM operation becomes more complex and time-consuming.
    • Time Travel Queries: While time travel is convenient, querying historical versions requires accessing the relevant data files. A vast history increases the number of files to potentially scan, slowing down queries.
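To make the storage-cost point concrete, here is a back-of-the-envelope sketch in plain Python. The numbers (a 100 GB table, 10% of the data files rewritten per day) are purely illustrative assumptions, not Databricks figures; the point is only that retained stale file versions grow linearly with the retention window.

```python
# Illustrative sketch: storage consumed by retained Delta file versions.
# Assumptions (hypothetical, for illustration only): a 100 GB table where
# each daily update rewrites 10% of the data files. Rewritten (stale)
# files stay on storage until VACUUM removes versions older than the
# retention window.

def history_storage_gb(table_gb, rewrite_fraction, retention_days):
    """Live data plus stale file versions kept inside the retention window."""
    stale_gb = table_gb * rewrite_fraction * retention_days
    return table_gb + stale_gb

table_gb = 100
rewrite_fraction = 0.10

week = history_storage_gb(table_gb, rewrite_fraction, 7)    # default retention
year = history_storage_gb(table_gb, rewrite_fraction, 365)  # "long-term backup"

print(f"7-day retention:   {week:.0f} GB")    # 170 GB
print(f"365-day retention: {year:.0f} GB")    # 3750 GB
```

Under these assumed numbers, keeping a year of history would mean paying for roughly 37x the live table size, which is why a short retention window plus a separate archival copy is usually cheaper.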


As you may know, Databricks originally developed Delta Lake and continues to actively contribute to the open-source project, so this is their official recommendation:
Work with Delta Lake table history | Databricks on AWS



Some useful links -
Delta Tables - Advanced Concepts

Hope this is helpful. Please let me know in case of further queries.

Hi @frithjof_v ,

We haven't heard from you since the last response, and I was just checking back to see whether you got some insights on your query. Otherwise, we will respond with more details and try to help.

Thanks


Hi @frithjof_v,

Response from internal Team -

A primary reason for not recommending time travel for long-term archival is that the older versions of the data files are stored in the same storage location as the current data. Hence:
  1. The history is prone to human-error deletions and accidental table drops.
  2. Maintaining large data volumes as older versions is costly compared to alternative archival methods such as the Azure Storage cold tier.
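If the goal is truly long-term archival, a common pattern is to copy a pinned snapshot of the table out of its own storage location instead of relying on retained history. The sketch below is plain Python that only assembles the Spark SQL strings for two such approaches (`VERSION AS OF` via CTAS in open-source Delta, and `DEEP CLONE` on Databricks); the table and archive names are hypothetical, and you would run the resulting SQL in your own Spark session.

```python
# Sketch: build Spark SQL strings for archiving a pinned table snapshot
# outside the table's own storage location. Names are hypothetical.

def snapshot_ctas(source: str, archive: str, version: int) -> str:
    """CREATE TABLE AS SELECT from a pinned version (open-source Delta)."""
    return (f"CREATE TABLE {archive} AS "
            f"SELECT * FROM {source} VERSION AS OF {version}")

def snapshot_deep_clone(source: str, archive: str, version: int) -> str:
    """DEEP CLONE also copies the data files (Databricks Delta)."""
    return (f"CREATE TABLE {archive} "
            f"DEEP CLONE {source} VERSION AS OF {version}")

print(snapshot_ctas("sales", "sales_archive_v42", 42))
print(snapshot_deep_clone("sales", "sales_archive_v42", 42))
```

Because the archive lands in a separate table (and can live on a cheaper storage tier), it is insulated from an accidental `DROP TABLE` or `VACUUM` on the source.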



Hope this is helpful.
