frithjof_v
Continued Contributor

Time travel - Not recommended for long-term backup solution

Hi,

 

This article https://learn.microsoft.com/en-us/azure/databricks/delta/history says that

 

"Databricks does not recommend using Delta Lake table history as a long-term backup solution for data archival."

 

What are the reasons why it is not recommended to use table history as a long-term backup?

 

Time travel seems like a really convenient functionality 😀 I would like to learn more about the reasons why it is not recommended as a long-term backup solution for data archival.

 

Thank you! 😀

4 REPLIES
v-gchenna-msft
Community Support

Hi @frithjof_v ,

Thanks for using Fabric Community.
As per my understanding, these are the two main reasons why Databricks recommends a shorter retention period (7 days by default) for Delta Lake table history and advises against using it for long-term backups:

  1. Storage Costs: Every version of your data created through modifications to the Delta table is stored in the table history. This can quickly become expensive as data accumulates over time. Long-term backups would require storing a significant amount of historical data, leading to high storage costs.
  2. Performance Impact: Maintaining a large history can impact the performance of Delta Lake operations. Here's how:
    • VACUUM Operation: VACUUM is a process that cleans up old versions of files in Delta Lake to reclaim storage space. With a large history, the VACUUM operation becomes more complex and time-consuming.
    • Time Travel Queries: While time travel is convenient, querying historical versions requires accessing the relevant data files. A vast history increases the number of files to potentially scan, slowing down queries.
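To make the storage-cost point concrete, here is a back-of-the-envelope sketch in plain Python. The numbers (a 100 GB table, 10% of the data files rewritten per day) are purely illustrative assumptions, not Databricks figures; the point is only that retained stale file versions grow linearly with the retention window.

```python
# Illustrative sketch: storage consumed by retained Delta file versions.
# Assumptions (hypothetical, for illustration only): a 100 GB table where
# each daily update rewrites 10% of the data files. Rewritten (stale)
# files stay on storage until VACUUM removes versions older than the
# retention window.

def history_storage_gb(table_gb, rewrite_fraction, retention_days):
    """Live data plus stale file versions kept inside the retention window."""
    stale_gb = table_gb * rewrite_fraction * retention_days
    return table_gb + stale_gb

table_gb = 100
rewrite_fraction = 0.10

week = history_storage_gb(table_gb, rewrite_fraction, 7)    # default retention
year = history_storage_gb(table_gb, rewrite_fraction, 365)  # "long-term backup"

print(f"7-day retention:   {week:.0f} GB")    # 170 GB
print(f"365-day retention: {year:.0f} GB")    # 3750 GB
```

Under these assumed numbers, keeping a year of history would mean paying for roughly 37x the live table size, which is why a short retention window plus a separate archival copy is usually cheaper.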


As you may know, Databricks originally developed Delta Lake and continues to actively contribute to the open-source project, so this is their official recommendation:
Work with Delta Lake table history | Databricks on AWS



Some useful links -
Delta Tables - Advanced Concepts

Hope this is helpful. Please let me know in case of further queries.

Hi @frithjof_v ,

We haven't heard from you since the last response, and I was just checking back to see whether you got some insights on your query. Otherwise, we will respond with more details and try to help.

Thanks


Hi @frithjof_v,

Response from internal Team -

A primary reason for not recommending time travel for long-term archival is that the older versions of the data files are stored in the same storage location as the current data. Hence:
  1. The history is prone to human-error deletions and accidental table drops.
  2. Maintaining large data volumes as older versions is costly compared to alternative archival methods such as the Azure Storage cold tier.
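If the goal is truly long-term archival, a common pattern is to copy a pinned snapshot of the table out of its own storage location instead of relying on retained history. The sketch below is plain Python that only assembles the Spark SQL strings for two such approaches (`VERSION AS OF` via CTAS in open-source Delta, and `DEEP CLONE` on Databricks); the table and archive names are hypothetical, and you would run the resulting SQL in your own Spark session.

```python
# Sketch: build Spark SQL strings for archiving a pinned table snapshot
# outside the table's own storage location. Names are hypothetical.

def snapshot_ctas(source: str, archive: str, version: int) -> str:
    """CREATE TABLE AS SELECT from a pinned version (open-source Delta)."""
    return (f"CREATE TABLE {archive} AS "
            f"SELECT * FROM {source} VERSION AS OF {version}")

def snapshot_deep_clone(source: str, archive: str, version: int) -> str:
    """DEEP CLONE also copies the data files (Databricks Delta)."""
    return (f"CREATE TABLE {archive} "
            f"DEEP CLONE {source} VERSION AS OF {version}")

print(snapshot_ctas("sales", "sales_archive_v42", 42))
print(snapshot_deep_clone("sales", "sales_archive_v42", 42))
```

Because the archive lands in a separate table (and can live on a cheaper storage tier), it is insulated from an accidental `DROP TABLE` or `VACUUM` on the source.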



Hope this is helpful.
