
KimTutein
Advocate II

Backup data in Lakehouse and warehouse

Hi community

 

I have an internal audit asking us: "What will you do if hackers take over your workspace and you cannot get to it?"

We thought that for our lakehouse we might back up our tables and files to a blob storage account that could be isolated from our main subscription. We managed to write a small notebook to simply copy the Delta Parquet tables to a blob, and we are also able to restore these files to a new lakehouse. We can query the restored tables using the SQL endpoint; however, Spark will not accept the tables, for instance via spark.sql() commands. The problem seems to be with the registration of the tables in the Spark metastore, which does not pick up Delta files that have simply been copied into the Tables section.
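
For illustration, a minimal sketch of what such a copy notebook can look like (a sketch only, assuming the mssparkutils file utilities available in Fabric notebooks and that access to the target storage account is already configured in the Spark session; the table names and <...> placeholders are illustrative):

from notebookutils import mssparkutils

# OneLake path of the lakehouse Tables section and the isolated backup location
source_root = "abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/Tables"
target_root = "abfss://<container>@<account>.dfs.core.windows.net/lakehouse_backup"

tables = ["sales", "customers"]  # illustrative table names

for table in tables:
    # Recursively copy the whole Delta folder (data files plus _delta_log)
    mssparkutils.fs.cp(f"{source_root}/{table}", f"{target_root}/{table}", True)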

 

  1. Does anyone know whether we can make this registration using Spark somehow, or would we need to query the SQL endpoint and recreate the tables from it?
  2. Does anyone have thoughts on backing up data in lakehouses and/or warehouses? In a warehouse we cannot see the Delta Parquet files, to the best of my knowledge, so there it would have to be pipelines, which to me seems very inefficient.

Best Regards

Kim Tutein


3 REPLIES
v-bmanikante
Community Support

Hi @KimTutein ,

 

Thank you for reaching out to Microsoft Fabric Community.

 

It looks like the issue you're facing is very similar to one that another user encountered earlier. That issue has been resolved, and the solution might be helpful in your case as well.

Solved: How to take backups for Lakehouse ,warehouse and p... - Microsoft Fabric Community

Lakehouse/Warehouse backups - Microsoft Fabric Community

 

You can copy Delta Lake tables (including _delta_log) to an isolated Azure Blob Storage account (ideally in a different subscription or tenant). Maintain a manifest (JSON or a table) of the table names and paths for recovery. The Delta format (Parquet plus log) is portable and supports full recovery, and blob storage can be made immutable (WORM policy) to prevent tampering.
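
As a hedged sketch of such a manifest (assuming the backup layout used in the copy notebook above; spark is the built-in session of a Fabric notebook, and the paths are placeholders):

import json
from notebookutils import mssparkutils

backup_root = "abfss://<container>@<account>.dfs.core.windows.net/lakehouse_backup"

# List the lakehouse tables known to Spark and record where each backup lives
manifest = [
    {"table": t.name, "path": f"{backup_root}/{t.name}"}
    for t in spark.catalog.listTables()
]

# Store the manifest next to the backed-up tables (overwrite if it already exists)
mssparkutils.fs.put(f"{backup_root}/manifest.json", json.dumps(manifest, indent=2), True)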

Use Spark SQL to re-register tables:

CREATE TABLE my_table
USING DELTA
LOCATION 'abfss://<container>@<account>.dfs.core.windows.net/path/to/table'
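
For more than a handful of tables, the same statement can be looped from the manifest (a hedged sketch, assuming the manifest.json layout from the earlier sketch):

import json
from notebookutils import mssparkutils

backup_root = "abfss://<container>@<account>.dfs.core.windows.net/lakehouse_backup"

# Read the manifest written during backup (fs.head returns at most the bytes requested)
manifest = json.loads(mssparkutils.fs.head(f"{backup_root}/manifest.json", 1024 * 1024))

for entry in manifest:
    spark.sql(
        f"CREATE TABLE IF NOT EXISTS {entry['table']} "
        f"USING DELTA LOCATION '{entry['path']}'"
    )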

 

Warehouses do not expose their Parquet/Delta files for external copying. You can use pipelines or notebooks to extract and copy key warehouse tables into a Lakehouse or blob storage in Parquet or Delta format, for example exporting via COPY or Data Factory.
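
As a hedged sketch of the notebook route (assuming the Fabric Spark connector for Data Warehouse, which exposes spark.read.synapsesql in Fabric Spark runtimes; the warehouse, table, and path names are placeholders):

# Tables to extract from the warehouse; names are illustrative placeholders
warehouse_tables = ["dbo.FactSales", "dbo.DimCustomer"]
backup_root = "abfss://<container>@<account>.dfs.core.windows.net/warehouse_backup"

for table in warehouse_tables:
    # Read the warehouse table through the Fabric Spark connector
    df = spark.read.synapsesql(f"<WarehouseName>.{table}")
    # Persist it as Delta so it can be restored the same way as the lakehouse tables
    df.write.format("delta").mode("overwrite").save(f"{backup_root}/{table}")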

 

Please refer to the documents below:

Create delta tables - Training | Microsoft Learn

Quickstart: Azure Blob Storage client library for Python - Azure Storage | Microsoft Learn

Copy activity - Azure Data Factory & Azure Synapse | Microsoft Learn

Azure to Azure disaster recovery architecture in Azure Site Recovery - Azure Site Recovery | Microso...

 

If this post helps, then please consider accepting it as the solution to help other members find it more quickly, and don't forget to give a Kudos – I’d truly appreciate it!

 

Regards,

B Manikanteswara Reddy

 

Hi @v-bmanikante 

Thank you for your reply. I got a step closer myself. When copying the tables to blob, there is a folder called "_metadata" included as well. If we exclude this folder, we are able to copy the files directly into the lakehouse and the Spark engine accepts the tables again. I am not sure what the "_metadata" folder actually contains, but as far as I can tell it is some metadata on the Spark tables which is useful for Spark; however, it seems you do not need it to query using Spark. It seems to me the only way to get correct metadata afterwards is to read the table using Spark and write it back using Spark (df.write.format("delta").saveAsTable("schema.table")). One take on this could then be a fast recovery of all tables (I copied 1 TB worth of tables in 10 minutes) and then start adding the metadata by "simply" reading and writing with the Spark engine. This would ensure a fast restore in a disaster scenario, and the work of adding the metadata can then take some more time.
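
For illustration, a hedged sketch of that second step (assuming a default lakehouse is attached to the notebook and the Delta folders, minus _metadata, have already been copied back into its Tables section; the table names are placeholders):

restored_tables = ["sales", "customers"]  # illustrative table names

for name in restored_tables:
    # Read the restored Delta folder directly from the Tables section
    df = spark.read.format("delta").load(f"Tables/{name}")
    # Rewrite it with saveAsTable so the Spark metastore entry is recreated;
    # Delta reads from a snapshot, so overwriting the folder being read is generally
    # safe, but test this on a copy of the data first
    df.write.format("delta").mode("overwrite").saveAsTable(name)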

Hi @KimTutein ,

 

Thanks for the update, that's a really useful discovery. You're right about the _metadata folder: it's used by some systems to store extra info, but it's not needed for Spark to read Delta tables as long as the _delta_log and data files are there. So excluding it makes sense, and it's great to hear that copying the tables without it worked for you.

Your idea of doing a quick restore of the tables and then gradually rebuilding the metadata using Spark (by reading and writing the tables) is a smart way to handle disaster recovery. It gives you fast access to the data while allowing you to fix the table metadata over time. We think your approach is practical and aligns well with how Delta Lake is designed to work. Thanks for sharing; we'll consider including this in our internal recovery strategy as well.
 

It looks like your problem has been solved. Please mark the helpful reply and accept it as the solution; it will help other members of the community who have similar problems to solve them faster.

Thank you very much for your kind cooperation!

 

Please don't forget to give a Kudos – I’d truly appreciate it!

 

Regards,

B Manikanteswara Reddy
