Hi community
I have an internal audit asking us "what will you do if hackers take over our workspace and you cannot get to it?"
We thought that for our lakehouse we might back up our tables and files to a blob storage account isolated from our main subscription. We managed to write a small notebook that simply copies the Delta/Parquet tables to a blob container, and we are also able to restore these files to a new lakehouse. We can query the restored tables using the SQL endpoint – however, Spark will not accept the tables, for instance via spark.sql() commands. The problem seems to be with registration of the tables in the Spark metastore, which will not accept Delta files that have just been copied into the Tables section.
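For reference, the copy step in our notebook looks roughly like this (a minimal sketch – the OneLake path, storage-account path and table names are placeholders):

# Recursively copy each table folder (Parquet files + _delta_log) from the
# lakehouse Tables section to an isolated blob container.
# mssparkutils is available by default in Fabric notebooks.
tables = ["sales", "customers"]  # placeholder table names

src_root = "abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/Tables"
dst_root = "abfss://<container>@<account>.dfs.core.windows.net/lakehouse-backup/Tables"

for t in tables:
    mssparkutils.fs.cp(f"{src_root}/{t}", f"{dst_root}/{t}", True)  # True = recursive copy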
Best Regards
Kim Tutein
Hi @KimTutein ,
Thank you for reaching out to Microsoft Fabric Community.
It looks like the issue you're facing is very similar to one that another user encountered earlier. That issue has been resolved, and the solution might be helpful in your case as well.
Solved: How to take backups for Lakehouse ,warehouse and p... - Microsoft Fabric Community
Lakehouse/Warehouse backups - Microsoft Fabric Community
You can copy Delta Lake tables (including the _delta_log folder) to an isolated Azure Blob Storage account (ideally in a different subscription or tenant). Maintain a manifest (JSON or a table) of the table names and paths for recovery. The Delta format (Parquet + log) is portable and supports full recovery, and the blob storage can be made immutable (WORM policy) to prevent tampering.
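For example, a manifest could be written next to the backed-up tables roughly like this (a minimal sketch – the paths and table names are placeholders, and mssparkutils is available by default in Fabric notebooks):

import json
from datetime import datetime, timezone

# Hypothetical backup layout: one folder per table under dst_root.
dst_root = "abfss://<container>@<account>.dfs.core.windows.net/lakehouse-backup/Tables"
tables = ["sales", "customers"]  # placeholder table names

manifest = {
    "backup_time_utc": datetime.now(timezone.utc).isoformat(),
    "tables": [{"name": t, "path": f"{dst_root}/{t}"} for t in tables],
}

# Write the manifest alongside the backed-up tables (True = overwrite).
mssparkutils.fs.put(f"{dst_root}/_manifest.json", json.dumps(manifest, indent=2), True)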
Use Spark SQL to re-register tables:
CREATE TABLE my_table
USING DELTA
LOCATION 'abfss://<container>@<account>.dfs.core.windows.net/path/to/table'
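From a notebook, the same statement can be driven off a manifest like the one sketched above (again a sketch – the table names, paths and manifest file are hypothetical):

import json

# Read the backup manifest and re-create each table as an external Delta table
# pointing at the restored files.
dst_root = "abfss://<container>@<account>.dfs.core.windows.net/lakehouse-backup/Tables"
manifest = json.loads(mssparkutils.fs.head(f"{dst_root}/_manifest.json", 1024 * 1024))

for t in manifest["tables"]:
    spark.sql(f"""
        CREATE TABLE IF NOT EXISTS {t['name']}
        USING DELTA
        LOCATION '{t['path']}'
    """)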
SQL warehouses do not expose their Parquet/Delta files for external copying. You can use pipelines or notebooks to extract and copy key warehouse tables into a Lakehouse or into blob storage in Parquet or Delta format, for example by exporting via COPY or a Data Factory copy activity.
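As one illustration of the notebook route, a warehouse table that is accessible to Spark (for example through a shortcut in the default lakehouse) could be exported roughly like this (a sketch – the table name and destination path are placeholders):

# Read a key warehouse table (assumed accessible to Spark) and export it
# to the isolated blob container as Parquet.
df = spark.read.table("my_warehouse_table")  # placeholder name

(df.write
   .mode("overwrite")
   .parquet("abfss://<container>@<account>.dfs.core.windows.net/warehouse-backup/my_warehouse_table"))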
Please refer to the documents below:
Create delta tables - Training | Microsoft Learn
Quickstart: Azure Blob Storage client library for Python - Azure Storage | Microsoft Learn
Copy activity - Azure Data Factory & Azure Synapse | Microsoft Learn
If this post helps, then please consider accepting it as the solution to help other members find it more quickly, and don't forget to give a "Kudos" – I'd truly appreciate it!
Regards,
B Manikanteswara Reddy
Thank you for your reply. I got a step closer myself. When copying the tables to blob, a folder called "_metadata" is included as well. If we exclude this folder, we are able to copy the files directly into the lakehouse and the Spark engine accepts the tables again. I am not sure what the "_metadata" folder actually contains, but as far as I can read it holds some metadata on the Spark tables that is useful for Spark – however, it seems you do not need it to query with Spark. It seems to me the only way to get correct metadata is to read the table using Spark and write it back using Spark (df.write.format("delta").saveAsTable("schema.table")). However, one take on this could be a fast recovery of all tables (I copied 1 TB worth of tables in 10 minutes) and then start adding the metadata by "simply" reading and writing with the Spark engine. This would ensure a fast restore in a disaster scenario, while the work of adding the metadata can take some more time.
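A rough sketch of that second phase, following the read-then-saveAsTable approach described above (the table names and backup path are placeholders; depending on how the files were restored, the existing copy under Tables/ may need to be cleared before the rewrite):

# Rebuild proper Spark metastore entries by re-materializing each table
# through the Spark engine, as described above. This is the slower phase
# that follows the fast file-level restore.
backup_root = "abfss://<container>@<account>.dfs.core.windows.net/lakehouse-backup/Tables"
tables = ["sales", "customers"]  # placeholder table names

for t in tables:
    df = spark.read.format("delta").load(f"{backup_root}/{t}")   # read the backed-up Delta files
    df.write.format("delta").mode("overwrite").saveAsTable(t)    # re-register with full metadata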
Hi @KimTutein ,
Thanks for the update, that's a really useful discovery. You're right about the _metadata folder. It's used by some systems to store extra info, but it's not needed for Spark to read Delta tables as long as the _delta_log and data files are there. So excluding it makes sense, and it's great to hear that copying the tables without it worked for you. Your idea of doing a quick restore of the tables and then slowly rebuilding the metadata using Spark (by reading and writing the tables) is a smart way to handle disaster recovery. It gives you fast access to the data while allowing you to fix the table metadata over time. We think your approach is practical and aligns well with how Delta Lake is designed to work. Thanks for sharing; we'll consider including this in our internal recovery strategy as well.
It looks like your problem has been solved. Please mark the helpful reply and accept it as the solution; it will help other members of the community who have similar problems solve them faster.
Thank you very much for your kind cooperation!
Please don't forget to give a "Kudos"!
Regards,
B Manikanteswara Reddy