Is there an automated (scheduled) way to duplicate a lakehouse + all its contents?
Thank you all for your input. After getting more information from my team, this is what we are trying to accomplish:
1. Create a new empty lakehouse
2. Move this new lakehouse into a specified folder in the workspace
3. Copy tables from a source lakehouse in the same workspace
4. Re-create shortcuts
- Step 1 can be done with mssparkutils.lakehouse.create (see the sketch after this list)
- I was told to utilize Items - Update Item - REST API (Core) | Microsoft Learn to accomplish step 2. The closest thing I can find is the folderId property, but I am not sure that property can even be programmatically altered.
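For reference, here is a minimal notebook sketch of step 1, assuming the mssparkutils.lakehouse.create(name, description) signature; the lakehouse name is a placeholder:

# Sketch of step 1: create an empty lakehouse from a Fabric notebook.
# Assumes mssparkutils.lakehouse.create(name, description); the name below is a placeholder.
from notebookutils import mssparkutils

new_lakehouse = mssparkutils.lakehouse.create(
    "LH_Duplicate_Target",
    "Scheduled duplicate of the source lakehouse"
)
print(new_lakehouse)  # returns the created item's metadata (id, display name, ...)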
There are multiple ways you can do this.
For example, in Azure DevOps or GitHub you can create a workspace with the name of your choice using the Fabric command-line interface, and then in a separate task run fabric-cicd to deploy the lakehouse and any accompanying notebooks to the new workspace.
From there, you can run the notebooks to populate your Lakehouse with data. I have a post on how you can run the notebook below:
https://www.kevinrchant.com/2025/01/31/authenticate-as-a-service-principal-to-run-a-microsoft-fabric...
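For illustration only, a minimal sketch of the deployment task, assuming the FabricWorkspace and publish_all_items entry points from the fabric-cicd Python package; the workspace ID and repository folder are placeholders:

# Sketch of the fabric-cicd deployment step, run inside the DevOps/GitHub pipeline task.
# Assumes fabric-cicd exposes FabricWorkspace and publish_all_items; IDs and paths are placeholders.
from fabric_cicd import FabricWorkspace, publish_all_items

target_workspace = FabricWorkspace(
    workspace_id="<target-workspace-id>",
    repository_directory="./workspace",            # repo folder holding the item definitions
    item_type_in_scope=["Lakehouse", "Notebook"],  # deploy the lakehouse plus its notebooks
)
publish_all_items(target_workspace)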
Hi @ebjim,
You’re correct — currently, the available options are Pipelines (Copy Data) and OneLake Shortcuts.
Here are some additional points to keep in mind:
Metadata & Permissions – Only the data itself is copied. Items such as permissions, shortcuts, or relationships at the Lakehouse level are not automatically transferred, so these may need to be set up manually in the destination Lakehouse.
Incremental vs Full Loads – For large Lakehouses, it’s best to configure your pipeline for incremental copies (for example, using modified-date columns) rather than full refreshes to save time and storage (see the sketch after this list).
Overwrite Strategy – If you want a complete duplicate each time, set the Copy Data activity to overwrite mode to keep the target Lakehouse up to date and avoid old data.
Notebooks for Flexibility – For advanced cases, such as applying transformations during duplication, you can use Spark notebooks or SQL scripts with Data Factory pipelines.
Storage Efficiency – If duplication is only needed for read access, using Shortcuts instead of copying data helps save storage and still allows access to the same data.
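To make the incremental-load point concrete, here is a minimal PySpark sketch, assuming each source table carries a last_modified timestamp column; the workspace, lakehouse, and table names are placeholders:

# Sketch: incremental copy of one table between two lakehouses in the same workspace.
# Assumes a `last_modified` column; workspace/lakehouse/table names are placeholders.
from pyspark.sql import functions as F

source_path = "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/SourceLakehouse.Lakehouse/Tables/sales"
target_path = "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/TargetLakehouse.Lakehouse/Tables/sales"

# High-water mark: only rows modified in the last day (align this with the pipeline schedule).
cutoff = F.expr("current_timestamp() - INTERVAL 1 DAY")

incremental = (
    spark.read.format("delta").load(source_path)
         .filter(F.col("last_modified") >= cutoff)
)
incremental.write.format("delta").mode("append").save(target_path)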
These are the official resources you may find useful:
https://learn.microsoft.com/en-gb/fabric/data-factory/connector-lakehouse-copy-activity
https://learn.microsoft.com/en-gb/fabric/onelake/onelake-shortcuts
Thank you, and continue using the Microsoft Fabric community forum.
Hi @ebjim,
Just wanted to check regarding your question. We haven’t heard back and want to ensure you're not stuck. If you need anything else or have updates to share, we’re here to help!
Thank you.
Hi @ebjim,
You are correct that mssparkutils.lakehouse.create() is the appropriate method for programmatically creating a new Lakehouse. However, moving the Lakehouse to a specific folder is currently limited, as the folderId property in the Items – Update Item REST API is read-only and cannot be updated programmatically.
At this time, items can only be moved into folders using the Fabric UI. For copying tables from the source Lakehouse, you may use Data Pipelines with the Copy Data activity or Notebooks. Pipelines can be parameterized to iterate through tables and copy them from source to target, while Notebooks can utilize spark.catalog.listTables() to list tables and write them to the target Lakehouse.
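To make that concrete, here is a minimal notebook sketch of the table-copy step, assuming the source Lakehouse is attached as the notebook's default lakehouse; the target workspace and lakehouse names are placeholders:

# Sketch: copy every table from the attached source lakehouse to a target lakehouse.
# Assumes the source lakehouse is the notebook's default; the target path is a placeholder.
target_base = "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/TargetLakehouse.Lakehouse/Tables"

for t in spark.catalog.listTables():
    df = spark.read.table(t.name)
    df.write.format("delta").mode("overwrite").save(f"{target_base}/{t.name}")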
Thank you.
Hi @ebjim,
Just a quick check-in! Has your issue been resolved with the information we shared? We’d be delighted to help further if needed.
Thank you.
Hi @ebjim,
Since we haven't heard back from you yet, I'd like to confirm whether you've successfully resolved this issue or if you need further help.
If you still have any questions or need more support, please feel free to let us know.
We are more than happy to continue to help you.
Hi @ebjim,
With the currently supported features, using a Data pipeline or a notebook is a good option. A Fabric Deployment Pipeline or Git integration can help duplicate the lakehouse in another workspace, but it doesn't duplicate the content (data and table schema).
So for both the notebook and the data pipeline we can move forward with a similar approach.
For a data pipeline:
1. Use the REST API to fetch the list of tables (see the sketch after this list). Ref: https://learn.microsoft.com/en-us/rest/api/fabric/lakehouse/tables/list-tables?tabs=HTTP
2. Copy tables from one lakehouse to another using the Copy activity, parameterizing the source table name.
3. Copy files recursively from the source lakehouse to the target lakehouse using the Copy activity.
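As a sketch of step 1, assuming the notebook can obtain a Fabric API token via mssparkutils.credentials.getToken, and using placeholder workspace and lakehouse IDs (the response field name follows the linked List Tables doc):

# Sketch of step 1: call the Lakehouse "List Tables" REST endpoint.
# IDs are placeholders; the "pbi" token audience and the "data" field are assumptions based on the docs.
import requests
from notebookutils import mssparkutils

token = mssparkutils.credentials.getToken("pbi")
url = ("https://api.fabric.microsoft.com/v1/workspaces/<workspace-id>"
       "/lakehouses/<lakehouse-id>/tables")

resp = requests.get(url, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()
table_names = [t["name"] for t in resp.json().get("data", [])]
print(table_names)  # pass these to the parameterized Copy activity in step 2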
For a notebook:
1. Attach the source lakehouse to the notebook.
2. Using the Spark catalog, list the tables from the source lakehouse and write them back to the target lakehouse using the abfss path.
Code snippet to list the tables:
spark.catalog.listTables()
3. Recursively copy files from the source lakehouse to the target lakehouse using PySpark (see the sketch below).
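And a minimal sketch for step 3, assuming both lakehouses live in the same workspace; the workspace and lakehouse names are placeholders:

# Sketch of step 3: recursively copy the Files area from the source to the target lakehouse.
# Workspace and lakehouse names are placeholders.
from notebookutils import mssparkutils

src = "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/SourceLakehouse.Lakehouse/Files"
dst = "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/TargetLakehouse.Lakehouse/Files"

mssparkutils.fs.cp(src, dst, True)  # third argument True = copy recursively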
Hi @ebjim,
At the moment, Microsoft Fabric does not provide a direct “duplicate lakehouse” button or a built-in scheduled duplication feature for lakehouses + all their contents. But you can achieve the same result with a few approaches:
Pipelines (Copy Data activity)
Use a Fabric Data Pipeline with Copy Data to copy all tables/files from the source Lakehouse to a target Lakehouse.
This can be scheduled to run periodically (daily/hourly).
Notebooks (PySpark / Pandas)
In a Fabric Notebook, read all delta tables and files from the source Lakehouse and write them into the target Lakehouse.
You can then schedule the notebook via a pipeline or Fabric scheduling.
Shortcuts + New Lakehouse
If you want more of a “virtual duplication,” you can create a new Lakehouse and use shortcuts to point to the same files/tables in OneLake.
This avoids data duplication, but still gives you a new Lakehouse entry.
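To illustrate the shortcut approach, here is a minimal sketch using the OneLake Shortcuts REST API to point a new Lakehouse at a source table; all IDs and the shortcut name are placeholders, and the payload shape is based on the OneLake Shortcuts documentation:

# Sketch: create a OneLake shortcut in the new lakehouse that points at a source table.
# IDs and names are placeholders; payload shape follows the OneLake Shortcuts REST API docs.
import requests
from notebookutils import mssparkutils

token = mssparkutils.credentials.getToken("pbi")
url = ("https://api.fabric.microsoft.com/v1/workspaces/<workspace-id>"
       "/items/<new-lakehouse-id>/shortcuts")

payload = {
    "path": "Tables",        # folder in the new lakehouse where the shortcut is created
    "name": "sales",         # shortcut name (placeholder)
    "target": {
        "oneLake": {
            "workspaceId": "<workspace-id>",
            "itemId": "<source-lakehouse-id>",
            "path": "Tables/sales"  # source table the shortcut points to
        }
    }
}
resp = requests.post(url, json=payload, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()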
Backup/Restore approach (Preview)
Fabric is evolving, and Microsoft has hinted at backup & restore features for Lakehouses, but they aren’t fully available yet. Until then, pipelines or notebooks are the way to go.
Hope this helps. Please mark my response as a solution if useful.