Is there an automated (scheduled) way to duplicate a lakehouse + all its contents?
Thank you all for your input. After getting more information from my team, this is what we are trying to accomplish:
1. Create a new empty lakehouse
2. Move this new lakehouse into a specified folder in the workspace
3. Copy tables from a source lakehouse in the same workspace
4. Re-create shortcuts
- Step 1 can be done with mssparkutils.lakehouse.create (see the sketch after this list)
- I was told to utilize Items - Update Item - REST API (Core) | Microsoft Learn to accomplish step 2. The closest thing I can find is the folderId property, but I am not sure that property can even be programmatically altered.
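For reference, here is a minimal notebook sketch of step 1, assuming the mssparkutils.lakehouse.create(name, description) signature; the lakehouse name is a placeholder:

# Sketch of step 1: create an empty lakehouse from a Fabric notebook.
# Assumes mssparkutils.lakehouse.create(name, description); the name below is a placeholder.
from notebookutils import mssparkutils

new_lakehouse = mssparkutils.lakehouse.create(
    "LH_Duplicate_Target",
    "Scheduled duplicate of the source lakehouse"
)
print(new_lakehouse)  # returns the created item's metadata (id, display name, ...)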
There are multiple ways you can do this.
For example, in Azure DevOps or GitHub you can create a workspace with the name of your choice using the Fabric command-line interface, and then in a separate task run fabric-cicd to deploy the lakehouse and any accompanying notebooks to the new workspace.
From there, you can run the notebooks to populate your Lakehouse with data. I have a post on how you can run the notebook below:
https://www.kevinrchant.com/2025/01/31/authenticate-as-a-service-principal-to-run-a-microsoft-fabric...
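For illustration only, a minimal sketch of the deployment task, assuming the FabricWorkspace and publish_all_items entry points from the fabric-cicd Python package; the workspace ID and repository folder are placeholders:

# Sketch of the fabric-cicd deployment step, run inside the DevOps/GitHub pipeline task.
# Assumes fabric-cicd exposes FabricWorkspace and publish_all_items; IDs and paths are placeholders.
from fabric_cicd import FabricWorkspace, publish_all_items

target_workspace = FabricWorkspace(
    workspace_id="<target-workspace-id>",
    repository_directory="./workspace",            # repo folder holding the item definitions
    item_type_in_scope=["Lakehouse", "Notebook"],  # deploy the lakehouse plus its notebooks
)
publish_all_items(target_workspace)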
Hi @ebjim,
You’re correct — currently, the available options are Pipelines (Copy Data) and OneLake Shortcuts.
Here are some additional points to keep in mind:
Metadata & Permissions – Only the data itself is copied. Items such as permissions, shortcuts, or relationships at the Lakehouse level are not automatically transferred, so these may need to be set up manually in the destination Lakehouse.
Incremental vs Full Loads – For large Lakehouses, it’s best to configure your pipeline for incremental copies (for example, using modified-date columns) rather than full refreshes to save time and storage (see the sketch after this list).
Overwrite Strategy – If you want a complete duplicate each time, set the Copy Data activity to overwrite mode to keep the target Lakehouse up to date and avoid old data.
Notebooks for Flexibility – For advanced cases, such as applying transformations during duplication, you can use Spark notebooks or SQL scripts with Data Factory pipelines.
Storage Efficiency – If duplication is only needed for read access, using Shortcuts instead of copying data helps save storage and still allows access to the same data.
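To make the incremental-load point concrete, here is a minimal PySpark sketch, assuming each source table carries a last_modified timestamp column; the workspace, lakehouse, and table names are placeholders:

# Sketch: incremental copy of one table between two lakehouses in the same workspace.
# Assumes a `last_modified` column; workspace/lakehouse/table names are placeholders.
from pyspark.sql import functions as F

source_path = "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/SourceLakehouse.Lakehouse/Tables/sales"
target_path = "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/TargetLakehouse.Lakehouse/Tables/sales"

# High-water mark: only rows modified in the last day (align this with the pipeline schedule).
cutoff = F.expr("current_timestamp() - INTERVAL 1 DAY")

incremental = (
    spark.read.format("delta").load(source_path)
         .filter(F.col("last_modified") >= cutoff)
)
incremental.write.format("delta").mode("append").save(target_path)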
These are the official resources you may find useful:
https://learn.microsoft.com/en-gb/fabric/data-factory/connector-lakehouse-copy-activity
https://learn.microsoft.com/en-gb/fabric/onelake/onelake-shortcuts
Thank you, and continue using the Microsoft Fabric community forum.
Hi @ebjim,
Just wanted to check regarding your question. We haven’t heard back and want to ensure you're not stuck. If you need anything else or have updates to share, we’re here to help!
Thank you.
Hi @ebjim,
You are correct that mssparkutils.lakehouse.create() is the appropriate method for programmatically creating a new Lakehouse. However, moving the Lakehouse to a specific folder is currently limited, as the folderId property in the Items – Update Item REST API is read-only and cannot be updated programmatically.
At this time, items can only be moved into folders using the Fabric UI. For copying tables from the source Lakehouse, you may use Data Pipelines with the Copy Data activity or Notebooks. Pipelines can be parameterized to iterate through tables and copy them from source to target, while Notebooks can utilize spark.catalog.listTables() to list tables and write them to the target Lakehouse.
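To make that concrete, here is a minimal notebook sketch of the table-copy step, assuming the source Lakehouse is attached as the notebook's default lakehouse; the target workspace and lakehouse names are placeholders:

# Sketch: copy every table from the attached source lakehouse to a target lakehouse.
# Assumes the source lakehouse is the notebook's default; the target path is a placeholder.
target_base = "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/TargetLakehouse.Lakehouse/Tables"

for t in spark.catalog.listTables():
    df = spark.read.table(t.name)
    df.write.format("delta").mode("overwrite").save(f"{target_base}/{t.name}")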
Thank you.
Hi @ebjim,
Just a quick check-in! Has your issue been resolved with the information we shared? We’d be delighted to help further if needed.
Thank you.
Hi @ebjim,
Since we haven't heard back from you yet, I'd like to confirm whether you've successfully resolved this issue or if you need further help.
If you still have any questions or need more support, please feel free to let us know.
We are more than happy to continue to help you.
Hi @ebjim,
With the currently supported features, using a Data pipeline or a notebook is a good option. A Fabric Deployment Pipeline or Git integration can help duplicate the lakehouse in another workspace, but it doesn't duplicate the content (data and table schema).
So for both the notebook and the data pipeline we can move forward with a similar approach.
For a data pipeline:
1. Use the REST API to fetch the list of tables (see the sketch after this list). Ref: https://learn.microsoft.com/en-us/rest/api/fabric/lakehouse/tables/list-tables?tabs=HTTP
2. Copy tables from one lakehouse to another using the Copy activity, parameterizing the source table name.
3. Copy files recursively from the source lakehouse to the target lakehouse using the Copy activity.
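As a sketch of step 1, assuming the notebook can obtain a Fabric API token via mssparkutils.credentials.getToken, and using placeholder workspace and lakehouse IDs (the response field name follows the linked List Tables doc):

# Sketch of step 1: call the Lakehouse "List Tables" REST endpoint.
# IDs are placeholders; the "pbi" token audience and the "data" field are assumptions based on the docs.
import requests
from notebookutils import mssparkutils

token = mssparkutils.credentials.getToken("pbi")
url = ("https://api.fabric.microsoft.com/v1/workspaces/<workspace-id>"
       "/lakehouses/<lakehouse-id>/tables")

resp = requests.get(url, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()
table_names = [t["name"] for t in resp.json().get("data", [])]
print(table_names)  # pass these to the parameterized Copy activity in step 2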
For a notebook:
1. Attach the source lakehouse to the notebook.
2. Using the Spark catalog, list the tables from the source lakehouse and write them back to the target lakehouse using the abfss path.
Code snippet to list the tables:
spark.catalog.listTables()
3. Recursively copy files from the source lakehouse to the target lakehouse using PySpark (see the sketch below).
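And a minimal sketch for step 3, assuming both lakehouses live in the same workspace; the workspace and lakehouse names are placeholders:

# Sketch of step 3: recursively copy the Files area from the source to the target lakehouse.
# Workspace and lakehouse names are placeholders.
from notebookutils import mssparkutils

src = "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/SourceLakehouse.Lakehouse/Files"
dst = "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/TargetLakehouse.Lakehouse/Files"

mssparkutils.fs.cp(src, dst, True)  # third argument True = copy recursively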
Hi @ebjim,
At the moment, Microsoft Fabric does not provide a direct “duplicate lakehouse” button or a built-in scheduled duplication feature for lakehouses + all their contents. But you can achieve the same result with a few approaches:
Pipelines (Copy Data activity)
Use a Fabric Data Pipeline with Copy Data to copy all tables/files from the source Lakehouse to a target Lakehouse.
This can be scheduled to run periodically (daily/hourly).
Notebooks (PySpark / Pandas)
In a Fabric Notebook, read all delta tables and files from the source Lakehouse and write them into the target Lakehouse.
You can then schedule the notebook via a pipeline or Fabric scheduling.
Shortcuts + New Lakehouse
If you want more of a “virtual duplication,” you can create a new Lakehouse and use shortcuts to point to the same files/tables in OneLake.
This avoids data duplication, but still gives you a new Lakehouse entry.
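To illustrate the shortcut approach, here is a minimal sketch using the OneLake Shortcuts REST API to point a new Lakehouse at a source table; all IDs and the shortcut name are placeholders, and the payload shape is based on the OneLake Shortcuts documentation:

# Sketch: create a OneLake shortcut in the new lakehouse that points at a source table.
# IDs and names are placeholders; payload shape follows the OneLake Shortcuts REST API docs.
import requests
from notebookutils import mssparkutils

token = mssparkutils.credentials.getToken("pbi")
url = ("https://api.fabric.microsoft.com/v1/workspaces/<workspace-id>"
       "/items/<new-lakehouse-id>/shortcuts")

payload = {
    "path": "Tables",        # folder in the new lakehouse where the shortcut is created
    "name": "sales",         # shortcut name (placeholder)
    "target": {
        "oneLake": {
            "workspaceId": "<workspace-id>",
            "itemId": "<source-lakehouse-id>",
            "path": "Tables/sales"  # source table the shortcut points to
        }
    }
}
resp = requests.post(url, json=payload, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()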
Backup/Restore approach (Preview)
Fabric is evolving, and Microsoft has hinted at backup & restore features for Lakehouses, but they aren’t fully available yet. Until then, pipelines or notebooks are the way to go.
Hope this helps. Please mark my response as a solution if useful.