Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now

Reply
bw_chec
Frequent Visitor

Best Practice for Lakehouse Strategy

I want to discuss the best strategy for lakehouses.

 

I have a Dev and a Prod workspace for data engineers (in a central data team) to work in. We are producing data products (semantic models) for Power BI reports to be built by the business. 

 

What is the best strategy with regards to bronze/silver/gold/dev/prod lakehouses /separate lakehouses per department etc?

I want to minimise confusion and having data all over the place whilst also ensuring good security, compliance and governance. I want it to also be easy to work with deployment pipelines and being able to pass data through bronze->silver->gold in both dev and prod.

 

At the moment I am planning on having 2 lakehouses in each workspace, one with bronze and silver data in (so I cna do bronze->silver ETL in notebooks with this lakehouse as default. The 2nd would have gold data in and be the default lakehouse in silver -> gold notebooks. 

Having them as default makes deployment parameters work well. 

 

Then I don't know the best strategy for serving semantic models (with proper table-level access) to different areas of the business from the gold lakehouse.

 

All help appreciated.

1 ACCEPTED SOLUTION
FabianSchut
Impactful Individual
Impactful Individual

Hi @bw_chec,

My preferred approach in these cases is to have one DEV/TEST/PROD deployment pipeline with one workspace per stage. In one workspace, you have all the required lakehouses (in your case bronze, silver and gold) and move them to the next stage with the deployment pipelines. All workspaces within the deployment pipeline contain all data. After you have your data ready in the production workspace, you create new workspaces per department. Within the department workspace, you create one lakehouse and make shortcuts to the production gold lakehouse tables (as you requested, silver and bronze are also possible) that department should have access to. The lakehouse you create for that department has a default semantic model attached that contains all the requested data for that department.

In that case, the data ETL flow is clear and managed in one central place. The different department workspaces only contain shortcuts, without modification of the data. This ensures integrity of the data with the ability to controll access to specific tables and files.

You can find the documentation of shortcuts here: https://learn.microsoft.com/en-us/fabric/onelake/onelake-shortcuts#lakehouse.

 

Best regards,

Fabian

View solution in original post

6 REPLIES 6
FabianSchut
Impactful Individual
Impactful Individual

Hi @bw_chec,

My preferred approach in these cases is to have one DEV/TEST/PROD deployment pipeline with one workspace per stage. In one workspace, you have all the required lakehouses (in your case bronze, silver and gold) and move them to the next stage with the deployment pipelines. All workspaces within the deployment pipeline contain all data. After you have your data ready in the production workspace, you create new workspaces per department. Within the department workspace, you create one lakehouse and make shortcuts to the production gold lakehouse tables (as you requested, silver and bronze are also possible) that department should have access to. The lakehouse you create for that department has a default semantic model attached that contains all the requested data for that department.

In that case, the data ETL flow is clear and managed in one central place. The different department workspaces only contain shortcuts, without modification of the data. This ensures integrity of the data with the ability to controll access to specific tables and files.

You can find the documentation of shortcuts here: https://learn.microsoft.com/en-us/fabric/onelake/onelake-shortcuts#lakehouse.

 

Best regards,

Fabian

Fabian you are a genius. This is by far the best suggestion for workspace / lakehouse structuring I have heard. So in your design, each department would have their own lakehouse which only contains a collection of shortcuts, therefore no further copies of data?

Hi @nealamin,

 

Yes that is correct. One lakheouse per department, no copies, no adjustments, just a selection of required tables.

v-huijiey-msft
Community Support
Community Support

Hi @bw_chec ,

 

Regarding Bronze/Silver/Gold/Development/Production Lake Warehouse in one place/separate Lake Warehouse in one place for each division, etc., I have the following explanation:

 

  • Bronze Layer: Raw, unprocessed data ingested from various sources.
  • Silver Layer: Cleansed and enriched data, ready for analytics.
  • Gold Layer: Aggregated and highly curated data, optimized for business reporting and analytics.

 

  • Dev Workspace: Use for development and testing.
  • Prod Workspace: Use for production data and final outputs.

 

Deploy ETL processes and data models from development to production using deployment pipelines.

 

For more information on how to use deployment pipelines see:

Get started using deployment pipelines, the Fabric Application lifecycle management (ALM) tool - Mic...

 

Best Regards,
Yang
Community Support Team

 

If there is any post helps, then please consider Accept it as the solution  to help the other members find it more quickly.
If I misunderstand your needs or you still have problems on it, please feel free to let us know. Thanks a lot!

Hi, thanks for your reply @v-huijiey-msft , but I was more asking about how many lakehouses should I have in my data estate. For the purposes of both managing deployment pipelines and keeping data available to serve to different departments 

Hi @bw_chec ,

 

You are on the right track. Put two lakehouses in each workspace, one for bronze and silver data and one for gold data. When deploying the pipeline, assign the different workspaces to the appropriate stages.

 

Manage data access for different departments through permission control instead of creating separate lakehouse for each department.

 

If there are special needs, create separate lakehouses for certain departments, but try to create as few as possible for better management.

 

Best Regards,
Yang
Community Support Team

 

If there is any post helps, then please consider Accept it as the solution  to help the other members find it more quickly.
If I misunderstand your needs or you still have problems on it, please feel free to let us know. Thanks a lot!

Helpful resources

Announcements
November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Live Sessions with Fabric DB

Be one of the first to start using Fabric Databases

Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.

November Update

Fabric Monthly Update - November 2024

Check out the November 2024 Fabric update to learn about new features.

Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.