Starting December 3, join live sessions with database experts and the Microsoft product team to learn just how easy it is to get started
Learn moreGet certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now
I want to discuss the best strategy for lakehouses.
I have a Dev and a Prod workspace for data engineers (in a central data team) to work in. We are producing data products (semantic models) for Power BI reports to be built by the business.
What is the best strategy with regards to bronze/silver/gold/dev/prod lakehouses /separate lakehouses per department etc?
I want to minimise confusion and having data all over the place whilst also ensuring good security, compliance and governance. I want it to also be easy to work with deployment pipelines and being able to pass data through bronze->silver->gold in both dev and prod.
At the moment I am planning on having 2 lakehouses in each workspace, one with bronze and silver data in (so I cna do bronze->silver ETL in notebooks with this lakehouse as default. The 2nd would have gold data in and be the default lakehouse in silver -> gold notebooks.
Having them as default makes deployment parameters work well.
Then I don't know the best strategy for serving semantic models (with proper table-level access) to different areas of the business from the gold lakehouse.
All help appreciated.
Solved! Go to Solution.
Hi @bw_chec,
My preferred approach in these cases is to have one DEV/TEST/PROD deployment pipeline with one workspace per stage. In one workspace, you have all the required lakehouses (in your case bronze, silver and gold) and move them to the next stage with the deployment pipelines. All workspaces within the deployment pipeline contain all data. After you have your data ready in the production workspace, you create new workspaces per department. Within the department workspace, you create one lakehouse and make shortcuts to the production gold lakehouse tables (as you requested, silver and bronze are also possible) that department should have access to. The lakehouse you create for that department has a default semantic model attached that contains all the requested data for that department.
In that case, the data ETL flow is clear and managed in one central place. The different department workspaces only contain shortcuts, without modification of the data. This ensures integrity of the data with the ability to controll access to specific tables and files.
You can find the documentation of shortcuts here: https://learn.microsoft.com/en-us/fabric/onelake/onelake-shortcuts#lakehouse.
Best regards,
Fabian
Hi @bw_chec,
My preferred approach in these cases is to have one DEV/TEST/PROD deployment pipeline with one workspace per stage. In one workspace, you have all the required lakehouses (in your case bronze, silver and gold) and move them to the next stage with the deployment pipelines. All workspaces within the deployment pipeline contain all data. After you have your data ready in the production workspace, you create new workspaces per department. Within the department workspace, you create one lakehouse and make shortcuts to the production gold lakehouse tables (as you requested, silver and bronze are also possible) that department should have access to. The lakehouse you create for that department has a default semantic model attached that contains all the requested data for that department.
In that case, the data ETL flow is clear and managed in one central place. The different department workspaces only contain shortcuts, without modification of the data. This ensures integrity of the data with the ability to controll access to specific tables and files.
You can find the documentation of shortcuts here: https://learn.microsoft.com/en-us/fabric/onelake/onelake-shortcuts#lakehouse.
Best regards,
Fabian
Fabian you are a genius. This is by far the best suggestion for workspace / lakehouse structuring I have heard. So in your design, each department would have their own lakehouse which only contains a collection of shortcuts, therefore no further copies of data?
Hi @nealamin,
Yes that is correct. One lakheouse per department, no copies, no adjustments, just a selection of required tables.
Hi @bw_chec ,
Regarding Bronze/Silver/Gold/Development/Production Lake Warehouse in one place/separate Lake Warehouse in one place for each division, etc., I have the following explanation:
Deploy ETL processes and data models from development to production using deployment pipelines.
For more information on how to use deployment pipelines see:
Best Regards,
Yang
Community Support Team
If there is any post helps, then please consider Accept it as the solution to help the other members find it more quickly.
If I misunderstand your needs or you still have problems on it, please feel free to let us know. Thanks a lot!
Hi, thanks for your reply @v-huijiey-msft , but I was more asking about how many lakehouses should I have in my data estate. For the purposes of both managing deployment pipelines and keeping data available to serve to different departments
Hi @bw_chec ,
You are on the right track. Put two lakehouses in each workspace, one for bronze and silver data and one for gold data. When deploying the pipeline, assign the different workspaces to the appropriate stages.
Manage data access for different departments through permission control instead of creating separate lakehouse for each department.
If there are special needs, create separate lakehouses for certain departments, but try to create as few as possible for better management.
Best Regards,
Yang
Community Support Team
If there is any post helps, then please consider Accept it as the solution to help the other members find it more quickly.
If I misunderstand your needs or you still have problems on it, please feel free to let us know. Thanks a lot!
Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.
Check out the November 2024 Fabric update to learn about new features.
User | Count |
---|---|
6 | |
5 | |
4 | |
4 | |
1 |
User | Count |
---|---|
16 | |
12 | |
9 | |
8 | |
6 |