Hi all,
My organisation is about to implement Fabric; we think it's a great all-in-one solution for BI & Analytics. My task is to set up everything, from ADF pipelines to BI reports. Until now we have only used Power BI for our reporting needs, so everything else in Fabric is completely new to us.
Here's the situation: like most businesses, we deal with operational, management, financial and HR data. Currently these reports are set up in different workspaces, with different people having access to them.
We are also querying our operational databases directly. Of course we want to change that and move our data into Fabric.
When organizing all of this, what are some best practices regarding data governance in this situation?
1) Should I just set up 1 lakehouse with all the data for the whole tenant, and restrict access to tables/files?
2) Or should I make several data lakehouses (one for each data 'category')?
3) Or should I create several domains with one data lakehouse in each domain?
Any insight would be much appreciated.
We are in the same boat and are trying to figure out how workspaces and lakehouses should be organized. Could you share which approach you eventually took and how it is working out?
I posted a similar question and got some helpful answers from the community. Is that similar to what you implemented?
https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-organise-Fabric-workspaces/m-p/414...
Kind Regards,
Madhusudan
@Noeleke1301: Thanks for the detailed message. It helps community members understand the question better so they can share their ideas.
As you rightly said, to move the data from the operational DB to the lakehouse you can use Data Factory or a notebook; either should meet your needs.
I think your idea of using the medallion architecture is the right approach, but I would suggest one lakehouse per department, each with folders named Bronze / Silver / Gold. This gives all departments a uniform data hierarchy, and it lets you set user permissions at the lakehouse level.
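To make the one-lakehouse-per-department suggestion concrete, here is a minimal Python sketch that lists the folder paths the layout implies. The department names and the `LH_<department>` naming convention are hypothetical; the only Fabric detail assumed is that folders live under a lakehouse's Files section. Nothing here calls a Fabric API.

```python
# Hypothetical layout: one lakehouse per department, each with Bronze/Silver/Gold
# folders in its Files section. Names are illustrative, not a Fabric API.
LAYERS = ("Bronze", "Silver", "Gold")


def department_layout(departments):
    """Return {lakehouse_name: [folder paths]} for each department."""
    layout = {}
    for dept in departments:
        lakehouse = f"LH_{dept}"  # assumed naming convention
        layout[lakehouse] = [f"Files/{layer}" for layer in LAYERS]
    return layout


if __name__ == "__main__":
    for name, folders in department_layout(["HR", "Finance", "Operations"]).items():
        print(name, folders)
```

Every department gets the same three folders, which is the "uniform data hierarchy" the suggestion refers to; access is then granted per lakehouse rather than per folder.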
HTH
Thanks
Himanshu
@HimanshuS-msft
Thanks for your reply once again. I indeed hope that this forum will get more active with ideas being shared on how to improve organization & architecture within Fabric.
So what you're saying is that I should make a DLH per department, for example HR. If one of my operations reports needs a subset of this HR data, for example 'hours worked', how would I proceed?
1) Create a shortcut in the operations DLH to the 'hours' file in the HR DLH
2) Or connect the report to the HR DLH, but how would this work with user permissions?
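For option 1, a shortcut can be created in the lakehouse UI; it can also be created programmatically. The sketch below only builds the request for the Fabric REST API "Create Shortcut" endpoint as we understand it (`POST .../workspaces/{workspaceId}/items/{itemId}/shortcuts` with a `oneLake` target); treat the field names as an assumption to verify against the current Fabric REST API documentation. All IDs are placeholders, the table name `hours` is hypothetical, and the request is not sent (that would also need an Entra ID bearer token).

```python
import json

# Assumed base URL of the Fabric REST API; verify against current docs.
FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_shortcut_request(ops_workspace_id, ops_lakehouse_id,
                           hr_workspace_id, hr_lakehouse_id,
                           table_name="hours"):
    """Return (url, body) for a shortcut in the Operations lakehouse that
    points at a table in the HR lakehouse. Field names are assumptions."""
    url = (f"{FABRIC_API}/workspaces/{ops_workspace_id}"
           f"/items/{ops_lakehouse_id}/shortcuts")
    body = {
        "path": "Tables",    # where the shortcut appears in the Operations lakehouse
        "name": table_name,  # name of the shortcut itself
        "target": {
            "oneLake": {
                "workspaceId": hr_workspace_id,
                "itemId": hr_lakehouse_id,
                "path": f"Tables/{table_name}",  # source table in the HR lakehouse
            }
        },
    }
    return url, body


if __name__ == "__main__":
    url, body = build_shortcut_request("<ops-ws-id>", "<ops-lh-id>",
                                       "<hr-ws-id>", "<hr-lh-id>")
    print("POST", url)
    print(json.dumps(body, indent=2))
```

Only the table that is needed is exposed to the Operations lakehouse; the data itself stays in the HR lakehouse. How user permissions are evaluated when a report reads through a shortcut depends on the connection type, so it is worth checking the OneLake shortcut security documentation before relying on this.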
A second question: you mentioned I should create folders with B/S/G names, but I noticed that I cannot create folders in the Tables section of a DLH, only in the Files section.
Does that mean all my tables should start out as files (and then I create tables from them), instead of loading the tables directly from, say, a dataflow?
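Since the Tables section is flat, one workaround (a suggestion, not an official Fabric feature) is to encode the layer in the table name instead of in a folder. Tables can then be loaded directly, for example from a dataflow, and still show which layer they belong to. The sketch below assumes a hypothetical `<layer>_<domain>_<entity>` convention where the domain itself contains no underscore.

```python
# Hypothetical naming convention: <layer>_<domain>_<entity>, e.g. silver_hr_hours_worked.
# Assumes the domain part contains no underscore; the entity part may.
LAYERS = {"bronze", "silver", "gold"}


def table_name(layer, domain, entity):
    """Build a flat, lowercase table name that carries its medallion layer."""
    layer = layer.lower()
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    return f"{layer}_{domain.lower()}_{entity.lower()}"


def split_table_name(name):
    """Inverse of table_name: return (layer, domain, entity)."""
    layer, domain, entity = name.split("_", 2)  # split at most twice
    return layer, domain, entity


if __name__ == "__main__":
    name = table_name("Silver", "HR", "hours_worked")
    print(name, split_table_name(name))
```

Sorting tables alphabetically then groups them by layer, which gives much of the visual structure the folders would have provided.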
Thanks so much for your insights.
Hi @Noeleke1301
Thanks for using the Fabric Community forum.
I know many of us are excited about Fabric and want to move to it. But are there specific challenges in your current setup that you want to resolve?
I suggest going with #2:
2) Or should I make several data lakehouses (one for each data 'category')?
This will give you the option to set permissions at the lakehouse level, with one lakehouse per department.
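The effect of lakehouse-level permissions can be pictured with a small model: each department lakehouse is granted to one or more security groups, and a user sees only the lakehouses shared with a group they belong to. This is a hypothetical in-memory model for illustration; the group names are invented and it does not use any Fabric permission API.

```python
from dataclasses import dataclass, field


@dataclass
class Lakehouse:
    name: str
    # Hypothetical security groups that have been granted access to this lakehouse.
    allowed_groups: set = field(default_factory=set)


def accessible_lakehouses(user_groups, lakehouses):
    """Return the sorted names of lakehouses shared with any of the user's groups."""
    groups = set(user_groups)
    return sorted(lh.name for lh in lakehouses if lh.allowed_groups & groups)


if __name__ == "__main__":
    lakehouses = [
        Lakehouse("LH_HR", {"sg-hr"}),
        Lakehouse("LH_Finance", {"sg-finance"}),
        Lakehouse("LH_Operations", {"sg-ops"}),
    ]
    print(accessible_lakehouses(["sg-hr", "sg-ops"], lakehouses))
```

The appeal of this design is that confidential departments (HR, Finance) are isolated by default, and granting access is a single decision per lakehouse rather than per table.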
Thanks
Himanshu
Dear @HimanshuS-msft
Current challenges
We have been making reports and dashboards for several years now, connecting directly to our operational database(s). Of course, this is not a sustainable solution and we have had problems with this in the past.
We are currently planning to stop connecting directly to our operational databases; typically that involves setting up some kind of storage layer in between.
With the release of Fabric, this is now relatively child's play. Instead of having to research the best combination of services in Azure, we can now have one portal with one capacity that can provide for all our BI & analytics needs.
Also, we are constantly improving our operations and have hired some people in the past on a project basis to create analytical models for us. But we have never really been able to deploy them in our operations, because we lacked the know-how to deploy such models. Again, Fabric can help us with that through notebooks: we can copy the code of a model into a notebook and run it periodically on our data.
Regarding organizing + governance
My current view on things is that we will create at least 3 data lakehouses: bronze, silver & gold. But adding to that, I was wondering if I should make separate lakehouses for confidential data (such as HR and Finance) or whether I should manage access to that data within my B/S/G lakehouses.
Also, should I put everything in the same workspace, or should I split these? What about my ingestion dataflows (prod db = oracle) and notebooks? Same workspace as my lakehouses, or different ones?
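For the notebook route into the bronze layer, one option is to read the Oracle source with Spark's built-in JDBC data source. The sketch below only assembles the reader options (`url`, `dbtable`, `user`, `password`, `driver`, `fetchsize` are standard Spark JDBC options) using the Oracle thin-driver URL format. Host, service name, table and credentials are placeholders, and it assumes the Oracle JDBC driver is available to the Spark session; in practice the password should come from a secret store rather than code.

```python
def oracle_jdbc_options(host, service_name, table, user, password,
                        port=1521, fetchsize=10000):
    """Build Spark JDBC reader options for an Oracle source (placeholders only)."""
    return {
        "url": f"jdbc:oracle:thin:@//{host}:{port}/{service_name}",  # thin driver, service name
        "dbtable": table,                   # source table or a subquery alias
        "user": user,
        "password": password,               # load from a secret store in practice
        "driver": "oracle.jdbc.OracleDriver",
        "fetchsize": str(fetchsize),        # rows fetched per round trip
    }


if __name__ == "__main__":
    opts = oracle_jdbc_options("db.example.com", "PROD", "OPS.ORDERS",
                               "reader", "<from-secret-store>")
    print(opts["url"])
    # In a notebook with an active Spark session (assumed), roughly:
    # df = spark.read.format("jdbc").options(**opts).load()
    # df.write.mode("overwrite").format("delta").saveAsTable("bronze_ops_orders")
```

A Data Factory pipeline or dataflow can do the same copy without code; the notebook version is mainly useful when the extraction needs custom logic.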
I'm generally looking for some best practices on how to organize my workspaces and lakehouse/warehouse instances to stimulate fast progress once everything is set up, minimize rework when our data maturity has improved, but most of all keep everything crystal clear.
Thanks for your insights.
Hi @HimanshuS-msft ,
We have similar doubts to yours.
We aim to deploy a medallion architecture with the usual 3-layer approach (bronze/silver/gold). But beyond that point many questions arise, all of them about how to organize the content:
1) Lakehouses: 1 lakehouse per layer (Silver/...) or 1 lakehouse per layer/functional area (e.g. HR, Finance, ...)?
2) Fabric-related artifacts (notebooks, pipelines, dataflows, lakehouses):
- 1 workspace per layer
- 1 workspace per layer/area and environment
- either of the above + subfolders (the long-awaited feature seems to be arriving in Q1 2024)
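A quick way to compare the workspace options is to count how many workspaces each one produces. The sketch below generates names for "one workspace per layer and environment" and "one workspace per area, layer and environment"; the `Area-Layer-Env` naming pattern and the Dev/Test/Prod environments are hypothetical choices for illustration.

```python
from itertools import product


def workspace_names(layers, areas=None, environments=("Dev", "Test", "Prod")):
    """Return hypothetical workspace names.

    With no areas: one workspace per layer and environment.
    With areas: one workspace per area, layer and environment.
    """
    if areas:
        return [f"{area}-{layer}-{env}"
                for area, layer, env in product(areas, layers, environments)]
    return [f"{layer}-{env}" for layer, env in product(layers, environments)]


if __name__ == "__main__":
    layers = ["Bronze", "Silver", "Gold"]
    print(len(workspace_names(layers)))                           # per layer/env
    print(len(workspace_names(layers, ["HR", "Finance", "Ops"])))  # per area/layer/env
```

The count grows as layers × areas × environments, which is the trade-off behind the question: more workspaces give finer access control and fewer objects per workspace, at the cost of more workspaces to manage.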
Does anyone have a proposal for organizing the Fabric content of a medallion architecture without ending up with tons of objects in one workspace?
Thx