What would be the best practice for establishing the medallion architecture inside Fabric?
My initial thought is to use multiple workspaces inside OneLake, but I am unsure whether this is recommended or whether others have thoughts.
Hello @NicholasJackson,
That is an excellent question! I'm not sure there is a definitive answer or recommendation yet. Before answering your question, I would like to highlight a few Fabric features so we all have the relevant context:
Having said that, there are many possibilities depending on your requirements and the size/footprint of your organization. For now, I would apply the following criteria to decide how to structure your data estate. This is my opinion, and I would love to see other answers to your question 🙂:
With these criteria, we can imagine that a "small" organization could have one lakehouse (in one workspace) storing both the bronze and silver layers, and one or more workspaces for the gold layer and their corresponding Power BI reports. For a larger organization, the layers below gold may be split across several lakehouses/workspaces.
The gold data would likely "always" be stored in business-specific workspaces (except for some "core" aggregates that you might want to share). With such an organization, you might end up with a lot of workspaces. Don't forget about the new Domains feature (preview), which helps enhance the discoverability of data for a specific business area or field within the organization (learn more in the docs).
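To make the "small" versus larger-organization layouts concrete, here is a minimal Python sketch that plans which lakehouse holds which layers. It is an illustration under stated assumptions, not a Fabric API: `plan_layout`, the `Lakehouse` record, the two-value `org_size` switch, and every workspace and lakehouse name are hypothetical. The output is only a plan you would then create by hand or with your own tooling.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Lakehouse:
    workspace: str     # hypothetical workspace name
    name: str          # hypothetical lakehouse name
    layers: list[str]  # medallion layers stored in this lakehouse


def plan_layout(org_size: str, business_areas: list[str]) -> list[Lakehouse]:
    """Plan lakehouses following the criteria above (hypothetical helper).

    org_size is a deliberate simplification: "small" or "large".
    """
    if org_size == "small":
        # One lakehouse in one workspace stores both bronze and silver.
        lower = [Lakehouse("data-engineering", "lh_bronze_silver", ["bronze", "silver"])]
    elif org_size == "large":
        # At scale, the layers below gold are split across lakehouses/workspaces.
        lower = [
            Lakehouse("data-engineering-bronze", "lh_bronze", ["bronze"]),
            Lakehouse("data-engineering-silver", "lh_silver", ["silver"]),
        ]
    else:
        raise ValueError(f"unknown org_size: {org_size!r}")
    # Gold lives in business-specific workspaces, next to the Power BI reports.
    gold = [Lakehouse(f"gold-{area}", f"lh_gold_{area}", ["gold"]) for area in business_areas]
    return lower + gold


if __name__ == "__main__":
    for lh in plan_layout("small", ["sales", "finance"]):
        print(lh.workspace, lh.name, lh.layers)
```

Each new business area adds another gold workspace, which is how the workspace count grows in this model and where grouping them under a domain can help with discoverability.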
Hope this first answer gives you more clarity!
Hi @NicholasJackson ,
I am in total agreement with what @cmaneu called out above. Being a big fan of the medallion architecture, I think we can start by creating subfolders inside the lakehouse; maybe you can explore that option too.
In the past, I have used different containers in a storage account to set the permissions correctly.
Since Azure Data Factory and Synapse are part of Fabric, we can use them to move data across folders.
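For anyone moving from a container-per-layer ADLS Gen2 setup to a folder-per-layer lakehouse, a minimal Python sketch of the path mapping may help. Assumptions: the containers are named exactly `bronze`, `silver`, and `gold`; the target is a `Files/<layer>/...` folder inside a single lakehouse; `adls_to_lakehouse_path` and `onelake_uri` are hypothetical helpers. The OneLake URI shape follows the ABFS format Microsoft documents for OneLake, but please verify it against the current docs. The sketch only computes paths; the copy itself would still be done with Data Factory, Synapse, or a notebook.

```python
from __future__ import annotations

from urllib.parse import urlparse

LAYERS = ("bronze", "silver", "gold")  # assumed container/folder names


def adls_to_lakehouse_path(adls_uri: str) -> str:
    """Map a container-per-layer ADLS Gen2 URI to a folder-per-layer lakehouse path.

    abfss://bronze@myaccount.dfs.core.windows.net/sales/orders.parquet
      -> Files/bronze/sales/orders.parquet
    """
    parsed = urlparse(adls_uri)
    if parsed.scheme != "abfss":
        raise ValueError(f"expected an abfss:// URI, got {adls_uri!r}")
    # In ABFS URIs the container is the part before '@' in the authority.
    container, _, _account_host = parsed.netloc.partition("@")
    if container not in LAYERS:
        raise ValueError(f"container {container!r} is not a medallion layer")
    # The container name becomes the top-level folder under Files/.
    return f"Files/{container}/{parsed.path.lstrip('/')}"


def onelake_uri(workspace: str, lakehouse: str, relative_path: str) -> str:
    """Build the ABFS form of a OneLake path (workspace/lakehouse are placeholders)."""
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/{relative_path}"
    )


if __name__ == "__main__":
    rel = adls_to_lakehouse_path("abfss://bronze@myaccount.dfs.core.windows.net/sales/orders.parquet")
    print(rel)
    print(onelake_uri("MyWorkspace", "lh_medallion", rel))
```

Keeping the layer as the first folder segment means any folder-level permissions or copy jobs can target `Files/bronze`, `Files/silver`, or `Files/gold` as a whole.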
Let me know if you have any other questions/thoughts around this.
Thanks
Himanshu
There are still lots of questions on how to put the theory of this medallion structure into practice:
- Would you use the medallion layer in the name (for example bronze_sales, silver_sales, gold_sales)?
- Would you load bronze tables as tables at all, or would you just leave them as files?
- How do you know which source system the data comes from? Do you incorporate that into the table/file name?
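One possible answer to the first and third questions is a naming convention that carries both the layer and the source system, for example `bronze_erp_sales`. The minimal Python sketch below builds and parses such names. The `<layer>_<source>_<entity>` pattern, the lowercase rule, and the helper names are assumptions for illustration, not a Fabric requirement; the source segment is restricted to letters and digits so a name can be split back into its parts unambiguously.

```python
from __future__ import annotations

import re
from typing import NamedTuple

LAYERS = ("bronze", "silver", "gold")
# Hypothetical convention: <layer>_<source>_<entity>; the source has no underscores.
_SOURCE = re.compile(r"[a-z0-9]+")
_ENTITY = re.compile(r"[a-z0-9][a-z0-9_]*")
_NAME = re.compile(r"(bronze|silver|gold)_([a-z0-9]+)_([a-z0-9][a-z0-9_]*)")


class TableName(NamedTuple):
    layer: str
    source: str
    entity: str


def make_table_name(layer: str, source: str, entity: str) -> str:
    """Build a name such as 'bronze_erp_sales' from its parts (lowercased)."""
    layer, source, entity = layer.lower(), source.lower(), entity.lower()
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    if not _SOURCE.fullmatch(source):
        raise ValueError(f"source must be letters/digits only: {source!r}")
    if not _ENTITY.fullmatch(entity):
        raise ValueError(f"invalid entity: {entity!r}")
    return f"{layer}_{source}_{entity}"


def parse_table_name(name: str) -> TableName:
    """Recover the layer, source system, and entity from a conforming name."""
    match = _NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"name does not follow <layer>_<source>_<entity>: {name!r}")
    return TableName(*match.groups())
```

For example, `parse_table_name("silver_crm_customer_accounts")` returns `TableName(layer='silver', source='crm', entity='customer_accounts')`. Gold tables that combine several systems may not have a single source; this sketch keeps the segment for uniformity, which is a design choice you may prefer to drop at the gold layer.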
That's a great idea. Is it possible to manage access to those files? So engineer A has access to bronze+silver and engineer B has access to bronze+gold for example. Or would I need to create separate lakehouses to achieve that?
Makes sense. That's indeed what I would have had in mind first. To be honest, I never defined ADLS Gen2 plus the default 'root' container (file system) through Azure Synapse Studio, but on a previously created, separate Azure Storage account. Then I could easily create hierarchical directories for consistency, which would be reflected and accessed under Data hub > Files tab. Bronze was mainly 'raw' Parquet data, while silver was mostly converted to Delta format (log, partitions). Microsoft Fabric OneLake is a new ball game... well, sort of. The prevailing question is probably what basically remains (and may still fit) versus what has changed and may no longer be applicable in the Fabric context. 🫡
This is an incredible response, thank you so much!
I work mostly in the SMB area, so keeping it all in one lakehouse sounds like a good approach, at least for now.
Thanks again!