How to organise Fabric workspaces?

Madhusudan_P · 09-10-2024

Hi Community,

I'm seeking suggestions on how to best organize workspaces and lakehouses in Microsoft Fabric for a Medallion Architecture data engineering workflow. I have multiple data sources, including SharePoint Lists, SQL Server, Parquet, AWS RDS (Oracle), and SAP.

Here are the three approaches I'm considering:

Approach 1:
- One workspace per Medallion layer (Raw, Silver, Gold), with separate lakehouses for each data source.
- For DEV, QA, and PROD environments, this results in 9 workspaces (3 layers × 3 environments), each with one lakehouse per data source (e.g., SAP, RDS, SharePoint).

Approach 2:
- One workspace per data source (e.g., SAP, RDS), with separate lakehouses for each medallion layer within each workspace.
- For DEV, QA, and PROD environments, this results in a total of 3 * (number of data sources) workspaces.

Approach 3:
- One workspace per Medallion layer, with a single lakehouse for all data sources.
- Data is organized using schemas within the lakehouse for different sources.
- Similar to Approach 1, this results in 9 workspaces but with only one lakehouse per workspace.
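To make the counts concrete, here is a minimal Python sketch that enumerates the workspace and lakehouse layout of each approach. The five source names, the `<ENV>-<layer>` workspace names, and the `LH_` lakehouse prefix are hypothetical naming choices for illustration only, not anything Fabric requires.

```python
from itertools import product

ENVIRONMENTS = ["DEV", "QA", "PROD"]
LAYERS = ["Raw", "Silver", "Gold"]
# Hypothetical short names for the five data sources listed above.
SOURCES = ["SharePoint", "SQLServer", "Parquet", "RDS", "SAP"]


def approach_1():
    """One workspace per environment and layer; one lakehouse per source in each."""
    return {
        f"{env}-{layer}": [f"LH_{src}" for src in SOURCES]
        for env, layer in product(ENVIRONMENTS, LAYERS)
    }


def approach_2():
    """One workspace per environment and source; one lakehouse per layer in each."""
    return {
        f"{env}-{src}": [f"LH_{layer}" for layer in LAYERS]
        for env, src in product(ENVIRONMENTS, SOURCES)
    }


def approach_3():
    """One workspace per environment and layer; a single lakehouse that uses
    one schema per source (schemas are not modelled here)."""
    return {
        f"{env}-{layer}": [f"LH_{layer}"]
        for env, layer in product(ENVIRONMENTS, LAYERS)
    }


for name, layout in [
    ("Approach 1", approach_1()),
    ("Approach 2", approach_2()),
    ("Approach 3", approach_3()),
]:
    lakehouses = sum(len(lhs) for lhs in layout.values())
    print(f"{name}: {len(layout)} workspaces, {lakehouses} lakehouses")
```

With the five sources above, this prints 9 workspaces and 45 lakehouses for Approach 1, 15 workspaces and 45 lakehouses for Approach 2, and 9 workspaces and 9 lakehouses for Approach 3. In Approach 3 the per-source separation lives in schemas rather than in separate lakehouses.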

Which approach would be the most effective, or is there a better structure you would recommend?

Thanks!

Madhusudan

v-jingzhan-msft · 09-11-2024

Hi @Madhusudan_P

Among the three approaches, Approach 1 seems to offer the best balance between clarity, manageability, and scalability. It allows for clear separation of processing stages and easier management of permissions and access controls. However, if administrative overhead is a significant concern, Approach 3 could be a viable alternative.

In addition, I have some other suggestions.

First of all, I recommend one workspace per Medallion layer. Within each layer, however, the data can be structured differently:

- The bronze layer is the landing zone for all data, whether structured, semi-structured, or unstructured. Data is stored in its original format, with no changes made to it. In this layer, you can have a separate lakehouse for each data source, which makes the data landing process easier to manage.
- The silver layer is where you validate and refine your data. Typical activities include combining and merging data and enforcing data validation rules, such as removing nulls and deduplicating records. The silver layer can be thought of as a central repository for an organization or team, where data is stored in a consistent format and can be accessed by multiple teams. You can still keep a separate lakehouse for each data source, or you can consolidate data from diverse sources into a unified enterprise view of crucial business entities, concepts, and transactions, such as master customers, stores, deduplicated transactions, and cross-reference tables.
- In the gold layer, data undergoes further refinement to align with specific business and analytics needs. This could involve aggregating data to a particular granularity, such as daily or hourly, or enriching it with external information. You can separate lakehouses by analytical purpose, such as sales or inventory. You can also use data warehouses to model dimension and fact tables, and provide data warehouses or semantic models for downstream use. In this layer, the data source is no longer the organizing concept.
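The layer responsibilities described above can be illustrated with a small, library-free Python sketch: bronze keeps records exactly as they landed, silver drops rows with a missing key and removes duplicates, and gold aggregates to daily granularity per store. The record fields (`order_id`, `store`, `ts`, `amount`) and the sample data are hypothetical; in Fabric you would typically do this with Spark or SQL over lakehouse tables rather than Python lists.

```python
from collections import defaultdict
from datetime import datetime

# Bronze: hypothetical records kept exactly as they arrived from the source.
bronze = [
    {"order_id": 1, "store": "North", "ts": "2024-09-01T08:15:00", "amount": 20.0},
    {"order_id": 1, "store": "North", "ts": "2024-09-01T08:15:00", "amount": 20.0},  # duplicate
    {"order_id": 2, "store": "South", "ts": "2024-09-01T17:40:00", "amount": 35.5},
    {"order_id": None, "store": "South", "ts": "2024-09-02T09:00:00", "amount": 12.0},  # missing key
    {"order_id": 3, "store": "North", "ts": "2024-09-02T11:30:00", "amount": 8.0},
]


def to_silver(rows, key="order_id"):
    """Validate and refine: drop rows with a null key, keep the first row per key,
    and parse timestamps into a consistent type. Bronze rows are not modified."""
    seen = set()
    silver = []
    for row in rows:
        if row[key] is None or row[key] in seen:
            continue
        seen.add(row[key])
        # Build a new dict so the bronze record stays untouched.
        silver.append({**row, "ts": datetime.fromisoformat(row["ts"])})
    return silver


def to_gold(rows):
    """Aggregate silver rows to daily totals per store for reporting."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["ts"].date().isoformat(), row["store"])] += row["amount"]
    return dict(sorted(totals.items()))


silver = to_silver(bronze)
gold = to_gold(silver)
print(len(bronze), len(silver))
print(gold)
```

Running it prints `5 3` (five bronze rows, three after validation) followed by the daily totals per store. Producing new rows in silver instead of editing the bronze records mirrors the rule that the landing layer is never changed.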

I've borrowed some descriptions from the documentation and blog posts below; I hope you find them helpful:

Exploring the Medallion Architecture in Microsoft Fabric | by Mariusz Kujawski | Medium

Describe medallion architecture - Training | Microsoft Learn

What is the medallion lakehouse architecture? - Azure Databricks | Microsoft Learn

Best Regards,
Jing

Madhusudan_P · 09-11-2024

Thank you for your detailed answer, @v-jingzhan-msft. I had assumed I would have to organise data in all three layers the same way, i.e. by data source, but it seems that isn't necessary and I can adopt a different structure for each layer.

Kind Regards,