Ashwath_Bala_S
Helper II

Implementing Medallion Architecture in Fabric

Hi,

 

I am implementing a Medallion Architecture in Microsoft Fabric and have a few questions about it.

1. One option is to bring data into a Data Warehouse using Data Factory.

2. I can use a Lakehouse (for handling large volumes of data) or a Warehouse (for analytical workloads).

3. For the Bronze, Silver, and Gold layers, a few options are:

(i) Using the same workspace (with different notebooks that save tables as Delta tables, and then running the notebooks at intervals to do incremental loads).

(ii) Having one workspace for the Bronze layer and a separate one for the Silver and Gold layers.

(iii) Using Data Factory for Bronze, Data Warehouse for Silver, and Data Engineering for Gold.

 

I have gone through different resources and still have a few doubts about this.

My data source is SQL Server.

My approach is:

1. Create a workspace (I need to check whether I need multiple workspaces for the different layers).

2. Use the Data Factory persona (with Dataflow Gen2) to load data into either a Warehouse or a Lakehouse.

 

I would like to get the following questions resolved:

1. How to determine the number of workspaces needed.

2. The number of lakehouses needed.

3. Whether a Warehouse is required (perhaps for analytical loads), etc.

 

Any assistance on this would be very helpful, since Fabric is new to me. (Any references would also be helpful.)

Thanks in Advance

1 ACCEPTED SOLUTION
v-gchenna-msft
Community Support

Hi @Ashwath_Bala_S ,

Thanks for using Fabric Community. The right approach depends entirely on your use case. I would like to offer a suggestion based on my understanding.

Number of Workspaces: 

  • Generally, one workspace is sufficient for managing the Bronze, Silver, and Gold layers within the Medallion Architecture. This simplifies data lineage tracking, access control, and administration.
  • Consider separate workspaces if:
    • You have complex governance requirements with strict separation of duties.
    • You're working with highly sensitive data that demands isolation for specific layers.

 

Number of Lakehouses:

  • One lakehouse is typically adequate for the Medallion Architecture in Fabric. It provides a unified platform for storing raw, curated, and consumption-ready data.
  • Multiple lakehouses might be necessary in rare scenarios:
    • Handling extremely large datasets (petabytes or more) that necessitate specialized storage solutions.
    • Implementing geographically distributed lakehouses for data residency compliance.


Warehouse vs. Lakehouse:

  • Microsoft Fabric's lakehouse capabilities (backed by OneLake, which is built on Azure Data Lake Storage Gen2) are well suited for handling large volumes of data, including raw, semi-structured, and structured formats.
  • A traditional data warehouse might be beneficial as a serving layer if:
    • You have stringent performance requirements for complex analytical queries.
    • Your data model is highly optimized for specific reporting needs.


Recommendations:

  1. Start with a single workspace: This simplifies management and promotes data lineage visibility. If governance needs evolve later, you can introduce separate workspaces.
  2. Utilize a single lakehouse: Leverage Fabric's lakehouse capabilities for raw, curated, and consumption-ready data. Consider additional lakehouses only for exceptional circumstances.
  3. Evaluate the need for a data warehouse: If your analytical workloads demand high performance or require a pre-defined data model, explore using a data warehouse as the serving layer on top of the lakehouse.
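
As an illustration of recommendations 1 and 2, here is a minimal sketch of how the three layers could live in one lakehouse. It assumes a table-name prefix per layer (`bronze_`, `silver_`, `gold_`), which is only one possible convention, and the helper names `layer_table` and `promote` are hypothetical. `promote` expects the `spark` session that a Fabric notebook already provides.

```python
# Hypothetical naming convention: one lakehouse, one table-name prefix per layer.
LAYERS = ("bronze", "silver", "gold")


def layer_table(layer: str, entity: str) -> str:
    """Return the table name for an entity in a given medallion layer."""
    if layer not in LAYERS:
        raise ValueError(f"Unknown layer {layer!r}; expected one of {LAYERS}")
    return f"{layer}_{entity.lower()}"


def promote(spark, entity: str, source_layer: str, target_layer: str, transform):
    """Read an entity from one layer, transform it, and overwrite the next layer.

    `spark` is an existing SparkSession; `transform` is any function that
    takes a DataFrame and returns a DataFrame.
    """
    source_df = spark.read.table(layer_table(source_layer, entity))
    (
        transform(source_df)
        .write.format("delta")
        .mode("overwrite")
        .saveAsTable(layer_table(target_layer, entity))
    )
```

In a notebook this might be called as `promote(spark, "Customers", "bronze", "silver", lambda df: df.dropDuplicates())`, and the notebook can then be scheduled from a pipeline. The overwrite mode is chosen here for simplicity; an incremental load would need an append or merge strategy instead.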

 

Additional docs for reference:
Medallion architecture in Microsoft Fabric | by Valentin Loghin | Feb, 2024 | Medium
MS Fabric - The Medallion Architecture

Hope this is helpful. Please let me know in case of further queries.


7 REPLIES
Ashwath_Bala_S
Helper II
Helper II

Hi @v-gchenna-msft ,

Yeah it helped me.

Thank You!

v-gchenna-msft
Community Support

Hello @Ashwath_Bala_S ,

We haven't heard back from you on the last response and were just checking to see whether you have a resolution yet.
If not, please respond with more details and we will try to help.

Hi @v-gchenna-msft ,

 

My data is in SQL Server.

One approach I have is:

1. Loading the data into a Lakehouse.

2. Creating notebooks to build the different layers and automating them using a Pipeline. (In this method the initial tables are also imported into OneLake.)

Is there any other way to avoid loading the source tables into OneLake and keep only the tables for the different layers?

Any guidance on this will be highly helpful.

Thanks in Advance!

Hi @Ashwath_Bala_S ,

In order to avoid the initial load, you can use either of these two ways:
1. Using Notebooks - You can connect directly to SQL Server using Spark, read the table values, perform transformations, and then create a table in the lakehouse.

Link for reference -
Apache Spark connector for SQL Server - Spark connector for SQL Server | Microsoft Learn

2. Using Data Factory Pipeline - You can connect to SQL Server using a SQL Server connection in a Pipeline and then perform basic transformations using Pipeline activities and Dataflow Gen2.
After the transformations you can load the data directly into lakehouse tables.

The above methods eliminate the initial load into OneLake and avoid data duplication.
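
As a rough sketch of the notebook option (method 1), the following uses Spark's built-in JDBC data source rather than the dedicated connector linked above. It assumes that the SQL Server instance is reachable from the Fabric Spark runtime, that the Microsoft JDBC driver is available there, and that SQL authentication is used. The helper names, host, database, and table names are hypothetical.

```python
def sqlserver_jdbc_options(host, database, table, user, password, port=1433):
    """Build the option map for Spark's generic JDBC reader against SQL Server."""
    url = f"jdbc:sqlserver://{host}:{port};databaseName={database};encrypt=true"
    return {
        "url": url,
        "dbtable": table,  # a table name such as "dbo.Customers"
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


def load_table_to_lakehouse(spark, options, target_table, transform=lambda df: df):
    """Read the source table over JDBC, transform it, and save it as a Delta table.

    Only the transformed result is written to the lakehouse, so no raw copy
    of the source table is stored. `spark` is an existing SparkSession.
    """
    source_df = spark.read.format("jdbc").options(**options).load()
    (
        transform(source_df)
        .write.format("delta")
        .mode("overwrite")
        .saveAsTable(target_table)
    )
```

A call might look like `load_table_to_lakehouse(spark, sqlserver_jdbc_options("myserver", "Sales", "dbo.Customers", user, password), "silver_customers")`. In practice the credentials would come from a secret store rather than being written into the notebook.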

Hope this gives you some insights. Please let me know in case of further queries.

Hi @v-gchenna-msft ,

 

Yeah it helps!

Thank You!

Glad to know that you got some insights. Please continue using Fabric Community for your further queries.
