Solved: Medallion Architecture - best practices to combine...

Adriansm · ‎05-19-2025

Hi all,

I'm still fairly new to Microsoft Fabric and have been researching best practices for my use case, but haven’t found anything specific so far. I’d really appreciate hearing the community’s thoughts on this.

We currently have a highly unstructured and decentralized data estate spread across multiple systems. We're planning to use Fabric to implement a centralized, structured solution based on the medallion architecture, starting with our sales data. We've done most of the work with this data, but are stuck discussing what's the optimal way to join the different sources.

Context:
We have two systems that process our sales data, both involve large datasets with different schemas.

Bronze layer: We're ingesting data from these two systems into our bronze lakehouse using two separate pipelines.
Silver layer: We use Dataflow Gen2 (DFG2) on each lakehouse endpoint to transform the data into a proposed standard schema. Since the source schemas differ, we have a separate DFG2 activity for each source, each publishing to its own schema within the silver lakehouse.
Gold layer: We plan to build one or more semantic models from the silver layer to support various Power BI dashboards.

We’re now trying to determine the most efficient way to combine or union the two datasets in the silver layer, but merely duplicating all the data into another lakehouse sounds very ineficcient. Both datasets now follow the same curated schema in the silver layer, but we’re unsure of the best approach to merge them.

Any advice or best practices would be greatly appreciated.

Thanks in advance for your help

andrewsommer · ‎05-19-2025

I would start by asking if you have to merge them? Instead of physically merging the two silver datasets into a new table in gold, define views in the gold lakehouse that logically union the two silver datasets > semantic model > power bi.

As a side note, I avoid Dataflow Gen 2s as much as possible; they eat up CUs. Notebook everything you can.

Please mark this post as a solution if it helps you. Appreciate Kudos.

View solution in original post

burakkaragoz · ‎05-21-2025

Hi @Adriansm

Just checking in – did the steps I shared help resolve the issue?

✅ If it’s working now, feel free to mark the response as the Accepted Solution. This helps others who face the same issue find the fix faster.
✨ And of course, a little Kudos would be much appreciated!

If you're still running into trouble, let me know what you've tried so far and I’ll help you dig deeper. We’ll get it sorted!

burakkaragoz · ‎05-19-2025

Hi @Adriansm ,

we had a similar setup – multiple sources like SQL, SharePoint, SAP, etc. here’s what worked best for us when building a medallion architecture in Fabric:

1. use one workspace per medallion layer

dev/test/prod separation is easier
access control is cleaner
pipelines and deployments stay simple

2. inside each workspace, use one lakehouse per data source

e.g. bronze_sap, bronze_sharepoint, bronze_sql
this keeps raw data isolated and easier to trace

3. combine sources in silver layer

create unified views or tables (e.g. customer, product)
use notebooks or pipelines to join/clean/standardize
this is where cross-source logic lives

4. gold layer = business-ready

only include curated, validated data
used by semantic models and reports
keep it clean and minimal

bonus tip:
use schemas inside lakehouses to separate domains (e.g. sales, finance) if you want fewer lakehouses.

If my response resolved your query, kindly mark it as the Accepted Solution to assist others. Additionally, I would be grateful for a 'Kudos' if you found my response helpful.

Adriansm · ‎05-20-2025

Thank you, but I have a couple of questions about combining data. You're using "notebooks or pipelines", which one of the two and why? Do you join your data onto a new lakehouse in silver? and is that efficient?

Thank you for your reponses

burakkaragoz · ‎05-20-2025

You're absolutely right — the best practice in a Medallion Architecture is to:

Ingest each data source separately into the Bronze layer, keeping raw data as-is.
Then in the Silver layer, you can clean, transform, and combine them into a unified model.

This keeps your pipeline modular and makes debugging or reprocessing much easier. Also, if one source changes or fails, it won’t break the whole flow.

In the Gold layer, you can then build business-specific views or aggregates based on that unified Silver model.

Let me know if you want a sample structure or naming convention — happy to share!

Adriansm · ‎05-23-2025

Hi, I appreciate the AI-generated reply but unfortunately I don't think this answers the question. As I stated in my original message, we already have a bronze and silver layer set up, my question was more specifically aimed at best practices for merging/combining this data.

I'll continue exploring further and will update the post with what I find.

Thank you

andrewsommer · ‎05-19-2025

I would start by asking if you have to merge them? Instead of physically merging the two silver datasets into a new table in gold, define views in the gold lakehouse that logically union the two silver datasets > semantic model > power bi.

As a side note, I avoid Dataflow Gen 2s as much as possible; they eat up CUs. Notebook everything you can.

Please mark this post as a solution if it helps you. Appreciate Kudos.

Adriansm · ‎05-20-2025

That’s a very good point. We don’t necessarily need to physically merge the datasets, but there are use cases where we need to feed or compare sales data from both sources, so having them accessible in one place is important.

If I understand correctly, you're suggesting we create a SQL view in the gold layer that uses UNION ALL

to combine the two silver tables into a single logical view. Is that right?

I suppose it all comes down to performance. Our datasets are fairly large, but from what I’ve been reading, views with UNION ALL can be quite efficient, so I’ll definitely give that a try

Thanks also for the advice regarding DFG2. A few others have flagged that as well, so we’re planning to rebuild most of it using notebooks before going live.

burakkaragoz · ‎05-20-2025

Hey, that’s exactly right , you got it!

Creating a logical view in the Gold layer using UNION ALL is a clean and scalable way to bring those Silver datasets together without physically merging them. It keeps things flexible and easier to maintain, especially when working with large volumes.

And yep, performance with views is usually solid — just keep an eye on indexing and partitioning if needed.

Glad to hear you're moving away from DFG2s too — notebooks give you way more control and efficiency.

Medallion Architecture - best practices to combine data sources

Helpful resources

Join our Fabric User Panel

Fabric Monthly Update - June 2025

Fabric Community Update - June 2025

Party with Power BI’s own Guy in a Cube

Medallion Architecture - best practices to combine data sources

Helpful resources

Join our Fabric User Panel

Fabric Monthly Update - June 2025

Fabric Community Update - June 2025