Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
Adriansm
Frequent Visitor

Medallion Architecture - best practices to combine data sources

Hi all,

 

I'm still fairly new to Microsoft Fabric and have been researching best practices for my use case, but haven’t found anything specific so far. I’d really appreciate hearing the community’s thoughts on this.

 

We currently have a highly unstructured and decentralized data estate spread across multiple systems. We're planning to use Fabric to implement a centralized, structured solution based on the medallion architecture, starting with our sales data. We've done most of the work with this data, but are stuck discussing what's the optimal way to join the different sources. 

 

Context:
We have two systems that process our sales data, both involve large datasets with different schemas.

  • Bronze layer: We're ingesting data from these two systems into our bronze lakehouse using two separate pipelines.
  • Silver layer: We use Dataflow Gen2 (DFG2) on each lakehouse endpoint to transform the data into a proposed standard schema. Since the source schemas differ, we have a separate DFG2 activity for each source, each publishing to its own schema within the silver lakehouse.
  • Gold layer: We plan to build one or more semantic models from the silver layer to support various Power BI dashboards.

We’re now trying to determine the most efficient way to combine or union the two datasets in the silver layer, but merely duplicating all the data into another lakehouse sounds very ineficcient. Both datasets now follow the same curated schema in the silver layer, but we’re unsure of the best approach to merge them.

 

Any advice or best practices would be greatly appreciated. 

 

Thanks in advance for your help

1 ACCEPTED SOLUTION
andrewsommer
Super User
Super User

I would start by asking if you have to merge them?  Instead of physically merging the two silver datasets into a new table in gold, define views in the gold lakehouse that logically union the two silver datasets > semantic model > power bi.

 

As a side note, I avoid Dataflow Gen 2s as much as possible; they eat up CUs.  Notebook everything you can. 

 

Please mark this post as a solution if it helps you. Appreciate Kudos.

View solution in original post

8 REPLIES 8
burakkaragoz
Community Champion
Community Champion

Hi @Adriansm 

 

Just checking in – did the steps I shared help resolve the issue?

If it’s working now, feel free to mark the response as the Accepted Solution. This helps others who face the same issue find the fix faster.
And of course, a little Kudos would be much appreciated!

If you're still running into trouble, let me know what you've tried so far and I’ll help you dig deeper. We’ll get it sorted!

burakkaragoz
Community Champion
Community Champion

Hi @Adriansm ,

 

we had a similar setup – multiple sources like SQL, SharePoint, SAP, etc. here’s what worked best for us when building a medallion architecture in Fabric:


1. use one workspace per medallion layer

  • dev/test/prod separation is easier
  • access control is cleaner
  • pipelines and deployments stay simple

2. inside each workspace, use one lakehouse per data source

  • e.g. bronze_sap, bronze_sharepoint, bronze_sql
  • this keeps raw data isolated and easier to trace

3. combine sources in silver layer

  • create unified views or tables (e.g. customer, product)
  • use notebooks or pipelines to join/clean/standardize
  • this is where cross-source logic lives

4. gold layer = business-ready

  • only include curated, validated data
  • used by semantic models and reports
  • keep it clean and minimal

bonus tip:
use schemas inside lakehouses to separate domains (e.g. sales, finance) if you want fewer lakehouses.


If my response resolved your query, kindly mark it as the Accepted Solution to assist others. Additionally, I would be grateful for a 'Kudos' if you found my response helpful.

Thank you, but I have a couple of questions about combining data. You're using "notebooks or pipelines", which one of the two and why? Do you join your data onto a new lakehouse in silver? and is that efficient? 

 

Thank you for your reponses

You're absolutely right — the best practice in a Medallion Architecture is to:

  • Ingest each data source separately into the Bronze layer, keeping raw data as-is.
  • Then in the Silver layer, you can clean, transform, and combine them into a unified model.

This keeps your pipeline modular and makes debugging or reprocessing much easier. Also, if one source changes or fails, it won’t break the whole flow.

In the Gold layer, you can then build business-specific views or aggregates based on that unified Silver model.

Let me know if you want a sample structure or naming convention — happy to share!

Hi, I appreciate the AI-generated reply but unfortunately I don't think this answers the question. As I stated in my original message, we already have a bronze and silver layer set up, my question was more specifically aimed at best practices for merging/combining this data.

I'll continue exploring further and will update the post with what I find.

 

Thank you

andrewsommer
Super User
Super User

I would start by asking if you have to merge them?  Instead of physically merging the two silver datasets into a new table in gold, define views in the gold lakehouse that logically union the two silver datasets > semantic model > power bi.

 

As a side note, I avoid Dataflow Gen 2s as much as possible; they eat up CUs.  Notebook everything you can. 

 

Please mark this post as a solution if it helps you. Appreciate Kudos.

That’s a very good point. We don’t necessarily need to physically merge the datasets, but there are use cases where we need to feed or compare sales data from both sources, so having them accessible in one place is important.

If I understand correctly, you're suggesting we create a SQL view in the gold layer that uses UNION ALL 

to combine the two silver tables into a single logical view. Is that right?

I suppose it all comes down to performance. Our datasets are fairly large, but from what I’ve been reading, views with UNION ALL can be quite efficient, so I’ll definitely give that a try


Thanks also for the advice regarding DFG2. A few others have flagged that as well, so we’re planning to rebuild most of it using notebooks before going live.


 

Hey, that’s exactly right , you got it!

Creating a logical view in the Gold layer using UNION ALL is a clean and scalable way to bring those Silver datasets together without physically merging them. It keeps things flexible and easier to maintain, especially when working with large volumes.

And yep, performance with views is usually solid — just keep an eye on indexing and partitioning if needed.

Glad to hear you're moving away from DFG2s too — notebooks give you way more control and efficiency.

 

Helpful resources

Announcements
Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.

June FBC25 Carousel

Fabric Monthly Update - June 2025

Check out the June 2025 Fabric update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.