Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Data Days is here! Join us now for 60+ days of learning, challenges, and connection. Learn more

Reply
Ira_27
Helper I
Helper I

Metadata-driven ETL in Fabric: Warehouse vs Lakehouse (best practice for scaling dims/facts?)

Hello,

 

We are currently in the phase of implementing Microsoft Fabric and are evaluating whether to use Fabric Warehouse vs Lakehouse for our dimensional model.

 

We completed a POC using Warehouse, where we successfully implemented a metadata-driven ETL framework:

  • Fabric Pipeline queries a metadata/control table for job dependencies and execution order
  • Each metadata record defines which stored procedure to execute
  • Pattern used:

     
    Landing Zone (Lakehouse) → Conformed Views (Warehouse) → Stored Procedures (Warehouse)

This approach has worked well for loading both dimensions and facts in a scalable and orchestrated way.


Current Challenge (Lakehouse POC)

We are now exploring a similar approach using Lakehouse for storing dimensions and facts, but are running into some design questions:

  1. Notebook Design & Scalability
    • With ~100 dimensions and ~30 facts, is it a best practice to create one notebook per dimension/fact, or
    • Should we be building generic, parameterized notebooks driven by metadata?
  2. Handling Transformations & Joins
    • In Warehouse, we used conformed views to encapsulate joins and business logic
    • In Lakehouse, since views have limitations, what is the recommended approach?
      • Materialized Silver tables?
      • Dynamic SQL generation from metadata?
      • Some hybrid pattern?
  3. Orchestration Pattern
    • Has anyone successfully implemented a fully metadata-driven ETL framework in Lakehouse (similar to stored-proc-driven patterns in Warehouse)?
    • If yes, how are you managing:
      • Dependencies
      • Reusability
      • Debugging and lineage
  4. General Architecture Guidance
    • For enterprise-scale dimensional modeling in Fabric, is the recommended approach:
      • Pure Lakehouse
      • Pure Warehouse
      • Or a hybrid (Lakehouse for ingestion + Warehouse for serving)?

Goal

We are trying to design a robust, scalable, metadata-driven orchestration framework that can support:

  • 100+ dimensions
  • 30+ fact tables
  • Reusable ETL logic
  • Clear dependency management
1 ACCEPTED SOLUTION
v-veshwara-msft
Community Support
Community Support

Hi @Ira_27 ,

Thanks for reaching out to Microsoft Fabric Community.

 

To address your core concern first: your existing Warehouse pattern is solid and you don't need to abandon it. The recommended approach for enterprise-scale dimensional modeling in Fabric is actually a hybrid -- Lakehouse for ingestion and transformation, Warehouse for the serving layer. Both share OneLake under the hood and use Delta format, so the interoperability is seamless. You can shortcut Silver Lakehouse tables directly into the Warehouse and keep all your existing stored procedures and conformed views exactly as they are. Microsoft's own FastTrack team published a metadata-driven pipeline pattern that follows this same structure, worth going through in detail: Metadata Driven Pipelines for Microsoft Fabric | Microsoft Community Hub

 

On the notebook design question -- with 100+ dimensions and 30 facts, creating one notebook per entity will become a maintenance problem quickly. The right approach would be a small set of generic, parameterized notebooks (one for SCD1/2 dim loads, one for fact loads, one for silver materialization) that read all their configuration from your control table at runtime. The notebook is the engine; the metadata row is the config. This is covered well in the official decision guide here: Microsoft Fabric Decision Guide: Choose between Warehouse and Lakehouse - Microsoft Fabric | Microso...

 

For transformations and joins in Lakehouse -- since the SQL analytics endpoint is read-only, you can't replicate Warehouse views directly. The cleanest workaround is materializing Silver as Delta tables via Spark notebooks, then either shortcutting those into the Warehouse for your conformed view logic, or using Spark SQL managed views within the Lakehouse SQL endpoint for lighter joins. The shortcut approach is particularly useful because it means zero data movement and your Warehouse views can point straight at Lakehouse data.

 

For dependency management and parallel execution, look into notebookutils.notebook.runMultiple() -- it lets you define a DAG directly in a notebook, specifying which notebooks run in parallel and which wait on others. This is more reliable than trying to parallelize notebook calls inside a Pipeline ForEach, which runs into Spark session concurrency limits.

NotebookUtils notebook run and orchestration for Fabric - Microsoft Fabric | Microsoft Learn

Also worth noting: pipeline variable libraries are now GA in Fabric and were specifically designed to support metadata-driven pipeline patterns -- New Innovations for Fabric Data Factory Orchestrat... - Microsoft Fabric Community

 

For lineage and debugging, logging execution details (rows read/inserted/updated, pipeline run ID, status) to a dedicated control Lakehouse table from within each notebook gives you a clean audit trail that works well with Fabric's built-in monitoring.

 

Hope this helps. Please reach out for further assistance.

Thank you.

View solution in original post

2 REPLIES 2
v-veshwara-msft
Community Support
Community Support

Hi @Ira_27 ,

Thanks for reaching out to Microsoft Fabric Community.

 

To address your core concern first: your existing Warehouse pattern is solid and you don't need to abandon it. The recommended approach for enterprise-scale dimensional modeling in Fabric is actually a hybrid -- Lakehouse for ingestion and transformation, Warehouse for the serving layer. Both share OneLake under the hood and use Delta format, so the interoperability is seamless. You can shortcut Silver Lakehouse tables directly into the Warehouse and keep all your existing stored procedures and conformed views exactly as they are. Microsoft's own FastTrack team published a metadata-driven pipeline pattern that follows this same structure, worth going through in detail: Metadata Driven Pipelines for Microsoft Fabric | Microsoft Community Hub

 

On the notebook design question -- with 100+ dimensions and 30 facts, creating one notebook per entity will become a maintenance problem quickly. The right approach would be a small set of generic, parameterized notebooks (one for SCD1/2 dim loads, one for fact loads, one for silver materialization) that read all their configuration from your control table at runtime. The notebook is the engine; the metadata row is the config. This is covered well in the official decision guide here: Microsoft Fabric Decision Guide: Choose between Warehouse and Lakehouse - Microsoft Fabric | Microso...

 

For transformations and joins in Lakehouse -- since the SQL analytics endpoint is read-only, you can't replicate Warehouse views directly. The cleanest workaround is materializing Silver as Delta tables via Spark notebooks, then either shortcutting those into the Warehouse for your conformed view logic, or using Spark SQL managed views within the Lakehouse SQL endpoint for lighter joins. The shortcut approach is particularly useful because it means zero data movement and your Warehouse views can point straight at Lakehouse data.

 

For dependency management and parallel execution, look into notebookutils.notebook.runMultiple() -- it lets you define a DAG directly in a notebook, specifying which notebooks run in parallel and which wait on others. This is more reliable than trying to parallelize notebook calls inside a Pipeline ForEach, which runs into Spark session concurrency limits.

NotebookUtils notebook run and orchestration for Fabric - Microsoft Fabric | Microsoft Learn

Also worth noting: pipeline variable libraries are now GA in Fabric and were specifically designed to support metadata-driven pipeline patterns -- New Innovations for Fabric Data Factory Orchestrat... - Microsoft Fabric Community

 

For lineage and debugging, logging execution details (rows read/inserted/updated, pipeline run ID, status) to a dedicated control Lakehouse table from within each notebook gives you a clean audit trail that works well with Fabric's built-in monitoring.

 

Hope this helps. Please reach out for further assistance.

Thank you.

Hi @Ira_27 ,

Just wanted to check if the response provided was helpful. If further assistance is needed, please reach out.
Thank you.

 

Helpful resources

Announcements
Fabric Data Days is here Carousel

Fabric Data Days 2026

Don't miss out on Data Days, June 15 through August 7. Learn Fabric, Power BI, SQL, AI and more.

June Fabric Update Carousel

Fabric Monthly Update - June 2026

Check out the June 2026 Fabric update to learn about new features.