
pallavi_r

Reinventing the Gold Layer Using Materialized Lake Views

1. Foundations of the Medallion Architecture

Modern analytics platforms must balance scale, reliability, and governance to process data from source to target. The Medallion Architecture addresses this by organizing data into bronze, silver, and gold layers, each with a well-defined responsibility. This layered approach separates raw ingestion from transformation and business-ready data, making analytics solutions methodical and easier to manage, govern, and scale across the enterprise.

 

2. Why Traditional Gold Tables Break Under Scale


 

As business reporting requirements evolve, the reporting layer expands to support multiple teams. This makes the gold layer increasingly difficult to manage and maintain, especially as dependencies on gold tables and warehouses grow.

Refresh schedules, orchestration, and interdependencies become tightly coupled, meaning changes in source systems or upstream data can directly impact the gold layer. This complexity makes lineage harder to track and often leads to confusion around data ownership and trust.

And as the number of data sources increases, the operational effort rises significantly, affecting deployment speed, testing, monitoring, and the long-term maintainability of the analytics platform.

Over time, the gold layer becomes difficult to manage and govern, and teams spend more time handling complexity than delivering insights. This highlights the need for the gold layer to evolve into a more flexible and reusable consumption layer: one that can scale with growing business needs while keeping operational effort under control.

 

3. Materialized Lake Views in Microsoft Fabric

Microsoft Fabric introduces Materialized Lake Views (MLVs) as a built-in capability to simplify how curated data is prepared and consumed. Materialized Lake Views store precomputed results of SQL queries on top of Lakehouse data, combining the flexibility of views with the performance benefits of materialized data. They can be refreshed automatically and reused across multiple analytical workloads, making them well suited for enterprise-scale analytics.

 

4. How Materialized Lake Views Address This

Materialized Lake Views reduce complexity by centralizing business logic into reusable, precomputed views instead of multiple physical gold tables. Rather than creating separate tables for each reporting requirement, teams can define standardized aggregations once and reuse them across dashboards and analytical workloads.

When business rules change, updates can be made in a single place and automatically reflected wherever the view is used. This significantly reduces maintenance effort, minimizes deployment cycles, and lowers the risk of inconsistent metrics across reports.

 

5. Where Materialized Lake Views Fit in the Medallion Architecture

In the Medallion Architecture, Materialized Lake Views (MLVs) fit best between the silver and gold layers. Silver tables provide clean, conformed, and trusted data, while MLVs use SQL to precompute and shape this data for consumption, without creating extra physical gold tables.

Instead of building a separate gold table for every business requirement, MLVs act as reusable, precomputed consumption layers on top of silver data. They deliver the performance of gold tables while staying closely aligned with silver, reducing duplication and simplifying governance.

 

6. Materialized Lake Views in a Metadata-Driven Design (MDD) Approach

In a metadata-driven design, ingestion and transformation processes are controlled through configuration rather than hard-coded logic. Metadata tables typically define source systems, load patterns, schedules, and target locations for the bronze and silver layers. Materialized Lake Views do not replace this ingestion metadata; instead, they extend the metadata-driven approach into the consumption layer.

By standardizing business logic and aggregations at the view level, Materialized Lake Views allow teams to manage analytical outputs in a consistent and reusable manner. While ingestion metadata determines what data is loaded and where it lands, Materialized Lake Views define how curated data should be consumed.

In large, multi-source environments, this separation of concerns simplifies maintenance. Changes to source systems continue to be managed through ingestion metadata, while changes to business logic or aggregations are handled centrally through Materialized Lake Views. This makes the overall platform easier to scale, govern, and evolve as analytical requirements grow.
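As a minimal sketch of what such consumption-layer metadata might look like, consider a hypothetical control table; the lakehouse, table, and column names below are illustrative, not a built-in Fabric feature:

```sql
-- Hypothetical control table: one row per curated consumption view.
-- Ingestion metadata (sources, load patterns, schedules) stays in its
-- own tables; this one only describes how silver data is exposed.
CREATE TABLE IF NOT EXISTS ecommerce_lh.config.mlv_definitions (
    view_name    STRING,   -- target MLV, e.g. gold.mv_monthly_salesbycategory
    source_table STRING,   -- silver input, e.g. silver.cleaned_ecommerce
    grain        STRING,   -- aggregation grain: daily / monthly / yearly
    owner_team   STRING,   -- accountable domain team
    is_active    BOOLEAN   -- toggle a view off without deleting its definition
);
```

With a table like this, changes to business logic or aggregations remain a metadata concern, consistent with how ingestion is already managed.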

 

Design Approach for an MDD–MLV-Driven Medallion Architecture

Start from bronze ingestion (driven by metadata tables), refine to silver (with MDD), then aggregate into gold MLVs for optimized Delta persistence. Schedule full or incremental refresh modes via Fabric's UI, integrating data quality rules for governance.

[Diagram: MDD–MLV-driven medallion architecture]

 

MDD–MLV Implementation

[Diagram: MDD–MLV implementation]

A sample materialized view

CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS ecommerce_lh.gold.mv_monthly_salesbycategory
(
    CONSTRAINT chk_amt_positive
        CHECK (total_amount > 0) ON MISMATCH DROP
)
AS
SELECT
    category,
    MONTH(order_date) AS month,
    SUM(total_amount) AS total_amount
FROM ecommerce_lh.silver.cleaned_ecommerce
GROUP BY
    category,
    MONTH(order_date)
ORDER BY
    category,
    month;

 

7. Gold Aggregation Design Patterns with Materialized Lake Views

Materialized Lake Views (MLVs) support multiple proven design patterns for creating Gold-level aggregations without the need to build and maintain numerous physical tables. Instead of duplicating data and logic, commonly used aggregations are defined once and reused across reports and analytical workloads.

  • Time-Based Aggregations: Daily, monthly, yearly summaries precomputed once and reused across reports.
  • KPI & Metrics: Centralized enterprise KPIs (revenue, orders, growth) for consistent, fast reporting.
  • Domain-Specific Views: Sales, Finance, Supply Chain, Marketing views summarized by key dimensions for cross-team analysis.
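Following the syntax of the sample view shown earlier, the time-based pattern at a daily grain might look like this; the lakehouse, table, and column names are carried over from that example:

```sql
-- Daily sales summary, precomputed once and reused by any report
-- that needs day-level figures.
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS ecommerce_lh.gold.mv_daily_sales
AS
SELECT
    CAST(order_date AS DATE) AS order_day,
    COUNT(*)                 AS order_count,
    SUM(total_amount)        AS total_amount
FROM ecommerce_lh.silver.cleaned_ecommerce
GROUP BY CAST(order_date AS DATE);
```

Monthly and yearly variants differ only in the grouping expression, which is exactly what makes the pattern easy to standardize.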

8. Materialized Lake Views vs. Gold Tables

 

Here’s a quick comparison between Materialized Lake Views and traditional gold tables:

Materialized Lake Views                              | Gold Tables
Best suited for fast-changing business metrics       | Best suited when schema stability is required
Help reduce gold table sprawl                        | Support full lifecycle management
Simplify pipelines and reduce orchestration effort   | Require dedicated pipelines and orchestration
Well suited for Power BI and Fabric-native analytics | Also support external BI tools and applications
Flexible and agile                                   | Enforce strict dimensional modelling

 

9. Benefits

Materialized Lake Views optimize analytics by simplifying architecture, reducing operational overhead, and accelerating insights while ensuring governed, up-to-date data.

  • Reduced Gold tables: Minimizes the need for multiple physical Gold tables by centralizing transformations in MLVs.
  • Lower maintenance effort: Minimizes pipelines and absorbs business or source changes in a single place.
  • Automatic lineage: Provides built-in, end-to-end lineage without additional governance overhead.
  • Faster analytics: Delivers quicker dashboard and report performance using precomputed results.
  • Fresh data: Keeps analytics up to date through managed, incremental refreshes.
  • Better performance & scalability: Serves optimized data directly from the Lakehouse for responsive, scalable Power BI workloads.
  • Built-in observability: Provides inbuilt monitoring, data quality reports, and visual monitoring of refresh runs.

10. Conclusion

Materialized Lake Views (MLVs) extend the Medallion Architecture by simplifying the design and consumption of the gold layer. When combined with a metadata-driven approach, MLVs enable faster development, easier change management, and high-performance analytics at scale. This makes them a practical, scalable choice for organizations using Microsoft Fabric to meet evolving business needs while maintaining strong governance and performance.

By calculating transformations once during refresh, MLVs eliminate repeated, expensive queries across BI reports or notebooks—shifting compute from on-demand CU usage to predictable, scheduled processing.
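To illustrate that shift with the sample view defined earlier: each report query reads the small precomputed result set instead of re-aggregating the silver table on every run.

```sql
-- Reads precomputed monthly totals; no scan of silver.cleaned_ecommerce.
SELECT category, month, total_amount
FROM ecommerce_lh.gold.mv_monthly_salesbycategory
WHERE month = 6
ORDER BY total_amount DESC;
```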

Comments

Well explained! Materialized lake views are certainly evolving, but currently the incremental refresh feature has limitations: the view always goes for a complete refresh. Hoping to see incremental refresh with support for all complex scenarios, which would be a game changer for huge data sources.

MLVs support data quality checks (constraints), but these only support two actions: FAIL or DROP.
In the case of DROP there appears to be no way to determine which rows were dropped.
This makes it puzzling how any troubleshooting, corrective action, and reprocessing is meant to be done for those rows.
While you could "hand code" a solution, most of these end up effectively meaning that you do not use a DQ check for this logic, i.e. the DQ check + DROP seems to be useless unless you do not care what data was dropped. But this tends to suggest that you then only use MLVs at a level where most significant data cleaning has already been done.
Any thoughts on this? Have I missed something regarding determining which rows were DROPped?

Hi @amthog 

Good observation, you are absolutely right. For now, MLVs are not designed for troubleshooting or corrective workflows; they are best applied to already-trusted data. For observability and reprocessing, that logic should live upstream, typically in silver, using validation logic and audit tables.

For now, MLVs work best where data cleansing is already done, with FAIL used for contract enforcement and DROP used only as a guardrail when row-level tracking doesn't matter.

 

Hi @amthog and @pallavi_r ,

 

I have a slightly different perspective on the data quality checks. All data that do not meet the constraints and are skipped are logged in a backend table. You may run queries on that log table, sys_dq_metrics, to find out how many records were skipped and which constraint validation led to the skipping of those records.

 

SELECT TOP (100)
    [Namespace],
    [MLVName],
    [MLVID],
    [RefreshPolicy],
    [RefreshDate],
    [RefreshTimestamp],
    [Message],
    [TotalRowsProcessed],
    [TotalRowsDropped],
    [TotalViolations],
    [ViolationsPerConstraint]
FROM [lakehouse].[dbo].[sys_dq_metrics]

In my view, this is more than enough to backtrack to the bad records and make sufficient corrections to them with the business if needed. Please read the blog here, Materialized Lake Views -Case Study -Using NYC Tax... - Microsoft Fabric Community, to understand more, as I have a detailed case study using NYC Taxi data.

hi @pallavi_r , @AparnaRamakris 
Thanks for your responses. Much appreciated.

I think the fact that each of you had a somewhat different opinion on how/where MLVs could be used serves to highlight the concern/confusion here. 

@AparnaRamakris : I did read your link, and I will comment on the DQ report, but wanted to lay out some thoughts first.

If we consider the DQ checks to be business rules, then any issues with the data need to be actionable. There are two aspects here that concern me; DRY (do not repeat yourself) and alerting. 

Ideally a business rule should not need to be repeated in multiple places (DRY). Doing so creates a double maintenance burden and the risk of issues/bugs if various copies do not agree. 
With both DROP and FAIL, in order for them to be actionable, some (or all) of the incorrect rows need to be identified. Obviously we can issue a query to find them, but this query has to repeat the business rules from the DQ checks and get them right. For instance, if there are three checks A, B, C then the query needs to be NOT (A or B or C). This repetition is a risk/burden whether being done manually by a troubleshooter or embedded in some further automation.

There is also the (smaller) matter that (presumably) both the DROP and FAIL DQ checks have identified specific incorrect rows (all for DROP, or at least one for FAIL?) and used some compute to do so, but we then have to expend further compute to explicitly find them "again". In general, storage is orders of magnitude cheaper than compute, so having the first result (the failed rows) stored would be preferable to expending compute to get them again. Or at least have the option for them to be stored.

The next thing is alerts.

For FAIL, presumably it is fairly easy to set up an alert, as an explicit error occurs. With DROP it is less clear how to achieve this (likely Activator triggers off the DQ report's underlying semantic model). In both cases, the repetition of rules/compute above would seem to still be needed for the troubleshooting, and maybe even to generate the alert.

> According to me ,this is more than enough to backtrack on the bad record
> and make sufficient corrections on them from business if needed.


I have not had much time yet to dig into what is in the DQ report, but unless it actually contains the details or IDs of all the failed rows (i.e. they _are_ stored), the duplication mentioned above will still be required.
I have not taken a close look at the underlying semantic model or assessed whether there is anything that can be used to automate alerting / troubleshooting / reprocessing. I suspect it will only contain aggregates at per-refresh granularity. This would be sufficient for alerting if it can be hooked up to Activator, but would mean actioning would still require the duplication mentioned above. Conversely, if there was the option to quarantine failed rows to storage, this would not only address the duplication issue but could also likely be used to trigger automation.

Now, by saying this I am not saying that MLVs are useless or lack value. It is more that it is unclear or confusing when/where it is reasonable to use them given realistic operational requirements. So, for example, I see some articles indicating use in scenarios where input data is already relatively clean (silver, gold), but others that indicate value earlier, with no real indication of how to address the operational matters I have mentioned.


To some extent the DQ checks almost feel like a feature that can't really be used. If you sidestep the issue by saying they are only suitable for use on relatively clean/curated data, then this seems equivalent to saying only use DQ checks in situations where they probably won't be needed. If you do use them on dirtier data, then the fact that you will probably have to create a duplicate WHERE clause or similar to "find" the failed rows begs the question of why not just use that WHERE clause (or CASE expression) once to separate good/bad rows and skip the DQ checks altogether. In other words: MLVs are useful, but potentially the DQ checks are self-defeating and should not be used?

It was very much the case that when I first read the MLV documentation, my first question was how to get the rows failed by the DQ checks, in order to understand how I could use MLVs in production. Everything else about the feature made sense to me, but not the DQ checks.

Hope some of this makes sense, and certainly happy for any clarifications or insights on how I might have the wrong perspective here.
 
cheers