Materialized Lake Views (MLVs) in Microsoft Fabric are a powerful feature designed to simplify the implementation of multi-stage data pipelines, particularly in Lakehouse architectures. They provide a persisted, continuously updated view of data, enabling declarative data transformations using SQL while automating data quality checks and refresh schedules.
Key features of Materialized Lake Views include:
Declarative definitions - Transformations are written in SQL directly against Lakehouse tables
Automated refresh - Views can be refreshed on a schedule, with a full refresh, incremental refresh, or no refresh chosen automatically
Data quality constraints - CHECK constraints with ON MISMATCH DROP remove records that fail validation
Lineage view - A visual lineage of the materialized views and their source tables
Data quality reporting - A built-in Power BI report summarizing dropped records and constraint violations
For this case study, I am using the publicly available NY Taxi dataset to process the data and build a silver-layer materialized lake view that contains the ride details along with the total passengers and the fare for each ride.
1. Create and load the NY Taxi data into the Fabric Lakehouse.
2. Click on the Lakehouse and choose the Manage materialized lake views (preview) option.
3. Open a notebook attached to the Fabric Lakehouse to start exploring Materialized Lake Views.
4. In the code below, I am creating a materialized lake view called nytaxi_clean_data with a few check constraints: RideId, PickupLocationId and DropLocationId should not be null, and DropTime should always be greater than PickupTime (to pull only the valid rides). Before Materialized Lake Views were available, this kind of data cleansing was done on intermediate tables within the schema, so this is a lot easier now.
CREATE SCHEMA IF NOT EXISTS silver_flow;

CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver_flow.nytaxi_clean_data (
    CONSTRAINT valid_rideid CHECK (RideId IS NOT NULL) ON MISMATCH DROP,
    CONSTRAINT valid_pickuplocationid CHECK (PickupLocationId IS NOT NULL) ON MISMATCH DROP,
    CONSTRAINT valid_droplocationid CHECK (DropLocationId IS NOT NULL) ON MISMATCH DROP,
    CONSTRAINT valid_droptime CHECK (DropTime > PickupTime) ON MISMATCH DROP
) AS
SELECT
    RideId,
    VendorId,
    PickupTime,
    DropTime,
    PickupLocationId,
    DropLocationId,
    r.RateCode,
    SUM(PassengerCount) AS PassengerCount,
    SUM(TotalAmount) AS Fare
FROM
    dbo.yellowtaxis1 y
    LEFT OUTER JOIN dbo.RatecodeMapping r ON y.RateCodeId = r.RateCodeId
GROUP BY
    RideId,
    VendorId,
    PickupTime,
    DropTime,
    PickupLocationId,
    DropLocationId,
    r.RateCode
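Once created, the materialized lake view can be queried like any other table in the Lakehouse. A quick sanity check (a minimal example, limiting the output for brevity):

SELECT * FROM silver_flow.nytaxi_clean_data LIMIT 10;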
If you observe the output of the CREATE statement above, it has the following information, as shown in the screenshot below:
Namespace - The Lakehouse.Schema where the materialized view is created
MLVName - The name of the Materialized View created above
MLVID - A unique Id for the Materialized View created above
RefreshPolicy - Determines how the materialized view was refreshed. The first refresh is always a FullRefresh
RefreshDate/TimeStamp - The date and time when the view was refreshed
TotalRowsProcessed - Total number of rows in the incoming dataset
TotalRowsDropped - Total number of rows that did not match the validation criteria and were skipped
TotalViolations - How many records did not match the validation criteria
ViolationsPerConstraint - The number of records dropped because of each check constraint defined in the code. Here it shows {"valid_droptime" : 7825}, which means the DropTime constraint failed for 7825 rows and those rows were dropped from the incoming dataset.
If I rerun the materialized lake view, I get the result below.
Here you can see the RefreshPolicy shows "NoRefresh", which means there is no new data in the source that would require a refresh. Also, TotalRowsProcessed, TotalRowsDropped and TotalViolations are all 0, and the Message indicates that since the materialized view already exists and the query contains an IF NOT EXISTS clause, the view was not recreated.
Now let's take a look at the lineage. Open the Lakehouse and click on Manage materialized lake views as shown in the screenshot below.
It will open up the lineage view for the materialized view as shown below
Note - The lineage view can take some time to appear. If it is not showing, close the workspace, reopen it and then check the Lakehouse again.
The Optimal refresh option is turned on by default, which enables an automatic refresh strategy for the materialized view. When it is on, the engine performs either a full refresh, an incremental refresh, or no refresh (if the data in the source has not changed).
The refresh of Materialized Lake Views can also be scheduled using the schedule option to run at a particular time of day.
Now let's add data into one of the source tables used in the materialized view and refresh it.
INSERT INTO dbo.yellowtaxis1
VALUES (9999996, 4, current_timestamp, date_add(current_timestamp, 4),
        133, 78, 'KAS087', 14343, 2, 2.5, 2, 2, 69.05, 68.5, 0, 0, 10, 0, 0);

INSERT INTO dbo.yellowtaxis1
VALUES (9999997, 1, current_timestamp, date_add(current_timestamp, 5),
        133, 78, 'KAS100', 14500, 2, 2.5, 2, 2, 129.55, 128.5, 0, 0, 10, 0, 0);
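Before triggering the refresh, you can optionally confirm the new rows landed in the source table. This check assumes RideId is the first column of dbo.yellowtaxis1, matching the values inserted above:

SELECT * FROM dbo.yellowtaxis1 WHERE RideId IN (9999996, 9999997);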
When the materialized view refresh is completed, we will see the refresh status as Completed.
The Detailed logs section shows more details such as the number of executors, driver cores, driver memory, executor memory, etc.
Also, when the materialized view is created, a table named dbo.sys_dq_metrics appears in the Lakehouse, which holds the logs of the materialized view refreshes.
SELECT * FROM dbo.sys_dq_metrics WHERE MLVName = 'nytaxi_clean_data'
The results show that a FullRefresh happened instead of an incremental refresh, as shown in the screenshot below.
As mentioned in the documentation, Refresh Materialized Lake Views in a Lakehouse - Microsoft Fabric | Microsoft Learn, for incremental refresh to take effect, the delta Change Data Feed (CDF) property delta.enableChangeDataFeed=true must be set on the sources referenced in the materialized lake view definition.
So alter the source tables to set the TBLPROPERTIES as below:
ALTER TABLE dbo.yellowtaxis1 SET TBLPROPERTIES(delta.enableChangeDataFeed = true);
ALTER TABLE dbo.RatecodeMapping SET TBLPROPERTIES(delta.enableChangeDataFeed = true);
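To verify that the change data feed property has been applied, you can list the table properties with standard Spark SQL (the exact output layout may vary):

SHOW TBLPROPERTIES dbo.yellowtaxis1;
SHOW TBLPROPERTIES dbo.RatecodeMapping;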
Once the table properties have been altered and more records are inserted into the base table, the materialized view logs still show that the refresh always goes for a FullRefresh. Since the data volume in my scenario is small, I believe the engine chose a full refresh over an incremental refresh. My assumption is that for a larger dataset, say 10 million rows, it would be more efficient for the materialized view to do an incremental refresh than a full refresh.
There is also a Data Quality Report, a Power BI report that gives a neat diagrammatic view of the dropped records and the overall data quality, and it is customizable as well. Here it shows that 0.1% of the records were bad records dropped from the view and 99.9% were good records loaded successfully. We can also see how those bad records are classified based on our check constraints.
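The same percentages can be approximated directly from the metrics table using the TotalRowsProcessed and TotalRowsDropped columns described earlier (a rough sketch against dbo.sys_dq_metrics; treat the column names as listed above):

SELECT
    MLVName,
    TotalRowsProcessed,
    TotalRowsDropped,
    ROUND(100.0 * TotalRowsDropped / TotalRowsProcessed, 2) AS DroppedPercent
FROM dbo.sys_dq_metrics
WHERE MLVName = 'nytaxi_clean_data'
  AND TotalRowsProcessed > 0;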
When to use Materialized Lake Views:
Choose a materialized lake view over a regular view when the data volume is large and incremental refresh can provide a benefit, as sketched below. While a view queries the underlying tables and computes the results on the fly, a materialized view stores the computed results and only incrementally processes newly arriving data.
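As a rough illustration of the difference, here is a side-by-side sketch; the object names v_nytaxi_fares and mlv_nytaxi_fares are hypothetical and the aggregation is simplified from the earlier definition:

-- Regular view: recomputed against dbo.yellowtaxis1 on every query
CREATE VIEW IF NOT EXISTS silver_flow.v_nytaxi_fares AS
SELECT RideId, SUM(TotalAmount) AS Fare
FROM dbo.yellowtaxis1
GROUP BY RideId;

-- Materialized lake view: results are persisted and refreshed (fully or incrementally)
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver_flow.mlv_nytaxi_fares AS
SELECT RideId, SUM(TotalAmount) AS Fare
FROM dbo.yellowtaxis1
GROUP BY RideId;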
Limitations to be aware of: