Materialized Lake Views (MLVs) in Microsoft Fabric are a powerful feature designed to simplify the implementation of multi-stage data pipelines, particularly in Lakehouse architectures. They provide a persisted, continuously updated view of data, enabling declarative data transformations using SQL while automating data quality checks and refresh schedules.
Key features of Materialized Lake Views include:
Declarative definitions - Transformations are written in SQL directly against Lakehouse tables
Automated refresh - Views can be refreshed on a schedule, with a full refresh, incremental refresh, or no refresh chosen automatically
Data quality constraints - CHECK constraints with ON MISMATCH DROP remove records that fail validation
Lineage view - A visual lineage of the materialized views and their source tables
Data quality reporting - A built-in Power BI report summarizing dropped records and constraint violations
For this case study, I am using the publicly available NY Taxi dataset to process the data and build a silver-layer materialized lake view that contains the ride details along with the total passengers and the fare for each ride.
1. Create and load the NY Taxi data into the Fabric Lakehouse.
2. Click on the Lakehouse and choose the Manage materialized lake views (preview) option.
3. Open a notebook attached to the Fabric Lakehouse to start exploring Materialized Lake Views.
4. In the code below, I am creating a materialized lake view called nytaxi_clean_data with a few check constraints: RideId, PickupLocationId and DropLocationId should not be null, and DropTime should always be greater than PickupTime (to pull only the valid rides). Before Materialized Lake Views were available, this kind of data cleansing was done on intermediate tables within the schema, so this is a lot easier now.
CREATE SCHEMA IF NOT EXISTS silver_flow;

CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver_flow.nytaxi_clean_data (
    CONSTRAINT valid_rideid CHECK (RideId IS NOT NULL) ON MISMATCH DROP,
    CONSTRAINT valid_pickuplocationid CHECK (PickupLocationId IS NOT NULL) ON MISMATCH DROP,
    CONSTRAINT valid_droplocationid CHECK (DropLocationId IS NOT NULL) ON MISMATCH DROP,
    CONSTRAINT valid_droptime CHECK (DropTime > PickupTime) ON MISMATCH DROP
) AS
SELECT
    RideId,
    VendorId,
    PickupTime,
    DropTime,
    PickupLocationId,
    DropLocationId,
    r.RateCode,
    SUM(PassengerCount) AS PassengerCount,
    SUM(TotalAmount) AS Fare
FROM
    dbo.yellowtaxis1 y
    LEFT OUTER JOIN dbo.RatecodeMapping r ON y.RateCodeId = r.RateCodeId
GROUP BY
    RideId,
    VendorId,
    PickupTime,
    DropTime,
    PickupLocationId,
    DropLocationId,
    r.RateCode
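Once created, the materialized lake view can be queried like any other table in the Lakehouse. A quick sanity check (a minimal example, limiting the output for brevity):

SELECT * FROM silver_flow.nytaxi_clean_data LIMIT 10;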
If you observe the output of the CREATE statement above, it has the following information, as shown in the screenshot below:
Namespace - The Lakehouse.Schema where the materialized view is created
MLVName - The name of the Materialized View created above
MLVID - A unique Id for the Materialized View created above
RefreshPolicy - Determines how the materialized view was refreshed. The first refresh is always a FullRefresh
RefreshDate/TimeStamp - The date and time when the view was refreshed
TotalRowsProcessed - Total number of rows in the incoming dataset
TotalRowsDropped - Total number of rows that did not match the validation criteria and were skipped
TotalViolations - How many records did not match the validation criteria
ViolationsPerConstraint - The number of records dropped because of each check constraint defined in the code. Here it shows {"valid_droptime" : 7825}, which means the DropTime constraint failed for 7825 rows and those rows were dropped from the incoming dataset.
If I rerun the materialized lake view, I get the result below.
Here you can see the RefreshPolicy shows "NoRefresh", which means there is no new data in the source that would require a refresh. Also, TotalRowsProcessed, TotalRowsDropped and TotalViolations are all 0, and the Message indicates that since the materialized view already exists and the query contains an IF NOT EXISTS clause, the view was not recreated.
Now let's take a look at the lineage. Open the Lakehouse and click on Manage materialized lake views as shown in the screenshot below.
It will open up the lineage view for the materialized view as shown below
Note - The lineage view can take some time to appear. If it is not showing, close the workspace, reopen it and then check the Lakehouse again.
The Optimal refresh option is turned on by default, which enables an automatic refresh strategy for the materialized view. When it is on, the engine performs either a full refresh, an incremental refresh, or no refresh (if the data in the source has not changed).
The refresh of Materialized Lake Views can also be scheduled using the schedule option to run at a particular time of day.
Now let's add data into one of the source tables used in the materialized view and refresh it.
INSERT INTO dbo.yellowtaxis1
VALUES (9999996, 4, current_timestamp, date_add(current_timestamp, 4),
        133, 78, 'KAS087', 14343, 2, 2.5, 2, 2, 69.05, 68.5, 0, 0, 10, 0, 0);

INSERT INTO dbo.yellowtaxis1
VALUES (9999997, 1, current_timestamp, date_add(current_timestamp, 5),
        133, 78, 'KAS100', 14500, 2, 2.5, 2, 2, 129.55, 128.5, 0, 0, 10, 0, 0);
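Before triggering the refresh, you can optionally confirm the new rows landed in the source table. This check assumes RideId is the first column of dbo.yellowtaxis1, matching the values inserted above:

SELECT * FROM dbo.yellowtaxis1 WHERE RideId IN (9999996, 9999997);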
When the materialized view refresh is completed, we will see the refresh status as Completed.
The Detailed logs section shows more details such as the number of executors, driver cores, driver memory, executor memory, etc.
Also, when the materialized view is created, a table named dbo.sys_dq_metrics appears in the Lakehouse, which holds the logs of the materialized view refreshes.
SELECT * FROM dbo.sys_dq_metrics WHERE MLVName = 'nytaxi_clean_data'
The results show that a FullRefresh happened instead of an incremental refresh, as shown in the screenshot below.
As mentioned in the documentation, Refresh Materialized Lake Views in a Lakehouse - Microsoft Fabric | Microsoft Learn, for incremental refresh to take effect, the delta Change Data Feed (CDF) property delta.enableChangeDataFeed=true must be set on the sources referenced in the materialized lake view definition.
So alter the source tables to set the TBLPROPERTIES as below:
ALTER TABLE dbo.yellowtaxis1 SET TBLPROPERTIES(delta.enableChangeDataFeed = true);
ALTER TABLE dbo.RatecodeMapping SET TBLPROPERTIES(delta.enableChangeDataFeed = true);
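To verify that the change data feed property has been applied, you can list the table properties with standard Spark SQL (the exact output layout may vary):

SHOW TBLPROPERTIES dbo.yellowtaxis1;
SHOW TBLPROPERTIES dbo.RatecodeMapping;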
Once the table properties have been altered and more records are inserted into the base table, the materialized view logs still show that the refresh always goes for a FullRefresh. Since the data volume in my scenario is small, I believe the engine chose a full refresh over an incremental refresh. My assumption is that for a larger dataset, say 10 million rows, it would be more efficient for the materialized view to do an incremental refresh than a full refresh.
There is also a Data Quality Report, a Power BI report that gives a neat diagrammatic view of the dropped records and the overall data quality, and it is customizable as well. Here it shows that 0.1% of the records were bad records dropped from the view and 99.9% were good records loaded successfully. We can also see how those bad records are classified based on our check constraints.
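The same percentages can be approximated directly from the metrics table using the TotalRowsProcessed and TotalRowsDropped columns described earlier (a rough sketch against dbo.sys_dq_metrics; treat the column names as listed above):

SELECT
    MLVName,
    TotalRowsProcessed,
    TotalRowsDropped,
    ROUND(100.0 * TotalRowsDropped / TotalRowsProcessed, 2) AS DroppedPercent
FROM dbo.sys_dq_metrics
WHERE MLVName = 'nytaxi_clean_data'
  AND TotalRowsProcessed > 0;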
When to use Materialized Lake Views:
Choose a materialized lake view over a regular view when the data volume is large and incremental refresh can provide a benefit, as sketched below. While a view queries the underlying tables and computes the results on the fly, a materialized view stores the computed results and only incrementally processes newly arriving data.
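As a rough illustration of the difference, here is a side-by-side sketch; the object names v_nytaxi_fares and mlv_nytaxi_fares are hypothetical and the aggregation is simplified from the earlier definition:

-- Regular view: recomputed against dbo.yellowtaxis1 on every query
CREATE VIEW IF NOT EXISTS silver_flow.v_nytaxi_fares AS
SELECT RideId, SUM(TotalAmount) AS Fare
FROM dbo.yellowtaxis1
GROUP BY RideId;

-- Materialized lake view: results are persisted and refreshed (fully or incrementally)
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver_flow.mlv_nytaxi_fares AS
SELECT RideId, SUM(TotalAmount) AS Fare
FROM dbo.yellowtaxis1
GROUP BY RideId;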
Limitations to be aware of: