bsankaran

Materialized Lake Views in Microsoft Fabric (Generally Available)

If you haven’t already, check out Arun Ulag’s hero blog “FabCon and SQLCon 2026: Unifying databases and Fabric on a single, complete platform” for a complete look at all of our FabCon and SQLCon announcements across both Fabric and our database offerings. 


Since we introduced MLVs in preview at Build 2025, data engineers have used them to replace hand-built ETL pipelines with a few declarative Spark SQL statements, and their feedback directly shaped this release.

This update closes the most important gaps identified during preview and makes MLVs production-ready at scale. With multi-schedule support, broader incremental refresh, PySpark authoring, in-place updates, and stronger data quality controls, teams can now build, run, and evolve medallion pipelines with far less operational overhead.

What are Materialized Lake Views?

A materialized lake view in Fabric is a persisted, automatically refreshed view defined in Spark SQL or PySpark. It lets you express multi-stage Lakehouse transformations, typically the bronze-to-silver-to-gold pattern known as medallion architecture, as declarative statements rather than custom Spark jobs. Fabric tracks dependencies between MLVs, orchestrates refreshes in the correct order, and enforces data quality constraints at every stage. The result is a complete medallion pipeline you can set up in minutes and monitor from a single pane.
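To make "declarative statements" concrete, here is a minimal sketch of what a bronze-to-silver definition might look like when submitted from a notebook. The statement shape (`CREATE MATERIALIZED LAKE VIEW ... CONSTRAINT ... CHECK ... ON MISMATCH DROP ... AS SELECT`) follows our understanding of the Fabric Spark SQL syntax and should be checked against the product documentation. The schema, table, column, and constraint names are hypothetical.

```python
def silver_orders_mlv_sql(schema="silver", source="bronze.orders"):
    """Build a Spark SQL statement defining a hypothetical silver-layer MLV.

    All object names here are illustrative; only the general statement
    shape is meant to mirror the MLV syntax.
    """
    return f"""
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS {schema}.orders_cleaned
(
    CONSTRAINT positive_quantity CHECK (quantity > 0) ON MISMATCH DROP
)
AS
SELECT order_id, customer_id, quantity, unit_price
FROM {source}
WHERE order_id IS NOT NULL
""".strip()


statement = silver_orders_mlv_sql()
print(statement)

# In a Fabric notebook, where a `spark` session is predefined, the statement
# would be submitted with: spark.sql(statement)
```

The point of the sketch is that the view is described once, as a query plus constraints, and refresh ordering is left to Fabric rather than to hand-written job code.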

Figure: Materialized lake views make it easier to implement medallion architecture on Fabric and make your pipelines production ready.

Broader clause coverage for optimal refresh

Processing only what’s changed instead of recomputing an entire view has been a core promise of MLVs since preview. With this release, optimal refresh covers far more of the queries data engineers write every day.

MLVs can now refresh incrementally when the definition includes:

  • Aggregations such as COUNT and SUM with GROUP BY.
  • Left outer joins, left semi joins.
  • Common table expressions.
These additions mean that most real-world medallion pipelines qualify for incremental processing without any rewriting.
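To see why aggregations such as COUNT and SUM with GROUP BY lend themselves to incremental processing, consider the following sketch. It is a conceptual illustration in plain Python, not Fabric's implementation: the function name, the row format, and the `+1`/`-1` change sign are assumptions made for the example.

```python
def apply_delta(aggregates, changed_rows):
    """Fold changed rows into existing per-group COUNT and SUM totals.

    aggregates:   dict mapping group key -> {"count": int, "sum": number}
    changed_rows: iterable of (group_key, amount, sign), where sign is
                  +1 for an inserted row and -1 for a deleted row
                  (a hypothetical change format for this sketch).
    Returns a new dict; the input is left untouched.
    """
    updated = {key: dict(values) for key, values in aggregates.items()}
    for key, amount, sign in changed_rows:
        group = updated.setdefault(key, {"count": 0, "sum": 0})
        group["count"] += sign
        group["sum"] += sign * amount
    # A group with no remaining rows disappears from a GROUP BY result.
    return {key: values for key, values in updated.items() if values["count"] > 0}


existing = {"EU": {"count": 2, "sum": 30}, "US": {"count": 1, "sum": 5}}
delta = [("EU", 10, +1), ("US", 5, -1), ("APAC", 7, +1)]
refreshed = apply_delta(existing, delta)
print(refreshed)
```

Only the three changed rows are touched, no matter how many rows already sit behind the existing totals. That is the property that keeps refresh cost proportional to the change rather than to the table.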

You don’t need to decide when incremental refresh applies. With optimal refresh, a built-in decision engine examines each refresh, evaluates the volume of changed data against the cost of a full recomputation, and automatically chooses the faster path. Change Data Feed is enabled by default on every new MLV, so there is nothing to configure.
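This post does not describe how the decision engine weighs its options, so the following is only an illustrative sketch of the general idea: estimate the cost of each path and take the cheaper one. The function, its parameters, and the per-row cost weights are all hypothetical and are not taken from Fabric.

```python
def choose_refresh_strategy(changed_rows, total_rows,
                            incremental_cost_per_row=3.0,
                            full_cost_per_row=1.0):
    """Pick 'incremental' or 'full' by comparing rough cost estimates.

    The cost weights are made-up placeholders: processing a changed row
    incrementally is assumed to cost more than scanning one row in a
    full recomputation, so incremental only wins for small changes.
    """
    if total_rows == 0:
        return "full"  # nothing to maintain yet, so just compute the view
    incremental_cost = changed_rows * incremental_cost_per_row
    full_cost = total_rows * full_cost_per_row
    return "incremental" if incremental_cost < full_cost else "full"


small_change = choose_refresh_strategy(changed_rows=1_000, total_rows=1_000_000)
large_change = choose_refresh_strategy(changed_rows=900_000, total_rows=1_000_000)
print(small_change, large_change)
```

With these placeholder weights, a change touching 0.1% of the data goes down the incremental path, while a change touching 90% falls back to a full recomputation.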

The result is straightforward: as your data grows, refresh times stay predictable and compute costs stay low.

PySpark authoring support for MLVs (Preview)

Data engineers can now create, refresh, and replace MLVs directly from Fabric notebooks using PySpark and the familiar DataFrameWriter API.

Data engineers reach for PySpark when their work goes beyond what Spark SQL can express cleanly: applying custom cleansing logic with Python libraries, calling user-defined functions that wrap business rules or ML models, or scheduling multi-step transformations that mix procedural code with DataFrame operations. These patterns are common in production Lakehouse pipelines, and PySpark authoring now brings them into the MLV framework.

Figure: New PySpark (Python) notebook option for creating materialized lake views.

PySpark MLVs support:

  • Data quality constraints (including expression-based rules and session-scoped UDFs).
  • Table properties.
  • Scheduled refreshes from the same notebook where data is prepared and explored.
The entire pipeline, from raw ingestion through a production-quality gold layer, can now live in one place. Today, PySpark MLVs perform a full refresh on each run. Optimal refresh support for PySpark-authored MLVs is coming soon.

For a step-by-step walkthrough, see the PySpark MLV documentation.

Multi-Schedule Support

Previously, all MLVs in a lakehouse could only be refreshed together on a single schedule. Teams with multiple data products at different cadences often worked around this with notebooks. This approach is error-prone and bypasses dependency management, centralized error reporting, and retry logic. For example, notebook-triggered refreshes do not surface MLV error details; failures appear only in the cell output, and dependent views have no awareness of them. Errors can persist week after week without anyone knowing the pipeline is broken.

Multi-schedule support removes that complexity. You can now define named schedules within a lakehouse, each targeting a specific subset of views. For example:

  • A finance pipeline can refresh the gold layer hourly.
  • A lower-priority analytics pipeline can run every six hours, with no custom scripting required for either.
When a named schedule runs, Fabric refreshes all upstream dependencies in the correct order, executes independent views in parallel, and surfaces errors centrally so issues don’t go undetected. If a run is already in progress when a schedule fires, the new run is skipped, and the next window proceeds as expected.
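The ordering behavior described above can be pictured as a topological sort over the view dependency graph. The sketch below uses Python's standard `graphlib` module to plan a run for one schedule. It is a conceptual model rather than Fabric's scheduler, and the view names and the `plan_refresh` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def plan_refresh(dependencies, targets):
    """Return batches of views to refresh for a schedule's target views.

    dependencies: dict mapping a view to the set of upstream views it reads.
    targets:      the views a named schedule points at.
    Each batch contains views with no unmet dependencies, so the views
    within a batch could run in parallel.
    """
    # Collect the targets plus everything upstream of them.
    needed, stack = set(), list(targets)
    while stack:
        view = stack.pop()
        if view not in needed:
            needed.add(view)
            stack.extend(dependencies.get(view, ()))

    sorter = TopologicalSorter({v: dependencies.get(v, set()) for v in needed})
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


views = {
    "gold_finance": {"silver_orders", "silver_customers"},
    "gold_marketing": {"silver_customers"},
    "silver_orders": set(),
    "silver_customers": set(),
}
finance_plan = plan_refresh(views, ["gold_finance"])
print(finance_plan)
```

In this example the finance schedule refreshes both silver views first, as one parallel batch, and then `gold_finance`. The `gold_marketing` view is left alone because no target depends on it.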

Figure: Multiple independent schedules can now be configured for materialized lake view runs within a single lakehouse.

Tip: Notebook-triggered refreshes don’t provide full dependency awareness or centralized visibility. Use Managed MLVs in Lakehouse for built-in dependency management, retries, and monitoring.

In-place view updates with Replace

Business logic changes. A filter condition shifts, a join gains a new column, an aggregation adds a metric. Previously, updating an MLV meant dropping and recreating it from scratch, losing refresh history and forcing downstream consumers to reconnect.

With Replace, you can update an MLV's definition in place. Fabric validates the new logic, swaps it in, and preserves the view's identity, metadata, and lineage. Downstream dependencies remain intact. Replace works for both SQL and PySpark-authored MLVs.

Stronger data quality rules

This update significantly expands data quality enforcement. In preview, constraints could check whether a column was null or matched a fixed value. For PySpark-authored MLVs, constraints can now:
  • Use expression-based logic combining multiple columns.
  • Apply arithmetic and built-in functions in a single rule.
  • Invoke session-scoped user-defined functions for validation logic that lives in Python rather than SQL.
Fabric tracks every constraint across every refresh and surfaces the results in a data quality report. You can quickly spot which rules fail most often, which views they affect, and how trends shift over time without building a separate monitoring pipeline.
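As a rough mental model of expression-based rules and per-rule reporting, the sketch below evaluates constraints over plain Python dictionaries and counts violations per rule. It does not use the MLV API. The `enforce` helper, the rule names, and the `"drop"`/`"fail"` actions are assumptions made for illustration.

```python
def enforce(rows, constraints):
    """Apply row-level constraints and count violations per rule.

    rows:        list of dicts, one per record.
    constraints: list of (name, predicate, on_mismatch) tuples, where
                 on_mismatch is "drop" (discard the row) or "fail"
                 (abort the whole run). Both actions are assumptions
                 for this sketch.
    Returns (kept_rows, violations_by_rule).
    """
    violations = {name: 0 for name, _, _ in constraints}
    kept = []
    for row in rows:
        row_ok = True
        for name, predicate, on_mismatch in constraints:
            if not predicate(row):
                violations[name] += 1
                if on_mismatch == "fail":
                    raise ValueError(f"constraint {name!r} violated by {row}")
                row_ok = False
        if row_ok:
            kept.append(row)
    return kept, violations


orders = [
    {"qty": 2, "unit_price": 5, "total": 10},
    {"qty": 1, "unit_price": 4, "total": 9},   # total does not match
    {"qty": 0, "unit_price": 3, "total": 0},   # non-positive quantity
]
rules = [
    # Expression combining several columns with arithmetic.
    ("total_matches", lambda r: r["qty"] * r["unit_price"] == r["total"], "drop"),
    ("positive_qty", lambda r: r["qty"] > 0, "drop"),
]
clean_rows, report = enforce(orders, rules)
print(clean_rows, report)
```

The per-rule counts play the role of the data quality report: they show which rules fail and how often, without any separate monitoring code.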

What’s ahead

This is a milestone, not the finish line. We are actively working on optimal refresh for PySpark-authored MLVs, expanded optimal refresh coverage for more SQL operators, and deeper integration with other Fabric workloads. The roadmap is shaped by your feedback. Share your ideas on the Fabric Ideas portal and help influence what comes next.

Get started

Materialized Lake Views are available today in every Microsoft Fabric workspace. To start building: