
pallavi_r

Designing for Direct Lake: Architecture, Storage Strategy, and Performance in Microsoft Fabric

1. Direct Lake – Background and Significance

Microsoft Fabric brings storage, compute, and semantic modeling into one analytics platform, and Direct Lake is a semantic model storage mode that lets Power BI query large-scale data efficiently by reading Delta tables directly from OneLake.

  

1.1 What Was the Problem Before, and How Does Direct Lake Fix It?

 

Previously, Import mode relied on data movement and in-memory refreshes, often resulting in memory pressure and scalability limitations.

DirectQuery avoided this data duplication but introduced query latency and strong dependency on the source system’s performance.

Direct Lake overcomes these drawbacks by letting the semantic model read Delta tables in OneLake directly, skipping imports and external query translation, while delivering high performance with minimal data movement.

 

1.2 Connectivity mode comparison

 

[Image: connectivity mode comparison across Import, DirectQuery, and Direct Lake]

 

1.3 How Direct Lake Stands Out Architecturally

 

[Image: Direct Lake architecture overview]

 

1.4 Why It Matters for Architects and Engineers?

 

This is important for BI Architects and Data Engineers to understand while building enterprise-scale analytics solutions. With Direct Lake, performance and scalability have now moved upstream to Lakehouse design, capacity planning, and storage optimization, making data-layer architectural decisions much more impactful on overall BI performance.

 

2. Designing for Direct Lake: Storage, Capacity, and Upstream Preparation

 

A successful Direct Lake setup depends less on refresh settings and more on how Delta tables are designed, how capacity is sized, and how data is prepared upstream. Since Direct Lake reads data directly from OneLake, storage design now directly impacts query performance. 

 

2.1 Storage Optimization

 

Direct Lake performance is directly proportional to the quality of the Delta tables. Optimizing the storage layer is therefore one of the most impactful actions a data team can take.

 

  • V-Order Optimization: Enabled by default, it reorganizes data at write time for better compression and faster column scans.
  • File Consolidation: Too many small Parquet files slow queries; running OPTIMIZE to merge them into larger files improves throughput and reduces cold starts.
  • Partitioning: Splitting data by Year or Month lets the engine read only needed slices, but over-partitioning can hurt performance.
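The OPTIMIZE and VACUUM maintenance mentioned above can be scripted as part of a pipeline. The sketch below builds the Spark SQL statements as plain strings; the table name `sales` and the 7-day retention are illustrative, and the `spark.sql(...)` execution step (available in a Fabric notebook) is shown only as a comment.

```python
# Sketch: Delta table maintenance for Direct Lake. Table name and
# retention window are illustrative assumptions, not fixed values.

def maintenance_statements(table: str, retain_hours: int = 168) -> list:
    """Build the Spark SQL statements that consolidate small Parquet
    files (applying V-Order) and remove unreferenced files."""
    return [
        f"OPTIMIZE {table} VORDER",                     # merge small files, apply V-Order
        f"VACUUM {table} RETAIN {retain_hours} HOURS",  # drop stale, unreferenced files
    ]

stmts = maintenance_statements("sales")
for s in stmts:
    print(s)
    # In a Fabric notebook you would execute each statement:
    # spark.sql(s)
```

Running this after large ETL loads keeps file counts low, which directly reduces cold-start and scan time for Direct Lake queries.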

 

2.2 Capacity Guardrails and SKU Sizing

 

  • Fabric capacity directly affects Direct Lake query performance. Choosing the right SKU is both a cost and performance decision.
  • Available memory controls how much data stays cached in VertiPaq; low memory causes disk paging, increasing latency and slowing reports.

 

2.3 Upstream Data Preparation

 

  • In Direct Lake, the Delta table is the foundation for the semantic model, so transformations should happen earlier in the pipeline, not at query time.
  • Eliminate calculated columns in Power BI: create derived fields and data cleansing upstream using Spark, SQL, or Dataflows for better performance.
  • Prefer physical or materialized tables: logical SQL views can trigger DirectQuery fallback, while persisted tables maintain Direct Lake speed and stability.
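To make the "transform upstream" idea concrete, here is a minimal sketch of deriving a field during ETL instead of as a Power BI calculated column. The row shape and the `margin_pct` formula are illustrative assumptions; the Delta write itself is indicated only as a comment.

```python
# Sketch: compute a derived column once at write time, so the semantic
# model reads a plain physical column instead of evaluating a
# calculated column per query. Column names are illustrative.

def add_margin(rows):
    """Derive margin_pct upstream, before the rows land in the Delta table."""
    out = []
    for r in rows:
        r = dict(r)  # avoid mutating the caller's rows
        r["margin_pct"] = round((r["revenue"] - r["cost"]) / r["revenue"] * 100, 2)
        out.append(r)
    return out

raw = [{"order_id": 1, "revenue": 200.0, "cost": 150.0}]
prepared = add_margin(raw)
# In a real pipeline the prepared rows would then be persisted, e.g. with
# Spark: df.write.format("delta").saveAsTable("fact_sales")
```

The same derivation could equally be done in Spark SQL or a Dataflow; the point is that it happens before the semantic model ever sees the data.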

 

3. Understanding Direct Lake Data Refresh

 

[Image: Direct Lake data refresh flow]

 

3.1 Framing: How Direct Lake Keeps Models in Sync

 

Framing is how Direct Lake updates reports without reloading all the data. In Direct Lake models, snapshot isolation and incremental framing keep data consistent and up to date without needing full dataset refreshes.

Snapshot isolation ensures users always see a stable, point-in-time view, even as the Lakehouse continues to change.  

  • Version Binding: Framing pins the model to a specific Delta log snapshot.
  • Dirty Data Prevention: New or updated files remain hidden until the next framing, avoiding partial data.
  • ETL Continuity: Ongoing ETL jobs don’t impact queries, ensuring reports stay accurate.

Incremental framing optimizes performance by updating only changed data in memory. 

  • Selective Memory Updates: Only changed columns are reloaded; unchanged data stays in memory.
  • Performance Benefit: Partial refresh avoids full dataset reload.
  • Data Pattern Dependency: Append-only writes are ideal; an overwrite forces a full reload.

During framing, Fabric checks the Delta log for changes, identifies which Parquet files are the latest, and updates only the metadata pointers in the semantic model.
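The version-binding and incremental behavior described above can be illustrated with a small simulation. This is a conceptual sketch, not Fabric's internal implementation: the Delta log is modeled as a list of file sets, and "framing" moves a version pointer and loads only the files added since the previously pinned snapshot.

```python
# Conceptual simulation of framing (NOT Fabric internals): a model is
# pinned to one Delta log version, new commits stay invisible until the
# next framing, and only changed files are (re)loaded.

class DeltaLog:
    def __init__(self):
        self.versions = []           # each version: set of active Parquet files

    def commit(self, files):
        self.versions.append(set(files))

class SemanticModel:
    def __init__(self, log):
        self.log = log
        self.pinned_version = None   # version binding: queries see only this snapshot
        self.loaded = set()          # files currently held in memory

    def frame(self):
        """Reframe: advance the pointer to the latest version and load
        only the files that are new relative to the pinned snapshot."""
        latest = len(self.log.versions) - 1
        new_files = self.log.versions[latest]
        old_files = (self.log.versions[self.pinned_version]
                     if self.pinned_version is not None else set())
        to_load = new_files - old_files            # incremental: changed files only
        self.loaded = (self.loaded & new_files) | to_load
        self.pinned_version = latest
        return to_load

log = DeltaLog()
log.commit({"part-000.parquet"})
model = SemanticModel(log)
model.frame()                                      # initial framing loads everything

log.commit({"part-000.parquet", "part-001.parquet"})  # append-only ETL commit
# Until the next framing, queries still serve the pinned snapshot:
assert model.pinned_version == 0
delta = model.frame()                              # only the new file is loaded
```

Note how an overwrite commit (a version that replaces every file) would make `to_load` equal to the whole table, which mirrors why overwrite patterns force a full reload.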

 

3.2 When and How Is Framing Triggered?

 

Framing realigns the semantic model with the latest state of the Delta tables in the Lakehouse. This can occur in multiple ways:

Automatic Trigger (Default Behavior)

  •  Fabric monitors the Delta table and updates only changed column segments, keeping unchanged data in memory.
  •  Ideal for append-only, infrequent updates.
  •  Minimal admin effort; keeps reports near real-time.

[Image: automatic framing trigger flow]

 

Manual Trigger

  • Initiated by selecting Refresh in the service.
  • Useful for on-demand, urgent data loads.

Programmatic / Pipeline Trigger

  • Runs via Fabric Data Pipelines, notebooks, or APIs after ETL processes.
  • Controlled enterprise-scale workflow.
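A common way to trigger framing from a pipeline is the Power BI REST API's dataset-refresh endpoint. The sketch below only builds and (optionally) sends the request; the workspace and dataset GUIDs are placeholders, and acquiring the Azure AD bearer token is out of scope here.

```python
# Sketch: programmatic reframe of a Direct Lake model after ETL, via the
# Power BI REST API refreshes endpoint. IDs and token are placeholders.
import json
from urllib import request

def build_refresh_request(workspace_id, dataset_id, token):
    """Assemble the URL and headers for the dataset-refresh call."""
    url = (
        "https://api.powerbi.com/v1.0/myorg/"
        f"groups/{workspace_id}/datasets/{dataset_id}/refreshes"
    )
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return url, headers

def trigger_reframe(workspace_id, dataset_id, token):
    """POST the refresh request; for a Direct Lake model this re-frames
    the model against the latest Delta table state."""
    url, headers = build_refresh_request(workspace_id, dataset_id, token)
    req = request.Request(url, data=json.dumps({}).encode(),
                          headers=headers, method="POST")
    with request.urlopen(req) as resp:   # network call: run in the pipeline step
        return resp.status               # the API typically returns 202 Accepted

url, _ = build_refresh_request("<workspace-guid>", "<dataset-guid>", "<token>")
```

Calling `trigger_reframe` as the final activity of a Fabric Data Pipeline (after all loads complete) gives the controlled, enterprise-scale workflow described above.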

3.3 Automatic vs Manual Framing: Choosing the Right Approach

 

[Image: automatic vs manual framing comparison]

 

4. Fallback to DirectQuery – What Architects Must Know

 

Direct Lake is designed to deliver near in-memory performance without copying data into Import mode. However, under certain conditions the engine can quietly shift into a DirectQuery-style execution path. Reports still work, but performance can drop noticeably, and users usually do not receive an explicit warning.

 

4.1 When Does Fallback Happen?


Fallback typically occurs when the engine cannot efficiently answer a query using cached Delta data, for reasons such as:

  • Complex or unsupported DAX queries
  • Large joins or very high-cardinality relationships
  • Memory or capacity limits on the Fabric SKU
  • Table or model designs that trigger full table scans instead of selective reads

In these scenarios, instead of serving results from fast in-memory column segments, Power BI issues live queries against the storage or SQL endpoint, which increases latency and makes performance dependent on backend compute.

 

4.2 How to Monitor and Identify Direct Lake Usage

 

As fallback is not clearly shown in the UI, it is usually recognized indirectly through signs such as:

  •  Sudden spikes in visual load time
  •  Inconsistent responsiveness across visuals or pages
  •  Longer query times in Performance Analyzer / diagnostics
  •  Unexpected rise in backend queries or capacity usage

Regular monitoring and performance testing are important, as fallback often indicates modeling, storage, or capacity design issues rather than a simple report problem.

 

5. Direct Lake Modes: OneLake vs SQL Endpoint

 

Direct Lake is available in two distinct modes: Direct Lake on OneLake and Direct Lake on SQL Endpoints. Both allow the VertiPaq engine to work directly with Delta tables but differ in architecture, deployment flexibility, and fallback behavior.

 

[Image: Direct Lake on OneLake vs Direct Lake on SQL endpoint comparison]

 

6. Direct Lake Implementation Best Practices

 

  • Use Hybrid Models: Import for small, stable dimensions; Direct Lake for large, frequently updated facts to balance speed and freshness.
  • Keep a Simple Star Schema: One fact with small dimensions; avoid many-to-many and bidirectional joins.
  • Transform Upstream: Perform calculations and cleansing in Lakehouse/Warehouse, not in the semantic model.
  • Maintain Unique Keys: Ensure uniqueness on the one side of each relationship.
  • Avoid Small Files: Run OPTIMIZE and VACUUM to reduce file fragmentation and latency.
  • Partition Smartly: Use low-cardinality fields like Year or Month; avoid over-partitioning.
  • Use V-Order for Large Tables: Significantly faster reads.
  • Reframe After ETL: Trigger framing only after full data loads for consistency.
  • Warm the Cache: Run a small post-refresh query to preload key columns.
  • Apply RLS in Semantic Model: SQL-level RLS can trigger DirectQuery fallback.
  • Materialize Views: Prefer physical Delta tables over logical SQL views for stability and performance.
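The "Maintain Unique Keys" practice above is easy to automate as a pre-publish check. This is a minimal sketch over plain Python rows; table and column names are illustrative, and in practice the same check could run against the Lakehouse with Spark or SQL.

```python
# Sketch: verify that the one-side key of a relationship is actually
# unique before publishing the model. Data below is illustrative.
from collections import Counter

def duplicate_keys(rows, key):
    """Return key values that appear more than once on the dimension side."""
    counts = Counter(r[key] for r in rows)
    return [k for k, n in counts.items() if n > 1]

dim_customer = [
    {"customer_id": 1, "name": "Contoso"},
    {"customer_id": 2, "name": "Fabrikam"},
    {"customer_id": 2, "name": "Fabrikam (dup)"},  # would break the one side
]
dups = duplicate_keys(dim_customer, "customer_id")
# A deployment step could fail fast on non-unique keys:
# assert not dups, f"Non-unique dimension keys: {dups}"
```

Catching duplicate keys upstream avoids relationship errors and the ambiguous joins that degrade Direct Lake query plans.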

7. Conclusion

 

Direct Lake shifts performance focus from reports to storage. Earlier, optimization was mostly DAX and visuals. Now, Lakehouse design, partitions, and file structure directly control report speed. It delivers high speed and scale only with well-organized data and right-sized capacity.

BI performance is no longer just a modeling exercise; storage, capacity, and semantic design must work as one system.