rajendraongole1

Your Data is Already There — You Just Can't Find It Yet: OneLake Discovery Unpacked

Every department I engage with encounters the same underlying issue: data is plentiful, yet accessing the right information at the right moment—with sufficient confidence—is a persistent challenge. Analysts end up recreating datasets, dashboards proliferate uncontrollably, and engineers devote more effort to maintaining duplicates than extracting meaningful insights. Ironically, much of this data resides within the same cloud environment—yet it feels as inaccessible as if it were located continents apart.

 

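This blog post is a practitioner's deep dive into the full discovery-to-connection lifecycle in OneLake. We will cover the OneLake namespace, the OneLake Catalog, shortcuts, mirroring, Direct Lake, and a hands-on walkthrough that ties them together.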

 The OneLake Namespace

One of the most practical benefits of OneLake is its consistent addressing scheme. Every piece of data in OneLake is addressable via a URI that follows this pattern:

OneLake URI Format:
https://onelake.dfs.fabric.microsoft.com/{workspace}/{item}.{itemType}/{path}

Example — Lakehouse table: 
https://onelake.dfs.fabric.microsoft.com/SalesAnalytics/SalesLH.Lakehouse/Tables/FactSales

 

This means that any tool compatible with ADLS Gen2—Azure Storage Explorer, Azure Databricks, Apache Spark, or Power BI—can connect to OneLake data using the same APIs and SDKs they already use, simply by substituting the OneLake URI. 
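As a minimal sketch of the addressing scheme above, the helper below assembles a OneLake URI from its parts. The function name `onelake_uri` and the choice to percent-encode each segment are assumptions made for illustration; only the URI pattern itself comes from this post.

```python
from urllib.parse import quote

ONELAKE_HOST = "https://onelake.dfs.fabric.microsoft.com"


def onelake_uri(workspace: str, item: str, item_type: str, path: str = "") -> str:
    """Build a URI of the form {host}/{workspace}/{item}.{itemType}/{path}."""
    # Percent-encode each segment so names containing spaces stay valid in a URL.
    segments = [quote(workspace, safe=""), quote(f"{item}.{item_type}", safe="")]
    segments += [quote(part, safe="") for part in path.split("/") if part]
    return f"{ONELAKE_HOST}/" + "/".join(segments)


# Reproduces the Lakehouse table example shown above.
print(onelake_uri("SalesAnalytics", "SalesLH", "Lakehouse", "Tables/FactSales"))
```

The resulting string is what you would hand to an ADLS Gen2-compatible tool in place of a storage account URL.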

 

What Lives in OneLake 

Fabric Item Type | What It Stores in OneLake
Lakehouse | Delta Parquet tables (managed) + unstructured files
Data Warehouse | Delta Parquet tables, queryable via T-SQL
Eventhouse / KQL Database | Event data in columnar format
Semantic Model (Direct Lake) | References Delta tables, with no separate import
Dataflow Gen2 outputs | Delta tables in a staging lakehouse
Mirrored databases | Near-real-time replicated Delta Parquet tables

 

The OneLake Catalog — Your Central Discovery Hub

The OneLake Catalog is the evolved replacement for what was previously called the OneLake Data Hub. It is the single, searchable interface for all discoverable Fabric items—the storefront for your organization's data assets.

What makes the catalog more than a simple list is how it organizes data into context. Users can scope the catalog to a specific domain and subdomain (for example, Finance > EMEA) and then filter further by item type, endorsement status, workspace, or last refresh date. The result is not a flat directory but a navigable, governed data marketplace.

 

How to Access the Catalog
The OneLake Catalog is accessible from multiple surfaces, which is deliberate—data discovery should happen in the context where work is being done, not as a separate detour.

Microsoft Fabric portal: the primary experience with full filtering and governance
Microsoft Teams: embedded catalog so analysts never leave their collaboration tool
Microsoft Excel: discover and connect to certified datasets from within a workbook
Power BI Desktop: connect to lakehouses and warehouses without leaving the modelling experience

 

OneLake Shortcuts — Connect Without Copying

Shortcuts are one of the most architecturally important features in OneLake—and one of the most underutilized. A shortcut is a pointer: a metadata reference that makes data stored elsewhere appear as if it lives natively inside your OneLake lakehouse. No data moves. No copy is created. The shortcut simply says, 'Look over there.'

 

Mirroring — Zero-ETL Replication into OneLake

Shortcuts answer the question, 'How do I use data that lives outside OneLake without copying it?' Mirroring answers a different question: 'How do I keep a near-real-time, governed copy of an operational database inside OneLake without building a pipeline?'

Mirroring is a no-ETL, continuous replication feature. It monitors a source database for changes and replicates those changes into OneLake as Delta Parquet tables, typically within seconds to minutes. Once mirrored, the data is a full first-class OneLake citizen — queryable via SQL, Spark, and Power BI Direct Lake.
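To make the idea of change replication concrete, here is a toy sketch that applies an ordered stream of change records to an in-memory replica. It is a conceptual illustration only: the record shape (`op`, `key`, `row`) and the function `apply_changes` are hypothetical, and this is not how Fabric implements mirroring internally.

```python
def apply_changes(replica: dict, changes: list) -> dict:
    """Apply ordered change records to a replica keyed by primary key."""
    for change in changes:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            replica[key] = change["row"]   # upsert the latest version of the row
        elif op == "delete":
            replica.pop(key, None)         # remove the row if it is present
        else:
            raise ValueError(f"unknown operation: {op}")
    return replica


# Hypothetical change stream captured from a source table.
replica = {}
changes = [
    {"op": "insert", "key": 1, "row": {"item": "A", "qty": 2}},
    {"op": "insert", "key": 2, "row": {"item": "B", "qty": 1}},
    {"op": "update", "key": 1, "row": {"item": "A", "qty": 5}},
    {"op": "delete", "key": 2},
]
apply_changes(replica, changes)
print(replica)  # {1: {'item': 'A', 'qty': 5}}
```

The point of the sketch is that the replica converges on the source's current state without anyone authoring a pipeline for each table.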

 

Mirroring vs. Shortcuts—When to Use Which

Consideration | Shortcuts | Mirroring
Data movement | No movement | Data replicated into OneLake
Best for | ADLS/S3/GCS file data | Relational databases
Latency | Real-time (reads source directly) | Near-real-time
Transformation | None at connection | None (raw replication)
SQL analytics | Supported for Delta tables | Always supported
Source stays live | Yes, always | Yes, source is unaffected

 

Direct Lake—Query OneLake Natively from Power BI

Once your data is in OneLake—whether natively, via shortcut, or via mirroring—the question becomes: how do Power BI reports consume it without the overhead of a scheduled import?

Direct Lake is the answer. It is a Power BI storage mode that reads Delta Parquet files from OneLake directly into the analysis engine at query time, without a scheduled refresh, without a separate imported copy, and without the latency of a DirectQuery live connection to a SQL endpoint.

 

The Three Storage Modes Compared

Mode | How It Works | Best For
Import | Full data copy loaded into in-memory column store on refresh | Static or slowly changing data with < 1GB per table
DirectQuery | Every visual issues a query to the source at render time | Very large data with low dashboard concurrency
Direct Lake | Delta files loaded on-demand from OneLake; cached in memory | Large, fast-changing data (the Fabric-native approach)

 

Direct Lake combines the performance of Import (in-memory column store) with the freshness of DirectQuery (no scheduled refresh needed). When the Delta table in OneLake is updated, the semantic model picks up the new table version, a lightweight metadata operation known as framing, and then loads the column data that queries need into memory on demand, a process called transcoding. This typically completes in seconds.
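The load-on-demand behaviour can be pictured with a toy cache: a column is read from storage only the first time a query needs it, and a new table version invalidates what was cached. This is a deliberately simplified model (it drops the whole cache on a version change), and the class `ColumnCache` and the sample data are hypothetical, not the engine's actual implementation.

```python
class ColumnCache:
    """Toy model: load columns on first use, reset when the table version changes."""

    def __init__(self, read_column):
        self._read_column = read_column  # callable(version, column_name) -> list
        self._version = None
        self._columns = {}
        self.loads = 0                   # counts reads from storage, for illustration

    def get(self, version, name):
        if version != self._version:     # a newer table version was committed
            self._version = version
            self._columns.clear()
        if name not in self._columns:    # load only the column the query needs
            self._columns[name] = self._read_column(version, name)
            self.loads += 1
        return self._columns[name]


# Hypothetical storage holding two versions of a tiny table.
storage = {
    1: {"Quantity": [1, 2], "UnitPrice": [10.0, 5.0]},
    2: {"Quantity": [1, 2, 3], "UnitPrice": [10.0, 5.0, 2.0]},
}
cache = ColumnCache(lambda version, name: storage[version][name])
cache.get(1, "Quantity")  # first use: read from storage
cache.get(1, "Quantity")  # served from memory
cache.get(2, "Quantity")  # new version: read again
print(cache.loads)  # 2
```

Note that `UnitPrice` is never loaded in this run, because no query asked for it.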

 

Creating a Direct Lake Semantic Model

As of March 2025, Direct Lake semantic models can be authored in Power BI Desktop — not just in the Fabric portal. Key steps:

  1. In Power BI Desktop, select Get Data > Microsoft OneLake
  2. Authenticate with your Entra ID credentials
  3. Browse to your workspace and select the Lakehouse or Warehouse
  4. Select the tables you want to include—these can span multiple OneLake sources using shortcuts
  5. Power BI Desktop creates a Direct Lake connection—no import, no DirectQuery polling
  6. Build relationships and DAX measures, and publish to Fabric

Example Scenario with OneLake, Explained in Detail:

Create Your Fabric Workspace

Everything in Fabric lives inside a workspace. In a real enterprise, a workspace maps to a team, a project, or a data domain. For this walkthrough we create a workspace to represent the team that owns the raw sales data.

 

1 Navigate to the Fabric portal
Go to https://app.fabric.microsoft.com/home and sign in with your Fabric credentials.

2 Open Workspaces
In the left navigation bar, select Workspaces (the grid icon). Select + New workspace.

3 Name the workspace
Give it a meaningful name such as M365Demo_Blogs. In the Advanced section, select the Fabric or Fabric trial licence mode. Select Apply.

4 Verify the workspace
When the workspace opens, it should show an empty canvas ready for your Fabric items.
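If you prefer to script this step, the sketch below builds (but does not send) an HTTP request for creating a workspace, assuming the Fabric REST API's workspaces endpoint and a bearer token you have already obtained. The helper name `create_workspace_request` and the `<access-token>` placeholder are hypothetical, and assigning a capacity or licence mode is not covered here.

```python
import json
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"


def create_workspace_request(display_name: str, token: str) -> urllib.request.Request:
    """Prepare a POST request that asks Fabric to create a workspace."""
    body = json.dumps({"displayName": display_name}).encode("utf-8")
    return urllib.request.Request(
        url=f"{FABRIC_API}/workspaces",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",   # Entra ID access token (placeholder)
            "Content-Type": "application/json",
        },
    )


request = create_workspace_request("M365Demo_Blogs", token="<access-token>")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Keeping the request construction separate from sending makes it easy to inspect what would be submitted before touching the tenant.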

Create a Lakehouse and Load the Sales Data

A Lakehouse is a Fabric item that combines the flexibility of a data lake (any file type, any structure) with the governance of a data warehouse (Delta tables, schema enforcement, SQL access). It is the most natural home for raw and processed data in Fabric.


 

1 Create the Lakehouse
In the workspace, select + New item > Lakehouse. Name it salesLH. After a moment, the lakehouse opens with empty Tables and Files folders.

2 Download the sales dataset
Open a new browser tab and navigate to https://raw.githubusercontent.com/rajendra1918/Datasets/refs/heads/main/sales.csv, then right-click anywhere on the page and select Save as to save it as sales.csv on your local machine.

3 Upload the file
In the Lakehouse explorer, highlight the Files folder. Select the ellipsis (...) menu, then Upload > Upload files. Select your sales.csv file and confirm the upload.

4 Preview the raw file
Select the Files folder to verify sales.csv uploaded. Select the file to preview its contents. You will see the raw CSV structure.


 

Load the CSV into a Delta Table
A raw CSV file in the Files folder is not yet queryable via SQL, and it does not benefit from Delta Lake features like ACID transactions, schema enforcement, or time travel. Loading it into a Delta table elevates the data into a governed, performant, queryable asset.

1 Trigger Load to Tables
In the ellipsis (...) menu for sales.csv, select Load to Tables > New table.

2 Set the table name
In the Load to table dialog, set the table name to sales. Confirm the load operation and wait for the table to be created.

3 Verify the table
In the Explorer pane, select the sales table to view its data preview and schema. If the table does not appear automatically, select Refresh in the Tables folder menu.


Understand What Was Created
When you loaded the CSV, Fabric converted it into Delta Parquet format and stored it in OneLake under the lakehouse's Tables folder, while the original sales.csv remains in the Files folder.
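Using the OneLake URI pattern from earlier in this post, the sketch below lists where the walkthrough's assets would be addressable. It assumes the workspace and lakehouse names used in this walkthrough and a lakehouse without schemas enabled; the `_delta_log` folder is the standard Delta Lake transaction log location. The helper `lakehouse_paths` is hypothetical.

```python
ONELAKE_HOST = "https://onelake.dfs.fabric.microsoft.com"


def lakehouse_paths(workspace: str, lakehouse: str, table: str, source_file: str) -> dict:
    """Return the OneLake URIs for the raw file, the Delta table, and its log."""
    root = f"{ONELAKE_HOST}/{workspace}/{lakehouse}.Lakehouse"
    return {
        "raw_file": f"{root}/Files/{source_file}",               # the uploaded CSV
        "delta_table": f"{root}/Tables/{table}",                 # Parquet data files live here
        "transaction_log": f"{root}/Tables/{table}/_delta_log",  # Delta commit history
    }


paths = lakehouse_paths("M365Demo_Blogs", "salesLH", "sales", "sales.csv")
for label, uri in paths.items():
    print(f"{label}: {uri}")
```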

Discovering Your Data Asset in the OneLake Catalog

Now that data exists in OneLake, let us explore the discovery experience: the journey a data consumer takes to find this asset. In a real organisation, the sales lakehouse would have been created by the team that owns the raw data. A business analyst on a different team now needs to find and use this data. The OneLake Catalog is where that journey begins.

To demonstrate cross-workspace shortcuts, we need a second workspace representing the analytics team.

1 Create Analytics workspace
Return to your workspace list and create a second workspace. Name it Analytics (or any name representing a consumer team).

2 Create a new Lakehouse
Inside the Analytics workspace, select + New item > Lakehouse. Name it analytics. This lakehouse represents the analytics team's working environment — separate from where the raw data lives.


 

1 Open the shortcut dialog
In the analytics Lakehouse explorer, select the ellipsis (...) menu on the Tables folder and select New shortcut.

2 Select OneLake as the source
In the New shortcut dialog, choose OneLake as the shortcut type. This means you are pointing to data inside your own Fabric tenant — not an external cloud.

3 Navigate to the source data
In the workspace list, select the source workspace (M365Demo_Blogs). Then select the salesLH lakehouse, expand the Tables folder, and select the sales table.

4 Review and create
On the confirmation screen, review the shortcut details and select Create.
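The same shortcut can, in principle, be described as data. The sketch below builds a request body in the shape I understand the Fabric REST API's Create Shortcut operation to expect (a POST to the shortcuts collection of the destination lakehouse); treat the field names as an assumption to verify against the current API reference. The helper name and the GUID placeholders are hypothetical.

```python
import json


def onelake_shortcut_body(name, shortcut_path, source_workspace_id, source_item_id, source_path):
    """Describe a OneLake-to-OneLake shortcut: where it appears and what it points to."""
    return {
        "path": shortcut_path,   # folder in the destination lakehouse, e.g. "Tables"
        "name": name,            # name the shortcut will have in that folder
        "target": {
            "oneLake": {
                "workspaceId": source_workspace_id,  # workspace that owns the data
                "itemId": source_item_id,            # the source lakehouse item
                "path": source_path,                 # location inside the source item
            }
        },
    }


body = onelake_shortcut_body(
    name="sales",
    shortcut_path="Tables",
    source_workspace_id="<source-workspace-guid>",
    source_item_id="<salesLH-item-guid>",
    source_path="Tables/sales",
)
print(json.dumps(body, indent=2))
```

Notice that the body contains only references: no rows, no files, which is exactly why creating a shortcut is near-instant regardless of table size.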

 


 

Query 1 — Testing with sales Table:
Let us write our first analytical query. This calculates total revenue and total quantity sold for each item, ordered by revenue descending — a common starting point for sales performance analysis.

1 Open a new query
In the analytics lakehouse, switch to the SQL analytics endpoint, then select New SQL query in the toolbar to open the query editor.

2 Paste and run the query
Enter the following T-SQL and select Run:

 

-- Replace [SalesAnalytics] with the name of your own lakehouse if it differs.
SELECT
    Item,
    SUM(Quantity * UnitPrice) AS TotalRevenue,
    SUM(Quantity)             AS TotalQuantity
FROM [SalesAnalytics].[dbo].[sales]
GROUP BY Item
ORDER BY TotalRevenue DESC;
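To see what this aggregation produces without a Fabric environment, here is a small sketch that runs the same GROUP BY logic against an in-memory SQLite table. The three sample rows are made up purely for illustration and are not taken from sales.csv.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (Item TEXT, Quantity INTEGER, UnitPrice REAL)")
# Hypothetical rows, only to exercise the query.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Helmet", 2, 30.0), ("Gloves", 5, 10.0), ("Helmet", 1, 30.0)],
)
rows = conn.execute(
    """
    SELECT Item,
           SUM(Quantity * UnitPrice) AS TotalRevenue,
           SUM(Quantity)             AS TotalQuantity
    FROM sales
    GROUP BY Item
    ORDER BY TotalRevenue DESC
    """
).fetchall()
print(rows)  # [('Helmet', 90.0, 3), ('Gloves', 50.0, 5)]
```

Gloves sell more units but Helmet earns more revenue, which is why the query orders by revenue rather than quantity.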


 

Query 2:
A second useful lens is customer-level analysis. This query identifies the five highest-value customers — useful for account management and targeted marketing decisions.
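A hedged sketch of such a top-five query follows, again using an in-memory SQLite table so it runs anywhere. The column name `CustomerName` is an assumption about the dataset, the sample rows are made up, and SQLite's `LIMIT 5` stands in for what would be `SELECT TOP 5` in T-SQL on the SQL analytics endpoint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CustomerName is an assumed column; check the actual schema of your sales table.
conn.execute("CREATE TABLE sales (CustomerName TEXT, Quantity INTEGER, UnitPrice REAL)")
sample = [
    ("C1", 1, 10.0), ("C2", 2, 10.0), ("C3", 3, 10.0),
    ("C4", 4, 10.0), ("C5", 5, 10.0), ("C6", 6, 10.0),
    ("C1", 1, 5.0),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sample)
top_customers = conn.execute(
    """
    SELECT CustomerName,
           SUM(Quantity * UnitPrice) AS TotalSpend
    FROM sales
    GROUP BY CustomerName
    ORDER BY TotalSpend DESC
    LIMIT 5
    """
).fetchall()
for name, spend in top_customers:
    print(name, spend)
```

With six customers in the sample, the lowest spender (C1) drops out of the result, which shows the row limit doing its job.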

Summary — The Full Picture

Let us bring the full journey together in one view. When we talk about discovering and connecting to data in OneLake, we are describing a layered architecture where each capability builds on the one below it.

 

 

Layer | Capability | What It Eliminates
Storage | OneLake: single, unified Delta lake | Siloed, multi-account storage sprawl
Discovery | OneLake Catalog: searchable data marketplace | Hunting across workspaces and Teams messages
Trust | Endorsement + Discoverability | Competing 'versions of the truth'
Connection | Shortcuts: zero-copy pointers to external data | ETL pipelines just to make data accessible
Replication | Mirroring: near-real-time DB sync into OneLake | Complex CDC pipelines from operational systems
Analytics | Direct Lake: in-memory Power BI on OneLake data | Stale import refreshes and DirectQuery slowness
Governance | Catalog Govern tab + Purview labels + lineage | Shadow IT and ungoverned data sprawl

 

The biggest shift OneLake brings is cultural rather than technical. When data engineers publish once, business users discover without raising a ticket, and analysts connect without copying, the organisation stops functioning as a collection of data silos and starts behaving like a single intelligent data platform.

 

That is the promise of OneLake — and as of 2026, it is fully available for production use.

 
Thanks for Reading!