amasingh

Mount Microsoft OneLake on Linux VMs with BlobFuse

Overview

OneLake is a single, unified, logical data lake for your whole organization. OneLake comes automatically with every Microsoft Fabric tenant and is designed to be the single place for all your analytics data. You can access your data in OneLake through any API, SDK, or tool compatible with Azure Blob Storage or Azure Data Lake Storage (ADLS) just by using a OneLake URI instead. OneLake supports the same APIs as ADLS and Azure Blob Storage.
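To make the "OneLake URI" idea concrete, here is a minimal sketch of how such a URI might be assembled for the ADLS-style (DFS) endpoint. The host name is the endpoint used in the BlobFuse configuration later in this post. The function name `onelake_uri` and the example IDs are hypothetical, and the sketch assumes the workspace and item are addressed by their IDs, as in that configuration.

```python
from urllib.parse import quote

# Endpoint taken from the BlobFuse configuration later in this post.
ONELAKE_DFS_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_uri(workspace: str, item: str, path: str = "") -> str:
    """Build an HTTPS URI for a path inside a OneLake item (hypothetical helper)."""
    # Drop empty segments so leading or trailing slashes in `path` are harmless.
    parts = [workspace, item] + [p for p in path.split("/") if p]
    # Percent-encode each segment separately so "/" only ever acts as a separator.
    encoded = "/".join(quote(p, safe="") for p in parts)
    return f"https://{ONELAKE_DFS_HOST}/{encoded}"


# Placeholder IDs; substitute your own workspace and lakehouse IDs.
print(onelake_uri("workspace-guid", "lakehouse-guid", "Files/newdir/dummy_file.img"))
```

Tools that already accept an ADLS URI would take a value of this shape in place of a storage-account URI.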

BlobFuse is a virtual file system driver that enables you to mount Azure Storage as a file system to Linux-based virtual machines. It uses the libfuse open-source library (fuse3) to communicate with the Linux FUSE kernel module and implements the filesystem operations using the Azure Storage REST APIs. Because OneLake supports these APIs, BlobFuse works with OneLake!

OneLake unifies data via shortcuts and applies OneLake security end‑to‑end, so you can bring external data into a single, logical namespace and control access consistently. Shortcuts currently reference internal OneLake locations as well as ADLS Gen2, Amazon S3, S3‑compatible stores, Google Cloud Storage (GCS), Microsoft Dataverse, and Azure Blob Storage—with no data movement. When you mount OneLake with BlobFuse, those shortcut folders surface just like regular folders, with the correct permissions, in the OneLake path.

Use-cases

With BlobFuse, you can mount OneLake directly onto a virtual machine as a filesystem.

  1. By writing data directly to OneLake, you can reduce the time to generate insights. The data in OneLake is readily available to all Fabric engines for various analytical use-cases.


  2. OneLake is a SaaS service that scales transparently without administrative overhead. You can mount the same tenant's OneLake on many virtual machines, whether they host the same application or different ones. This lets you scale out storage, share data among applications, or bring data generated by different applications into one place for analytical use cases.


Configuring OneLake as a filesystem for virtual machines

System requirements and installing blobfuse

  1. Provision an Azure VM. Ensure that appropriate network security group (NSG) rules are in place so the VM can connect to Fabric OneLake. If you have enabled private links for Microsoft Fabric, you must also ensure that the VM can reach the private endpoint.
  2. VM OS and size – you can use any Linux-based OS. In this case, we are using the following specification.

Ubuntu 24.04.2


Unless you plan to run compute-heavy operations on the VM, you do not require a large VM.


Install blobfuse

  1. Install blobfuse and its dependencies for Ubuntu. Separate instructions are available for other Linux distributions (such as SLES), but this demo uses Ubuntu.


Configure permissions on OneLake

  1. Register an application in the Microsoft Entra tenant where OneLake is deployed. Note down the following details:
    1. Entra tenant ID
    2. Client_id
    3. Secret
  2. Switch to the Microsoft Fabric web user experience and navigate to the Fabric workspace which you want to use with blobfuse.
  3. In the Fabric workspace, grant the client_id the Contributor role. The Contributor role is required so that you can write files to the OneLake mount.

Prepare a configuration file for blobfuse

  1. For most scenarios, you can use the following pipeline configuration for blobfuse.
# Logger configuration
logging:
  type: base
  level: log_debug
  file-path: /home/azureuser/ols-logs.log

# Pipeline configuration
components:
  - libfuse
  - attr_cache
  - azstorage

libfuse:
  direct-io: true
  negative-entry-expiration-sec: 0
  allow-other: true
  uid: 1000
  gid: 1000

streaming:
  enabled: true
  block_size_mb: 4
  parallelism: 4

attr_cache:
  entry_timeout: 240
  negative_timeout: 120
  no-symlinks: true
  timeout-sec: 0
  cleanup-on-start: true

# Azure storage configuration
azstorage:
  type: adls
  account-name: onelake
  container: <replace-with-fabric-workspace-id>
  subdirectory: <replace-with-fabric-lakehouse-id>/Files
  max-concurrency: 20
  endpoint: onelake.dfs.fabric.microsoft.com
  mode: spn
  tenantid: <redacted>
  clientid: <redacted>
  clientsecret: <redacted>
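Because the azstorage section carries a client secret, it may be worth generating it from values held elsewhere and writing the file with owner-only permissions. The following is a minimal sketch of that idea, not part of the original walkthrough: the helper names `render_azstorage` and `write_private` are hypothetical, the IDs are dummy values, and only the azstorage section is rendered (the other sections would be added as shown above).

```python
import os
import tempfile
from pathlib import Path

# Mirrors the azstorage section above; placeholders are filled by render_azstorage.
AZSTORAGE_TEMPLATE = """\
azstorage:
  type: adls
  account-name: onelake
  container: {workspace_id}
  subdirectory: {lakehouse_id}/Files
  max-concurrency: 20
  endpoint: onelake.dfs.fabric.microsoft.com
  mode: spn
  tenantid: {tenant_id}
  clientid: {client_id}
  clientsecret: {client_secret}
"""


def render_azstorage(workspace_id, lakehouse_id, tenant_id, client_id, client_secret):
    """Fill the template with the Fabric IDs and service principal details."""
    return AZSTORAGE_TEMPLATE.format(
        workspace_id=workspace_id,
        lakehouse_id=lakehouse_id,
        tenant_id=tenant_id,
        client_id=client_id,
        client_secret=client_secret,
    )


def write_private(path: Path, text: str) -> None:
    """Write text to path so that only the owner can read or write it (0600)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    # The mode passed to os.open only applies to new files, so enforce it explicitly.
    os.chmod(path, 0o600)


# Demonstration with dummy values, written to a scratch directory.
with tempfile.TemporaryDirectory() as scratch:
    target = Path(scratch) / "ols-ms-config.yaml"
    write_private(target, render_azstorage("ws-id", "lh-id", "tenant", "client", "secret"))
    print(oct(target.stat().st_mode & 0o777))
```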

  2. Save this file to the user's home directory (any other directory and filename also works). Here, the configuration is saved to a file named ols-ms-config.yaml.


  3. Create a directory on the VM to use as the mount point for the OneLake /Files directory. Here, we created a new sub-directory inside the user’s home directory.


  4. Use the blobfuse2 command to mount the OneLake /Files directory at the ~/ols-files path.

blobfuse2 mount ./ols-files/ --config-file=ols-ms-config.yaml
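If you want to script the directory creation and mount steps, a small wrapper along these lines is one option. This is a sketch under assumptions: the function names are hypothetical, the empty-directory check is a precaution of this sketch rather than something the post prescribes, and the command line simply mirrors the one shown above.

```python
import shlex
import subprocess
import tempfile
from pathlib import Path


def build_mount_command(mount_dir, config_file):
    """Return the argument list for the mount command used in this post."""
    return ["blobfuse2", "mount", str(mount_dir), f"--config-file={config_file}"]


def prepare_mount_dir(mount_dir: Path) -> Path:
    """Create the mount point if needed and refuse to reuse a non-empty one."""
    mount_dir.mkdir(parents=True, exist_ok=True)
    if any(mount_dir.iterdir()):
        raise RuntimeError(f"{mount_dir} is not empty; choose an empty directory")
    return mount_dir


def mount_onelake(mount_dir: Path, config_file: Path) -> None:
    """Run the mount; raises CalledProcessError if blobfuse2 exits non-zero."""
    prepare_mount_dir(mount_dir)
    subprocess.run(build_mount_command(mount_dir, config_file), check=True)


# Print the command instead of running it, so the sketch is safe to try anywhere.
with tempfile.TemporaryDirectory() as scratch:
    demo_dir = prepare_mount_dir(Path(scratch) / "ols-files")
    print(shlex.join(build_mount_command(demo_dir, "ols-ms-config.yaml")))
```

Calling `mount_onelake` requires blobfuse2 to be installed and the configuration file to exist; the demonstration above only prints the command.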


  5. Run the mount | grep -i blobfuse command to verify that the filesystem has been mounted successfully.
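The same check can be done programmatically by scanning the kernel's mount table. The sketch below is an assumption-laden equivalent of the grep: it reads /proc/mounts (Linux only), the function name `blobfuse_mounts` is hypothetical, and it does not decode escaped characters (such as spaces) in mount paths.

```python
from pathlib import Path


def blobfuse_mounts(mounts_text: str) -> list[str]:
    """Return mount points whose device or filesystem type mentions blobfuse."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type = fields[:3]
        # Case-insensitive match, like `grep -i blobfuse`.
        if "blobfuse" in device.lower() or "blobfuse" in fs_type.lower():
            points.append(mount_point)
    return points


# On Linux, list any BlobFuse mounts currently visible to this process.
proc_mounts = Path("/proc/mounts")
if proc_mounts.exists():
    print(blobfuse_mounts(proc_mounts.read_text()))
```

An empty list means no BlobFuse mount was found, in which case the log file is the next place to look.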


  6. You can also check the output logs to verify that there are no warnings or errors. In the configuration file, the log level is set to log_debug and output is written to the ols-logs.log file.


  7. Change to the newly mounted path. In this case, two sub-directories already exist under the /Files path; they appear as local directories on the VM.


  8. Change directory to newdir and write a file. In this case, the dd command (which comes pre-installed) is used to generate data and write it to a file named dummy_file.img. Here, dd writes 100 blocks of 1 MB each to produce a single 100 MB file.

Note – The performance of dd depends on the resources available on the VM.

Note – Network bandwidth on Azure compute is generally a function of the size (SKU) of the Azure VM. Although OneLake offers high network bandwidth and throughput, reads and writes through an Azure VM are constrained by the network bandwidth that the VM supports. Please refer to the Azure compute size documentation.

Note – Latency is dependent on the region of the VM and the region where the Fabric capacity is provisioned. In this case, the VM and capacity are NOT in the same region.
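To get a rough feel for write throughput from your own VM, you could time a similar block-wise write in Python. This is a sketch, not the command used in the post: the function names are hypothetical, the blocks are zero-filled (the post does not show the dd arguments), and the demonstration writes to a scratch directory rather than the OneLake mount.

```python
import os
import tempfile
import time
from pathlib import Path

MIB = 1024 * 1024


def write_blocks(path, block_count=100, block_size=MIB) -> float:
    """Write block_count zero-filled blocks to path and return elapsed seconds."""
    block = bytes(block_size)
    start = time.perf_counter()
    with open(path, "wb") as handle:
        for _ in range(block_count):
            handle.write(block)
        # Flush Python's buffer and ask the OS to persist the data before timing stops.
        handle.flush()
        os.fsync(handle.fileno())
    return time.perf_counter() - start


def throughput_mb_per_s(size_bytes: int, seconds: float) -> float:
    """Convert a byte count and duration into MiB per second."""
    return size_bytes / MIB / max(seconds, 1e-9)


# Small demonstration in a scratch directory; point `target` at the mount to test OneLake.
with tempfile.TemporaryDirectory() as scratch:
    target = Path(scratch) / "dummy_file.img"
    elapsed = write_blocks(target, block_count=4)
    size = target.stat().st_size
    print(f"wrote {size} bytes at {throughput_mb_per_s(size, elapsed):.1f} MiB/s")
```

As the notes above suggest, the figure you see reflects the VM size, its network bandwidth, and the distance between the VM and the Fabric capacity as much as OneLake itself.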


  9. Switch over to the Fabric web user experience and navigate to the lakehouse item's /Files directory (which you mounted using blobfuse2). The file written from the VM can now be accessed by any workload on Fabric (provided appropriate permissions are in place).


Further considerations

While BlobFuse makes it easy to mount OneLake as a local filesystem, it’s important to note a few limitations. BlobFuse is not fully POSIX-compliant—certain operations, such as atomic renames or concurrent writes, may behave differently than on a traditional disk. For most analytics and data engineering scenarios, these differences are minor, but it’s worth designing your workflows with them in mind.

For performance, choose the access pattern that matches your workload. Local caching can be worth enabling if you expect random access and repeated reads of the same data; otherwise, consider streaming to avoid unnecessary disk overhead for large, sequential transfers. The right choice depends on your mix of rereads vs. one‑pass flows and the VM’s local storage characteristics.

Familiar Patterns, Minimal Changes

OneLake’s API compatibility means you can use familiar tools and patterns—like BlobFuse—without major changes to your applications or infrastructure. Whether you’re moving data into Fabric for analytics, sharing files across VMs, or integrating with existing pipelines, the process feels natural and straightforward. This example highlights how OneLake’s design lets you leverage your existing skills and workflows, making the transition to Fabric seamless for both developers and data engineers.
