dead_DE
Frequent Visitor

Can local PySpark access OneLake using ABFSS paths?

Hi everyone,

I’m new to Fabric and Azure, and I’m trying to set up a local development workflow.

I’m running PySpark inside a container on my machine. I can access Microsoft Fabric using only my Microsoft Entra ID (no Azure subscription tied to me personally). Using azure.identity, I’m able to generate tokens and successfully access files in OneLake through the Python SDK.

What I haven’t been able to do is configure my local Spark environment to use these tokens to read data directly from OneLake using abfss:// paths. Every attempt to configure the ABFS drivers fails, and I’ve seen some comments suggesting that this scenario isn’t currently supported.

Right now it looks like the only viable approach for local Spark development is to download the files from OneLake (via the SDK) and work with them locally, essentially mirroring the Lakehouse on my machine.

Given that my access is limited, I’m wondering:

  • Is there any supported way to authenticate Spark directly against OneLake from outside Fabric?

  • If token‑based access isn’t possible, is there another authentication method or permission I could request that would allow this?

Any guidance or clarification would be greatly appreciated.


13 REPLIES
AparnaRamakris
Microsoft Employee

I have personally used the Azure CLI to log in to the Fabric tenant and azure.identity to authenticate, and this has worked for me for connecting and testing things locally, though it may not be suitable for production deployments.

 

# First, sign in with the Azure CLI (no subscription needed, just the tenant):
# az login --allow-no-subscriptions --tenant <TenantId>

from azure.identity import DefaultAzureCredential
from deltalake import DeltaTable  # delta-rs: pip install deltalake

# DefaultAzureCredential picks up the az login session
delta_token = DefaultAzureCredential().get_token("https://storage.azure.com/.default").token
storage_options = {"bearer_token": delta_token, "use_fabric_endpoint": "true"}

DELTA_TABLE_PATH: str = '<complete abfss path of the delta table in the Fabric lakehouse>'
df = DeltaTable(DELTA_TABLE_PATH, storage_options=storage_options)
print(df.to_pandas().head(10))

If this code helps, please Accept as a Solution to help others as well.

Hi @AparnaRamakris, I agree - connecting to Fabric OneLake from local Python (Pandas) is not an issue; it works! I think the user was trying to connect local Spark to Fabric, which is slightly different.

I trust this will be helpful. If you found this guidance useful, you are welcome to acknowledge with a Kudos or by marking it as a Solution.
v-hashadapu
Community Support

Hi @dead_DE, hope you're doing well. Can you confirm whether the problem is solved or still persists? Sharing the details will help others in the community.

v-hashadapu
Community Support

Hi @dead_DE , Thank you for reaching out to the Microsoft Community Forum.

 

We find the answer shared by @deborshi_nag is correct. Can you please confirm if the solution worked for you? It will help others with similar issues find the answer easily.

 

Thank you @deborshi_nag  for your valuable response.

deborshi_nag
Power Participant

Hello @dead_DE

 

You can use local Spark to access your OneLake storage; however, you'll have to use a service principal. These are the prerequisites and the steps involved:

 

1. Make sure your Fabric tenant allows external apps

Your Fabric admin must enable:

  • Users can access data stored in OneLake with apps external to Fabric
  • (For service principals) Service principals can use Fabric APIs

Then grant your service principal Contributor (or above) access to the Fabric workspace.

 

2. Collect the OneLake ABFSS path

OneLake uses ADLS Gen2‑compatible URIs. The account name is always onelake, and the filesystem is your workspace name (or GUID). Typical pattern:

 

abfss://<workspaceName or workspaceGUID>@onelake.dfs.fabric.microsoft.com/<lakehouseName or itemGUID>.lakehouse/Files/
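
For illustration only (the workspace and lakehouse names below are hypothetical), a path to a file under Files/ would look like:

abfss://SalesWorkspace@onelake.dfs.fabric.microsoft.com/SalesLakehouse.lakehouse/Files/raw/orders.csv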
 

3. Ensure your local Spark has the ABFS connector

You need Hadoop’s hadoop-azure (ABFS) connector and its Azure storage dependencies on the classpath.
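
If you don't already have the JARs in your image, one way (a sketch; 3.3.4 is an assumption - pin the version to match your Spark build's Hadoop version) is to let Spark pull the connector from Maven at launch:

pyspark --packages org.apache.hadoop:hadoop-azure:3.3.4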

 

4. Create a Microsoft Entra service principal

Record the Tenant (Directory) ID, Client ID (Application ID), and Client Secret, and grant the SP access to your Fabric workspace.

 

5. Configure Spark for ABFS OAuth against OneLake host

 
pyspark \
  --conf "fs.azure.account.auth.type.onelake.dfs.fabric.microsoft.com=OAuth" \
  --conf "fs.azure.account.oauth.provider.type.onelake.dfs.fabric.microsoft.com=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider" \
  --conf "fs.azure.account.oauth2.client.id.onelake.dfs.fabric.microsoft.com=<APP_CLIENT_ID>" \
  --conf "fs.azure.account.oauth2.client.secret.onelake.dfs.fabric.microsoft.com=<APP_CLIENT_SECRET>" \
  --conf "fs.azure.account.oauth2.client.endpoint.onelake.dfs.fabric.microsoft.com=https://login.microsoftonline.com/<TENANT_ID>/oauth2/token"

 

Kindly accept this solution if it solves your problem. 

I trust this will be helpful. If you found this guidance useful, you are welcome to acknowledge with a Kudos or by marking it as a Solution.

Thank you, I will look into getting a service principal from my organization and check back on this some time this upcoming week.

Just a heads up, I am still trying to get more than Entra-level access to the account to try this solution.
Will update soon.

Hi @dead_DE, thanks for the update. Hope you will get the access soon. Please share the details here once you've had a chance to try it.

Here’s where I’m currently stuck. I’ve been told that we need to create an Azure App Registration for programmatic access. However, when I create the App Registration, I don’t see any way to associate it directly with Fabric or OneLake. Because of that, I’m not sure how to ensure the app gets the correct permissions or how to generate the right client ID for Fabric access.

I also can’t just ask IT to give me broad permissions to all of Blob Storage and hope that covers wherever OneLake lives. Everything I’ve read suggests authenticating to Fabric programmatically by generating a token through the Azure CLI, but that’s what led me here since there’s no public Spark driver that supports that token flow yet.

Hello @dead_DE 

 

Here's how you can create a service principal (SPN). If there's a team that creates SPNs in your organisation, they'd know this. A CLI sketch follows the portal steps below.

  • Go to the Entra ID portal
    > Applications > App registrations > New registration
  • Note the:
    • Client ID (App ID)
    • Tenant ID
    • Client Secret (create one under Certificates & secrets)
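
If you prefer scripting this, a minimal Azure CLI sketch (assuming you have the CLI installed and sufficient directory permissions; the display name is a placeholder):

# Create the app registration - the display name is a placeholder
az ad app create --display-name "local-spark-onelake"

# Create the service principal for the app, using the appId returned above
az ad sp create --id <APP_CLIENT_ID>

# Create a client secret (the command prints the secret value - store it safely)
az ad app credential reset --id <APP_CLIENT_ID>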

 

A tenant setting must be changed for the SPN to access a Fabric workspace:

Tenant Settings > Developer Settings > “Service principals can use Fabric APIs”

 

Next, you need to assign the SPN Contributor access to your workspace.

In Fabric:

  1. Open the Workspace
  2. Select Manage access
  3. Click Add people or groups
  4. Search for your App Registration name (NOT the GUID!)
    • SPNs show up by the registered application name
  5. Assign an appropriate role:
    • Contributor (usually recommended)
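
Before wiring up Spark, a quick way to sanity-check both the tenant setting and the workspace role (a sketch, assuming the azure-identity and requests packages; the List Workspaces call should return the workspace you just granted):

import requests
from azure.identity import ClientSecretCredential

# Acquire a token for the Fabric REST API with the SPN's client credentials
cred = ClientSecretCredential(TENANT_ID, APP_CLIENT_ID, APP_CLIENT_SECRET)
token = cred.get_token("https://api.fabric.microsoft.com/.default")

# List the workspaces the SPN can access
resp = requests.get(
    "https://api.fabric.microsoft.com/v1/workspaces",
    headers={"Authorization": f"Bearer {token.token}"},
)
print(resp.status_code, resp.json())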

Once both steps are done, you can follow the message I posted on this thread earlier. Assigning the SPN Contributor access to the workspace allows it to read and write OneLake data.

 

I trust this will be helpful. If you found this guidance useful, you are welcome to acknowledge with a Kudos or by marking it as a Solution.

TY for this, I will check with IT on Monday!

Following the guide above on creating an SPN, I was able to get set up with an app client ID and an app client secret.

My container running Spark needed some configuring as well. I had to set it up with the Delta Lake JARs and Azure Blob Storage configurations.

Full example of how I got this to work:

from pyspark.sql import SparkSession

# Create Spark session (Java 17 compatible versions) with Delta Lake and the ABFS connector
spark = (SparkSession.builder
    .appName("OneLakeAccess")
    .master("local[*]")
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-azure:3.3.4,io.delta:delta-spark_2.12:3.1.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .config("spark.hadoop.fs.abfss.impl", "org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem")
    .config("spark.hadoop.fs.abfs.impl", "org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem")
    .getOrCreate())

# Point the ABFS OAuth config at the OneLake host, using the SPN credentials
conf = spark.sparkContext._jsc.hadoopConfiguration()
conf.set("fs.azure.account.auth.type.onelake.dfs.fabric.microsoft.com", "OAuth")
conf.set("fs.azure.account.oauth.provider.type.onelake.dfs.fabric.microsoft.com", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
conf.set("fs.azure.account.oauth2.client.id.onelake.dfs.fabric.microsoft.com", APP_CLIENT_ID)
conf.set("fs.azure.account.oauth2.client.secret.onelake.dfs.fabric.microsoft.com", APP_CLIENT_SECRET)
conf.set("fs.azure.account.oauth2.client.endpoint.onelake.dfs.fabric.microsoft.com", f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/token")

# Read Lakehouse Files (CSV) and a Delta table by their ABFSS paths
files_df = spark.read.csv("<abfss_path_to_fabric_resource_in_lakehouse>")
delta_df = spark.read.format("delta").load("<abfss_path_to_fabric_delta_table_in_lakehouse>")
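
A small hardening sketch (the environment variable names are my own; export them when starting the container) for supplying the three credentials used above without hardcoding them:

import os

# Assumed environment variable names - set these on the container
APP_CLIENT_ID = os.environ["APP_CLIENT_ID"]
APP_CLIENT_SECRET = os.environ["APP_CLIENT_SECRET"]
TENANT_ID = os.environ["TENANT_ID"]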

I am now able to connect to my Fabric Lakehouse Files and Delta tables with ABFSS paths from my Spark container.

spencer_sa
Impactful Individual

I'm not sure about ABFSS paths, but I've successfully accessed data in OneLake using the ADLS dfs endpoints.
I tend to use SPN (rather than organizational) credentials, and it works just fine.
I've connected to the SQL endpoint too.
Having had a read around, this may be of use:
https://christianhenrikreich.medium.com/microsoft-fabric-diving-into-lakehouse-access-from-local-mac...

If this helps, please Accept as a Solution to help others find it more easily.
