I'm looking for ways to access a Fabric Lakehouse from a Synapse workspace (the standalone Synapse).
I can successfully use a Copy Activity + Lakehouse linked service, with a service principal + certificate for auth, as described here, to write data from my Synapse workspace into a Fabric Lakehouse.
Now I would like to use a Spark notebook to achieve the same. I am already authenticating to a Gen2 storage account using code like this:
spark.conf.set(f"spark.storage.synapse.{base_storage_url}.linkedServiceName", linked_service)
sc._jsc.hadoopConfiguration().set(f"fs.azure.account.oauth.provider.type.{base_storage_url}", "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider")
base_storage_url is in the format containername@storagename.dfs.core.windows.net
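For context, with those two settings in place the notebook can read and write the Gen2 account directly through its abfss:// path. A minimal sketch of how I use it (the folder names are placeholders, not my real paths):
df = spark.read.parquet(f"abfss://{base_storage_url}/some/input/folder")  # placeholder path
df.write.mode("overwrite").parquet(f"abfss://{base_storage_url}/some/output/folder")  # placeholder path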
I was hoping this would also work with Fabric's OneLake, as it also exposes an abfss:// endpoint, but no luck.
Is it possible?
Hi @Krumelur,
Although an access token can be successfully acquired using a certificate and service principal (via MSAL and SNI), writing data from Synapse Spark to Microsoft Fabric OneLake using the abfss:// protocol is currently not supported. The 404 error encountered in this scenario stems from the inability of the ABFS driver within Synapse to resolve OneLake’s internal namespace, despite successful authentication.
Microsoft Fabric’s OneLake implements a virtualized filesystem abstraction that is compatible only with Microsoft-native tools such as Fabric Spark notebooks, Dataflows, and Pipelines. External Spark engines, including Synapse Spark, are not capable of interpreting OneLake paths correctly. Consequently, token-based configurations fail at the filesystem resolution stage rather than during authentication.
As of June 2025, Microsoft has not introduced support for direct data writes from Synapse Spark to OneLake. The recommended approach is to first write the data to Azure Data Lake Storage Gen2 (ADLS Gen2), and subsequently transfer it to the Lakehouse using either a Synapse pipeline Copy Activity or a Fabric Dataflow.
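For illustration, the staging step from your Synapse notebook could look like the sketch below; the storage account, container, folder, and linked service names are placeholders, and it reuses the same linked-service token provider pattern already shown in your question:
# Step 1 (sketch): write the DataFrame to an ADLS Gen2 staging location from Synapse Spark.
# All names below are placeholders.
staging_url = "staging@mystorageaccount.dfs.core.windows.net"
spark.conf.set(f"spark.storage.synapse.{staging_url}.linkedServiceName", "My_ADLS_LinkedService")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{staging_url}", "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider")
df.write.format("delta").mode("overwrite").save(f"abfss://{staging_url}/lakehouse_staging/my_table")
# Step 2 (not code): a Synapse pipeline Copy Activity or a Fabric Dataflow then moves the staged data into the Lakehouse.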
Hope this suggestion gives you a good idea. If you have any more questions, please feel free to ask; we are here to help you.
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.
Thank you.
Thanks for getting back to me. I'm not fully convinced yet 🙂
Copying data after writing it is not an option.
With some custom code I was able to obtain an access token for OneLake using SNI. I then passed the token to Spark. I'm now stuck with OneLake returning a 404, claiming that the path I want to access does not exist, even though it does.
Notice that the approach below is not using linked services to obtain an access token.
* Is there a way to make this work or am I hitting a wall?
* When will writing to OneLake from Synapse Spark be supported?
# Goal of this code: Acquire an access token using SNI for a client ID and a cert stored in a KeyVault.
# The Synapse workspace managed ID has access to the KV and can read the cert.
from cryptography.hazmat.primitives.serialization import pkcs12, Encoding, PrivateFormat, NoEncryption
from cryptography.hazmat.primitives import hashes
import base64
import msal
from notebookutils import mssparkutils  # preloaded in Synapse notebooks; imported explicitly so the code is self-contained
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
# Load PFX from Key Vault
# Tested: Client has access to workspace (is admin) and using the client and cert successfully works
# when doing a copy job in Synapse.
cert_pfx_base64 = mssparkutils.credentials.getSecretWithLS("My_KeyVault_LinkedService", "MyCertificateName")
cert_pfx_bytes = base64.b64decode(cert_pfx_base64)
# Extract components
private_key, certificate, _ = pkcs12.load_key_and_certificates(cert_pfx_bytes, b"")
private_key_pem = private_key.private_bytes(
    encoding=Encoding.PEM,
    format=PrivateFormat.PKCS8,
    encryption_algorithm=NoEncryption()
).decode()
cert_pem = certificate.public_bytes(Encoding.PEM).decode()
thumbprint = certificate.fingerprint(hashes.SHA1()).hex()
# MSAL setup
tenant_id = "TENANT_ID_WHERE_FABRIC_WORKSPACE_LIVES"
client_id = "CLIENT_ID_USING_SNI"
app = msal.ConfidentialClientApplication(
    client_id=client_id,
    authority=f"https://login.microsoftonline.com/{tenant_id}",
    client_credential={
        "private_key": private_key_pem,
        "thumbprint": thumbprint,
        "public_certificate": cert_pem  # <- this is what triggers SN/I
    }
)
# Acquire token
result = app.acquire_token_for_client(scopes=["https://storage.azure.com/.default"])
access_token = result["access_token"]
# Checking the token: it's valid.
print(access_token[:50])
# PASS TOKEN TO SPARK
# The URL is copied from the folder's properties in the Fabric Lakehouse UI
full_url = "abfss://4f.....@msit-onelake.dfs.fabric.microsoft.com/f....../Files"
base_url = "4f.......@msit-onelake.dfs.fabric.microsoft.com"
spark.conf.set(f"fs.azure.account.auth.type.{base_url}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{base_url}", "org.apache.hadoop.fs.azurebfs.oauth2.AccessTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.access.token.{base_url}", access_token)
# CREATE TEST DATA
schema = StructType([
    StructField("id", IntegerType(), False),
    StructField("name", StringType(), False),
    StructField("value", IntegerType(), False)
])
data = [
    (1, "Alpha", 100),
    (2, "Beta", 200),
    (3, "Gamma", 300)
]
df = spark.createDataFrame(data, schema)
# SAVE TEST DATA
# Returns a 404 as if the path did not exist, which is not true. I'm also getting this if I do not pass a token at all... 😞
df.write.format("delta").mode("overwrite").save(full_url)
# Error
# An error occurred while calling o4263.save.
# : java.io.FileNotFoundException: Operation failed: "NotFound", 404, HEAD, https://msit-onelake.dfs.fabric.microsoft.com/4......../?upn=false&action=getAccessControl&timeout=90
# at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.checkException(AzureBlobFileSystem.java:1436)
# at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.mkdirs(AzureBlobFileSystem.java:609)
# at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2388)
# at org.apache.spark.sql.delta.DeltaLog.createLogDirectory(DeltaLog.scala:467)
# at org.apache.spark.sql.delta.commands.WriteIntoDelta.write(WriteIntoDelta.scala:265)
Hi @Krumelur,
Thank you for reaching out to the community with your question.
At this time, writing directly from a standalone Azure Synapse Spark notebook to Microsoft Fabric OneLake (Lakehouse) using the abfss:// endpoint is not supported. While OneLake uses a similar URI format to ADLS Gen2, its authentication model is different, and tokens issued through Synapse-linked services are not recognized by Fabric. This is why the approach that works for ADLS Gen2 does not apply to OneLake.
As a supported and reliable workaround, we recommend first writing your data from Synapse Spark to a staging location in Azure Data Lake Storage Gen2. From there, you can use a Synapse pipeline with a Copy Activity to move the data into your Fabric Lakehouse. This pipeline should use a Lakehouse Linked Service configured with a service principal and certificate for secure access. This method ensures compatibility and follows Microsoft’s best practices for integrating Synapse with Fabric.
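If you would like to automate the second hop as well, one option is to trigger that copy pipeline from the notebook right after the staging write, using the Synapse Pipeline - Create Run REST API. A rough sketch is below; the workspace and pipeline names are placeholders, it assumes the identity behind the token has permission to run pipelines in the workspace, and mssparkutils is used here only as one convenient way to obtain a token for the Synapse audience:
import requests
# Placeholders: replace with your Synapse workspace and pipeline names.
workspace_name = "my-synapse-workspace"
pipeline_name = "CopyStagingToLakehouse"
# Token for the Synapse development endpoint.
synapse_token = mssparkutils.credentials.getToken("Synapse")
run_url = (
    f"https://{workspace_name}.dev.azuresynapse.net/pipelines/"
    f"{pipeline_name}/createRun?api-version=2020-12-01"
)
response = requests.post(run_url, headers={"Authorization": f"Bearer {synapse_token}"})
print(response.status_code, response.json())  # the response contains the new pipeline run ID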
We truly appreciate your engagement in the forum and encourage you to continue sharing your experiences and questions. Your contributions help strengthen the community for everyone.
Hope this suggestion gives you a good idea. If you have any more questions, please feel free to ask; we are here to help you.
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.
Regards,
Sahasra
Community Support Team.