I'm trying to read data from a lakehouse using Python code. Is there any documentation on which authentication to use? I couldn't find any.
I already created an app registration connected to Fabric and have the client ID, secret value, etc.
I know how to read and write with an abfss path, but I don't know how to authenticate.
I found something using Azure Databricks, but in this case I need to do it without using Azure.
Hi @antoniofarias
Thanks for using Fabric Community.
You can refer to this doc: Connect to Fabric Lakehouses & Warehouses from Python code - Sam Debruyn
Hope this helps. Please let me know if you have any further questions.
Hi @v-nikhilan-msft, thanks for your reply. I tried this code yesterday and it worked in Databricks, but not in VS Code.
Some errors I got:
Hi @antoniofarias
Are you running the above code in Databricks or in Fabric? Can you please provide these details?
Thanks
Hi @v-nikhilan-msft, I'm running the code in VS Code with Spark in Docker. I found the problem: my firewall was blocking the connection. Here is the code to read a file from outside a Microsoft service:
from pyspark.sql import SparkSession

# Pull in the hadoop-azure connector so Spark can read OneLake over abfss
spark = (
    SparkSession.builder
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-azure:3.3.1")
    .appName("Fabric")
    .getOrCreate()
)

# Authenticate as the app registration (service principal) via OAuth client credentials
spark.conf.set("fs.azure.account.auth.type", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id", service_principal_id)
spark.conf.set("fs.azure.account.oauth2.client.secret", service_principal_password)
spark.conf.set("fs.azure.account.oauth2.client.endpoint", f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")

table_name = ''
df = spark.read.format("parquet").load(
    f"abfss://{workspace_name}@onelake.dfs.fabric.microsoft.com/{lakehouse_name}.Lakehouse/Tables/{table_name}/*.parquet"
)
df.show()
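As a side note, the abfss URI passed to load() follows a fixed OneLake pattern, so it can be built with a small helper. This is just a sketch; the workspace, lakehouse, and table names below are placeholders, not values from this thread:

```python
def onelake_table_path(workspace: str, lakehouse: str, table: str) -> str:
    """Build the OneLake abfss URI for a Fabric lakehouse table.

    Pattern: abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/Tables/<table>
    """
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/Tables/{table}"
    )

# Placeholder names for illustration only
print(onelake_table_path("MyWorkspace", "MyLakehouse", "sales"))
```

The resulting path can then be passed to spark.read (with "/*.parquet" appended if you read the raw parquet files directly, as in the snippet above).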