Topic: Accessing Excel files (with multiple tabs) in an Azure Blob storage container, in Data Engineering

sruthiramana — Wed, 25 Sep 2024 19:08:43 GMT

Hello everyone! I am trying to access Excel files (with multiple tabs) that are stored in a container in Azure Blob Storage, so that I can automate this process using a pipeline.

1) In a Lakehouse, I created a shortcut to access Azure Blob Storage. (Successfully done)

2) I can see the Excel file in the Files folder.

3) Now I am trying to read the sheets using the ABFSS path, but it fails to recognize the file.

```python
import pandas as pd
from pyspark.sql import SparkSession

# Define the ABFSS path to the Excel file
excel_file_path = "abfss_path/TestFile.xlsx"

# Step 1: Read the Excel file using pandas
# sheet_name=None loads every sheet into a dict of {sheet name: DataFrame}
# You might need to specify engine="openpyxl" depending on the Excel format
excel_data = pd.read_excel(excel_file_path, sheet_name=None)

# Step 2: Initialize a Spark session if not already available
spark = SparkSession.builder.getOrCreate()

# Step 3: Loop through each sheet and convert it to a Spark DataFrame
for sheet_name, sheet_df in excel_data.items():
    # Convert the pandas DataFrame to a Spark DataFrame
    spark_df = spark.createDataFrame(sheet_df)
    # Perform any actions on the Spark DataFrame
    spark_df.show()
```

4) This throws the following error:

Py4JJavaError: An error occurred while calling . : java.nio.file.AccessDeniedException: Operation failed: "Server failed to authenticate the request. Please refer to the information in the www-authenticate header."

FabianSchut — Wed, 25 Sep 2024 21:04:28 GMT

Hi @sruthiramana,

What ABFSS path are you using? You can replace the actual IDs with xxx; I just want to check that the right folders are selected. Also, you don't need to start a Spark session within a Fabric notebook, because a session named spark is already available. Could you remove step 2? There may be a permission error there, since it is another session. I'm guessing a bit here, but that session is not needed.
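To check whether the right folders are selected, it may help to compare the path against the OneLake ABFSS format. The sketch below is a minimal illustration, assuming name-based addressing; the helper names and the workspace, lakehouse and shortcut names (MyWorkspace, MyLakehouse, blob_shortcut) are hypothetical. OneLake also accepts GUIDs, in the form abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>/Files/<path>. The second helper returns the local File API path, which only works when the lakehouse is attached as the notebook's default lakehouse.

```python
def onelake_abfss_path(workspace: str, lakehouse: str, relative_path: str) -> str:
    """Hypothetical helper: ABFSS URI for a file under a lakehouse's Files area.

    Assumes name-based addressing (workspace and lakehouse names, not GUIDs).
    `relative_path` is the path below Files/, e.g. "<shortcut>/TestFile.xlsx".
    """
    relative_path = relative_path.lstrip("/")
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/Files/{relative_path}"
    )


def default_lakehouse_file_path(relative_path: str) -> str:
    """Hypothetical helper: local path when the lakehouse is the notebook's default.

    The default lakehouse is mounted at /lakehouse/default in a Fabric notebook.
    """
    return f"/lakehouse/default/Files/{relative_path.lstrip('/')}"
```

For example, onelake_abfss_path("MyWorkspace", "MyLakehouse", "blob_shortcut/TestFile.xlsx") builds abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/MyLakehouse.Lakehouse/Files/blob_shortcut/TestFile.xlsx, which can be compared segment by segment with the path actually used.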

Anonymous — Thu, 26 Sep 2024 07:01:53 GMT

Hi @sruthiramana,

The error message indicates that you can't access the files directly without credentials. Perhaps you can try adding authorization steps when loading data from the storage account.

For example:

```python
from notebookutils import mssparkutils

# Service principal
tenant_id = "<tenant_id>"
client_id = "<client_id>"
client_secret = mssparkutils.credentials.getSecret(
    "https://YourKeyVault.vault.azure.net/", "your-client-secret-secret-name"
)

# Azure storage details
storage_account_name = "<account>"
container_name = "<container_name>"
file_path = "<file_path>"

# Spark configuration
spark.conf.set("fs.azure.account.auth.type", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id", client_id)
spark.conf.set("fs.azure.account.oauth2.client.secret", client_secret)
spark.conf.set("fs.azure.account.oauth2.client.endpoint", f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

# Full file path
data_path = f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/{file_path}"

# Read the data into a Spark DataFrame
df = spark.read.format("csv").option("header", "true").load(data_path)

# Show the result
df.show()
```
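The configuration above uses the account-wide keys. The Hadoop ABFS driver also accepts the same keys with a .<account>.dfs.core.windows.net suffix, which scopes them to a single storage account. The sketch below is a minimal illustration that builds those scoped settings as a dictionary; the helper name abfs_oauth_settings is hypothetical. These are Spark settings, so as far as I know they apply to spark.read and not to the pd.read_excel call in the original question.

```python
def abfs_oauth_settings(storage_account: str, tenant_id: str,
                        client_id: str, client_secret: str) -> dict:
    """Hypothetical helper: ABFS OAuth (client credentials) settings for one account."""
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# In a notebook, the settings would be applied with (not executed here):
# for key, value in abfs_oauth_settings("<account>", tenant_id, client_id, client_secret).items():
#     spark.conf.set(key, value)
```

Scoping by account keeps the service principal settings from applying to every storage account the session touches.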

Regards,

Xiaoxin Sheng

sruthiramana — Thu, 26 Sep 2024 18:46:19 GMT

Thanks for your help; it was a mistake on my end. I was using the wrong file name.
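Since the root cause turned out to be a wrong file name, a quick existence check before reading can surface that directly instead of through a less obvious error. The sketch below is minimal and assumes the shortcut folder is reachable as a local path (for example under /lakehouse/default/Files when the lakehouse is the notebook's default); resolve_excel_file and the folder name are hypothetical.

```python
import os


def resolve_excel_file(folder: str, file_name: str) -> str:
    """Hypothetical helper: return the full path, or raise listing the Excel files present."""
    candidate = os.path.join(folder, file_name)
    if os.path.isfile(candidate):
        return candidate
    # Collect Excel-looking files so a typo in the name is easy to spot
    available = sorted(
        name for name in os.listdir(folder)
        if name.lower().endswith((".xlsx", ".xlsm", ".xls"))
    )
    raise FileNotFoundError(
        f"{file_name!r} not found in {folder!r}; Excel files present: {available}"
    )
```

Usage would look like pd.read_excel(resolve_excel_file("/lakehouse/default/Files/blob_shortcut", "TestFile.xlsx"), sheet_name=None).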

Anonymous — Fri, 27 Sep 2024 06:21:08 GMT

Hi @sruthiramana,

I'm glad you found the root cause, and thanks for sharing it here.

Regards,

Xiaoxin Sheng