rashidanwar
Advocate II

Unable to Read Nested Parquet Files in Azure Blob Storage with PySpark

Hi everyone,

I'm seeking assistance with reading nested parquet files stored in Azure Blob Storage using a PySpark Notebook in Microsoft Fabric. Here's a breakdown of the situation:

Challenge:

  • Power BI's Parquet.Document() function is unable to handle nested data structures in my parquet files.
  • PySpark throws an authentication error when attempting to access the files with the same SAS token used in Power BI.

Steps Taken:

  1. Power BI (Attempted):
    • Used an Azure Blob Storage SAS token for authentication.
    • Encountered limitations with Power BI for nested data structures.
  2. PySpark Notebook (Current Issue):
    • Configured the notebook with the same SAS token but received an authentication error: "No credentials found...".
    • Enabled anonymous access on the container (Security Risk) to bypass the error, but then encountered a "Path does not exist" error.

Code Snippet: 

blob_account_name = "account_name"
blob_container_name = "container"
blob_relative_path = "folder1/folder2/folder3/folder4/parquet/20240527/000001.parquet"
blob_sas_token = "sp=r&st=2024-05-19T01:51:21Z&se=2024-07-01T09:51:21Z&spr=https&sv=2022-11-02&sr=c&sig=%2CkbLTU8aY7maCu3ak15hjtVHr1jdhHgR2ZghfTTYBF%3D"


# Construct the path for connection
wasbs_path = f'wasbs://{blob_container_name}@{blob_account_name}.blob.core.windows.net/{blob_relative_path}?{blob_sas_token}'

# Read parquet data from Azure Blob Storage path
blob_df = spark.read.parquet(wasbs_path)

# Show the Azure Blob DataFrame
blob_df.show()
spark.stop()
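
For context, once the file itself can be read, PySpark handles nested structures natively. Below is a minimal sketch of flattening them, assuming blob_df was loaded successfully and has a hypothetical schema with a customer struct and an items array (column names invented for illustration):

from pyspark.sql.functions import col, explode

# Hypothetical schema for illustration:
# root
#  |-- order_id: string
#  |-- customer: struct<name: string, city: string>
#  |-- items: array<struct<sku: string, qty: long>>

flat_df = (
    blob_df
    # Promote a struct field to a top-level column
    .withColumn("customer_name", col("customer.name"))
    # Explode the array so each element becomes its own row
    .withColumn("item", explode(col("items")))
    .select(
        "order_id",
        "customer_name",
        col("item.sku").alias("sku"),
        col("item.qty").alias("qty"),
    )
)
flat_df.show()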

 

Request:

I'd appreciate any insights on how to access and read the nested parquet files securely using PySpark without anonymous container access.

Additional Information:

  • The provided code snippet includes hypothetical credentials.

Thanks in advance for your assistance!

Regards,

Rashid Anwar


8 REPLIES
rashidanwar
Advocate II

@Anonymous, 

This issue is still unresolved, and I have opened a support ticket for it. Below is the tracking ID:

Tracking ID: 2405230050000012

I have since managed to list the blobs stored in Azure Blob Storage and have posted that solution in the relevant thread, which can now be closed. Let's keep this thread open for further discussion.
Thank you!


 
 

Anonymous
Not applicable

Hi @rashidanwar ,

Thanks for sharing the support ticket.

Please allow some time so the team can check and provide a resolution.

If you find a resolution in the meantime, please do share it with the community, as it can be helpful to others.

rashidanwar
Advocate II

Now I want to get a list of all the blobs stored in the container, where each entry contains the complete path to the blob.
For example, I have a hierarchical file structure in the container: a main folder called Entity, which contains a folder called Apps; Apps contains two folders, 20240521 and 20240525, and each of those holds a parquet file named 00001.parquet.

How can I get a list of the parquet files with their complete paths, as follows, using a PySpark notebook in Fabric?
Entity/Apps/20240521/00001.parquet
Entity/Apps/20240525/00001.parquet
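
A minimal sketch of one way to get such a list, assuming the SAS token is already registered in the Spark configuration (as in the solution further down) and reusing the same account/container variables; note that input_file_name() returns full wasbs:// URLs rather than container-relative paths:

from pyspark.sql.functions import input_file_name

base_path = f'wasbs://{blob_container_name}@{blob_account_name}.blob.core.windows.net'

# Load every parquet file under the container recursively, tagging
# each row with the file it was read from
df = (
    spark.read.format("parquet")
    .option("recursiveFileLookup", "true")
    .load(base_path)
    .withColumn("source_file", input_file_name())
)

# Print the distinct file paths
for row in df.select("source_file").distinct().collect():
    print(row.source_file)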

Anonymous
Not applicable

Hello @rashidanwar ,


We haven't heard from you since the last response and were just checking back to see whether you have a resolution yet.
If you do, please share it with the community, as it can be helpful to others.
Otherwise, please reply with more details and we will try to help.

Anonymous
Not applicable

Hi @rashidanwar ,

This thread is a duplicate of this one - How to get a list of the blob with full path store... - Microsoft Fabric Community

I am closing this thread.

rashidanwar
Advocate II

Hi, I did some research and was able to get the content of all the blobs in the Azure Blob Storage container using the following code. You can also filter the results.

from pyspark.sql import SparkSession

blob_account_name = "parquetfiles1"
blob_container_name = "container1"
blob_sas_token = "sp=rl...."

# Initialize the Spark session (returns the existing session in a Fabric notebook)
spark = SparkSession.builder.appName("azure").getOrCreate()

# Register the SAS token for this container in the Spark configuration
spark.conf.set(
    f'spark.hadoop.fs.azure.sas.{blob_container_name}.{blob_account_name}.blob.core.windows.net',
    blob_sas_token)

# Build the base path for the container (without specific file path)
base_path = f'wasbs://{blob_container_name}@{blob_account_name}.blob.core.windows.net'

df = (
    spark.read.format("parquet")
    .option("recursiveFileLookup", "true")
    .load(base_path)
)

# df.show()
display(df)
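
As a follow-up on the filtering point: one way to filter by source file, assuming the df above and an illustrative date folder, is to tag each row with input_file_name() and filter on the path:

from pyspark.sql.functions import col, input_file_name

# Keep only rows read from the 20240521 folder (folder name illustrative)
filtered_df = (
    df.withColumn("source_file", input_file_name())
      .filter(col("source_file").contains("/20240521/"))
)
display(filtered_df)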



Anonymous
Not applicable

Hi @rashidanwar ,

Thanks for using Fabric Community.
Can you please refer to this similar thread - Solved: read Azure Data Lake from notebook fabric - Microsoft Fabric Community

Additional resources to refer -
How to read Parquet files in PySpark Azure Databricks? (azurelib.com)

Hope this brings some insights. Please do let me know in case of further queries.
