Hi everyone,
I'm seeking assistance with reading nested parquet files stored in Azure Blob Storage using a PySpark Notebook in Microsoft Fabric. Here's a breakdown of the situation:
Code Snippet:
blob_account_name = "account_name"
blob_container_name = "container"
blob_relative_path = "folder1/folder2/folder3/folder4/parquet/20240527/000001.parquet"
blob_sas_token = "sp=r&st=2024-05-19T01:51:21Z&se=2024-07-01T09:51:21Z&spr=https&sv=2022-11-02&sr=c&sig=%2CkbLTU8aY7maCu3ak15hjtVHr1jdhHgR2ZghfTTYBF%3D"
# Register the SAS token with the wasbs driver; appending it to the path
# as a query string is not supported
spark.conf.set(
    f"fs.azure.sas.{blob_container_name}.{blob_account_name}.blob.core.windows.net",
    blob_sas_token)
# Construct the path for connection (no SAS token in the URL)
wasbs_path = f'wasbs://{blob_container_name}@{blob_account_name}.blob.core.windows.net/{blob_relative_path}'
# Read parquet data from the Azure Blob Storage path
blob_df = spark.read.parquet(wasbs_path)
# Show the Azure Blob DataFrame
blob_df.show()
# Do not call spark.stop() in a Fabric notebook; the session is managed by Fabric
Request:
I'd appreciate any insights on how to access and read the nested parquet files securely using PySpark without anonymous container access.
Regards,
Rashid Anwar
@Anonymous,
This issue is still unresolved, and I have opened a support ticket for it. Below is the tracking ID:
Tracking ID: 2405230050000012
I have successfully accessed the list of blobs stored in Azure Blob Storage and have posted the solution in the related thread, which can now be closed. Let's keep this thread open for further discussion.
Thank you!
Hi @rashidanwar ,
Thanks for sharing the support ticket.
Please allow some time so the team can check and provide a resolution.
If you find a resolution in the meantime, please do share it with the community, as it can be helpful to others.
Now I want to get a list of all the blobs stored in the container, where each entry contains the complete path to the blob.
For example, I have a hierarchical file structure in the container: there is a main folder called Entity, within Entity a folder called Apps, within Apps two folders named 20240521 and 20240525, and within each of those a parquet file named 00001.parquet.
How can I get a list of the parquet files with their complete paths, as follows, using a PySpark notebook in Fabric?
Entity/App/20240521/00001.parquet
Entity/App/20240525/00001.parquet
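In a Fabric notebook this kind of listing is usually done by recursing with `mssparkutils.fs.ls`, checking each entry's `isDir` flag and descending into subfolders. As a minimal local sketch of the same recursion (standard library only, over a temporary directory whose folder names are assumptions mirroring the question):

```python
import os
import tempfile

def list_files(root, base=""):
    """Recursively collect relative file paths under `root`, '/'-separated."""
    paths = []
    for entry in sorted(os.scandir(root), key=lambda e: e.name):
        rel = f"{base}/{entry.name}" if base else entry.name
        if entry.is_dir():
            paths.extend(list_files(entry.path, rel))  # descend into subfolder
        else:
            paths.append(rel)
    return paths

# Build a throwaway tree mirroring Entity/Apps/<date>/00001.parquet
with tempfile.TemporaryDirectory() as tmp:
    for date in ("20240521", "20240525"):
        folder = os.path.join(tmp, "Entity", "Apps", date)
        os.makedirs(folder)
        open(os.path.join(folder, "00001.parquet"), "w").close()
    paths = list_files(tmp)
    print(paths)
```

The same pattern carries over to `mssparkutils.fs.ls`: replace `os.scandir` with the `ls` call and `entry.is_dir()` with the entry's `isDir` property.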
Hi @rashidanwar ,
Can you please check the doc below?
Introduction to Microsoft Spark utilities - Azure Synapse Analytics | Microsoft Learn
Hello @rashidanwar ,
We haven't heard from you since the last response and wanted to check whether you have found a resolution yet.
If you have, please do share it with the community, as it can be helpful to others.
Otherwise, we will respond with more details and try to help.
Hi @rashidanwar ,
This thread is a duplicate of How to get a list of the blob with full path store... - Microsoft Fabric Community
I am closing this thread.
Hi, I did some research and have been able to get the content of all the blobs in the Azure Blob Storage container using the following code. You can also filter the results.
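The poster's snippet was not preserved in this thread. A common approach is `ContainerClient.list_blobs` from the `azure-storage-blob` package (its `name_starts_with` parameter narrows by prefix); whatever call produces the listing, filtering it down to the parquet blobs is a plain prefix/suffix check. A minimal sketch over example blob names (the names are assumptions mirroring the folder structure discussed above):

```python
# Hypothetical blob names, as a container listing might return them
blob_names = [
    "Entity/Apps/20240521/00001.parquet",
    "Entity/Apps/20240525/00001.parquet",
    "Entity/Apps/20240525/_manifest.json",
]

def filter_blobs(names, prefix="", suffix=".parquet"):
    """Keep only the blob names under `prefix` that end with `suffix`."""
    return [n for n in names if n.startswith(prefix) and n.endswith(suffix)]

parquet_blobs = filter_blobs(blob_names, prefix="Entity/Apps/")
print(parquet_blobs)
```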
Hi @rashidanwar ,
Thanks for using Fabric Community.
Can you please refer this similar thread - Solved: read Azure Data Lake from notebook fabric - Microsoft Fabric Community
Additional resources:
How to read Parquet files in PySpark Azure Databricks? (azurelib.com)
Hope this brings some insights. Please let me know in case of further queries.