ToddChitt
Super User

Notebook to read complex JSON array

Hello all. I am trying to learn PySpark from this website: 

Good info, but I am stuck. I borrowed the simple JSON code that looks like this:
{ "RecordNumber": 2, "Zipcode": 704 },
{ "RecordNumber": 10, "Zipcode": 709 }
]
 
And I can read that in a data frame. But unfortunately, my data has an array name at the top, like this:
{ "data": [
{ "RecordNumber": 2, "Zipcode": 704 },
{ "RecordNumber": 10, "Zipcode": 709 }
] }
 
I read these two items into 2 data frames, then do two selects in PySpark:
dfJSON1 = df1.select(col("RecordNumber"), col("Zipcode"))
dfJSON2 = df2.select(col("data.RecordNumber"), col("data.Zipcode"))

dfJSON1.show()
dfJSON2.show()
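For reference, here is roughly how the two data frames were created (the Files/ paths below are just placeholders for wherever the JSON lives in my Lakehouse):

from pyspark.sql.functions import col

# multiLine is needed because each JSON document spans several lines
# rather than having one object per line
df1 = spark.read.option("multiLine", "true").json("Files/simple.json")
df2 = spark.read.option("multiLine", "true").json("Files/wrapped.json")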
 
The two results:

[screenshot ToddChitt_3-1712587186300.png: dfJSON1 shows two rows; dfJSON2 shows a single row containing the arrays]

What am I missing to get the second data frame to show two records, similar to the first? This can't be that hard.

Thanks in advance.


 




Did I answer your question? If so, mark my post as a solution. Also consider helping someone else in the forums!

Proud to be a Super User!





1 ACCEPTED SOLUTION
Expiscornovus
Super User

Hi @ToddChitt,

Wouldn't it be possible to use a couple of SQL functions like explode and col for this? Because "data" is an array of structs, selecting col("data.RecordNumber") returns the whole array in a single row; explode turns each element of the array into its own row.

I found that suggested approach in this blog: https://medium.com/towards-data-engineering/transforming-json-to-lakehouse-tables-with-microsoft-fab...

Below is an example based on your JSON code, from one of my test notebooks.
# Apply transformation to the dataframe
from pyspark.sql.functions import col, explode

# explode() turns each element of the "data" array into its own row
exploded_df = df.select(explode(col("data")).alias("data"))

# Each row now holds one struct; pull its fields out as top-level columns
tf_df = exploded_df.select(
    col("data.RecordNumber").alias("RecordNumber"),
    col("data.Zipcode").alias("Zipcode")
)

display(tf_df)

dfJSON1 = tf_df.select(col("RecordNumber"), col("Zipcode"))
dfJSON1.show()
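If everything is wired up correctly, dfJSON1.show() should print something like this:

+------------+-------+
|RecordNumber|Zipcode|
+------------+-------+
|           2|    704|
|          10|    709|
+------------+-------+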

 

 

Happy to help out 🙂

I share #PowerAutomate and #SharePointOnline content on my Blog, Bluesky profile or Youtube Channel


3 REPLIES
ToddChitt
Super User

@Expiscornovus Thanks for the quick response.

Your sample code worked great. Now it's up to me to figure out how to shred the multi-level nested arrays in my actual JSON documents.
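Something like this chained-explode pattern is what I'm picturing for the deeper levels (the structure here is hypothetical, just to illustrate: each record in "data" also carrying an inner "codes" array):

from pyspark.sql.functions import col, explode

# First explode peels the outer "data" array into one row per record
level1 = df.select(explode(col("data")).alias("rec"))

# Second explode peels the inner "codes" array, keeping the parent key
flat = level1.select(
    col("rec.RecordNumber").alias("RecordNumber"),
    explode(col("rec.codes")).alias("code")
).select(
    col("RecordNumber"),
    col("code.Zipcode").alias("Zipcode")
)

flat.show()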

I will check out that blog and try to learn a little more about PySpark.

Thanks




Did I answer your question? If so, mark my post as a solution. Also consider helping someone else in the forums!

Proud to be a Super User!





Anonymous
Not applicable

Glad to know you got some insights into your query. Please continue using the Fabric Community for any further queries.

