Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

60 Days of Data Days! Live and on-demand sessions, challenges, study groups and more! And it's all FREE!. Join now. Learn more

Reply
ToddChitt
Super User
Super User

Notebook to read complex JSON array

Hello all. I am trying to learn PySpark from this website: 

Good info, but I am stuck. I borrowed the simple JSON code that looks like this:
{ "RecordNumber": 2, "Zipcode": 704 },
{ "RecordNumber": 10, "Zipcode": 709 }
]
 
And I can read that in a data frame. But unfortunately, my data has an array name at the top, like this:
{ "data": [
{ "RecordNumber": 2, "Zipcode": 704 },
{ "RecordNumber": 10, "Zipcode": 709 }
] }
 
I read these two items into 2 data frames, then do two selects in PySpark:
dfJSON1 = df1.select( col("RecordNumber"), col("Zipcode"))                        
dfJSON2 = df2.select( col("data.RecordNumber"), col("data.Zipcode"))    

dfJSON1.show()
dfJSON2.show()
 
The two results:
ToddChitt_3-1712587186300.png

 

What am I missing to get the second data frame to show two records, similar to the first? 

 

This can't be that hard. What am I missing?


 Thanks in advance.


 




Did I answer your question? If so, mark my post as a solution. Also consider helping someone else in the forums!

Proud to be a Super User!





1 ACCEPTED SOLUTION
Expiscornovus
Most Valuable Professional
Most Valuable Professional

Hi @ToddChitt,

 

Wouldn't it be possible to use a couple of SQL functions like explode and col for this?

 

I found that suggested approach in this blog: https://medium.com/towards-data-engineering/transforming-json-to-lakehouse-tables-with-microsoft-fab...

 

Below is an example based on your json code in one of my test notebooks.

 

 

# Apply transformation to the dataframe
from pyspark.sql.functions import col, explode

exploded_df = df.select(explode(col("data")).alias("data"))

tf_df = exploded_df.select(

    col("data.RecordNumber").alias("RecordNumber"),
    col("data.Zipcode").alias("Zipcode")
)

display(tf_df)

dfJSON1 = tf_df.select( col("RecordNumber"), col("Zipcode"))     
dfJSON1.show()

 

 

sqlfunction_pyspark.png



Happy to help out 🙂

I share #PowerAutomate and #SharePointOnline content on my Blog, Bluesky profile or Youtube Channel

View solution in original post

3 REPLIES 3
ToddChitt
Super User
Super User

@Expiscornovus Thanks for the quick response.

Your sample code worked great. Now it's up to me to figure out how to shred the multi-level nested arrays in my actual JSON documents.

I will check out that blog and try to learn a little more about PySpark.

Thanks




Did I answer your question? If so, mark my post as a solution. Also consider helping someone else in the forums!

Proud to be a Super User!





Anonymous
Not applicable

Glad to know you got some insights over your query. Please continue using Fabric Community on your further queries.

Expiscornovus
Most Valuable Professional
Most Valuable Professional

Hi @ToddChitt,

 

Wouldn't it be possible to use a couple of SQL functions like explode and col for this?

 

I found that suggested approach in this blog: https://medium.com/towards-data-engineering/transforming-json-to-lakehouse-tables-with-microsoft-fab...

 

Below is an example based on your json code in one of my test notebooks.

 

 

# Apply transformation to the dataframe
from pyspark.sql.functions import col, explode

exploded_df = df.select(explode(col("data")).alias("data"))

tf_df = exploded_df.select(

    col("data.RecordNumber").alias("RecordNumber"),
    col("data.Zipcode").alias("Zipcode")
)

display(tf_df)

dfJSON1 = tf_df.select( col("RecordNumber"), col("Zipcode"))     
dfJSON1.show()

 

 

sqlfunction_pyspark.png



Happy to help out 🙂

I share #PowerAutomate and #SharePointOnline content on my Blog, Bluesky profile or Youtube Channel

Helpful resources

Announcements
FabCon and SQLCon Barcelona 2026

FabCon & SQLCon – Barcelona 2026

Join us in Barcelona for FabCon and SQLCon, the Fabric, Power BI, SQL, and AI community event. Save €200 with code FABCMTY200.

60 days of Data Days Carousel

Data Days 2026

Join Fabric Data Days 2026: 60 days of free live/on-demand sessions, challenges, study groups, and certification opportunities.

June Fabric Update Carousel

Fabric Monthly Update - June 2026

Check out the June 2026 Fabric update to learn about new features.