Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more

Reply
ToddChitt
Super User
Super User

Notebook to read complex JSON array

Hello all. I am trying to learn PySpark from this website: 

Good info, but I am stuck. I borrowed the simple JSON code that looks like this:
{ "RecordNumber": 2, "Zipcode": 704 },
{ "RecordNumber": 10, "Zipcode": 709 }
]
 
And I can read that in a data frame. But unfortunately, my data has an array name at the top, like this:
{ "data": [
{ "RecordNumber": 2, "Zipcode": 704 },
{ "RecordNumber": 10, "Zipcode": 709 }
] }
 
I read these two items into 2 data frames, then do two selects in PySpark:
dfJSON1 = df1.select( col("RecordNumber"), col("Zipcode"))                        
dfJSON2 = df2.select( col("data.RecordNumber"), col("data.Zipcode"))    

dfJSON1.show()
dfJSON2.show()
 
The two results:
ToddChitt_3-1712587186300.png

 

What am I missing to get the second data frame to show two records, similar to the first? 

 

This can't be that hard. What am I missing?


 Thanks in advance.


 




Did I answer your question? If so, mark my post as a solution. Also consider helping someone else in the forums!

Proud to be a Super User!





1 ACCEPTED SOLUTION
Expiscornovus
Super User
Super User

Hi @ToddChitt,

 

Wouldn't it be possible to use a couple of SQL functions like explode and col for this?

 

I found that suggested approach in this blog: https://medium.com/towards-data-engineering/transforming-json-to-lakehouse-tables-with-microsoft-fab...

 

Below is an example based on your json code in one of my test notebooks.

 

 

# Apply transformation to the dataframe
from pyspark.sql.functions import col, explode

exploded_df = df.select(explode(col("data")).alias("data"))

tf_df = exploded_df.select(

    col("data.RecordNumber").alias("RecordNumber"),
    col("data.Zipcode").alias("Zipcode")
)

display(tf_df)

dfJSON1 = tf_df.select( col("RecordNumber"), col("Zipcode"))     
dfJSON1.show()

 

 

sqlfunction_pyspark.png



Happy to help out 🙂

I share #PowerAutomate and #SharePointOnline content on my Blog, Bluesky profile or Youtube Channel

View solution in original post

3 REPLIES 3
ToddChitt
Super User
Super User

@Expiscornovus Thanks for the quick response.

Your sample code worked great. Now it's up to me to figure out how to shred the multi-level nested arrays in my actual JSON documents.

I will check out that blog and try to learn a little more about PySpark.

Thanks




Did I answer your question? If so, mark my post as a solution. Also consider helping someone else in the forums!

Proud to be a Super User!





Anonymous
Not applicable

Glad to know you got some insights over your query. Please continue using Fabric Community on your further queries.

Expiscornovus
Super User
Super User

Hi @ToddChitt,

 

Wouldn't it be possible to use a couple of SQL functions like explode and col for this?

 

I found that suggested approach in this blog: https://medium.com/towards-data-engineering/transforming-json-to-lakehouse-tables-with-microsoft-fab...

 

Below is an example based on your json code in one of my test notebooks.

 

 

# Apply transformation to the dataframe
from pyspark.sql.functions import col, explode

exploded_df = df.select(explode(col("data")).alias("data"))

tf_df = exploded_df.select(

    col("data.RecordNumber").alias("RecordNumber"),
    col("data.Zipcode").alias("Zipcode")
)

display(tf_df)

dfJSON1 = tf_df.select( col("RecordNumber"), col("Zipcode"))     
dfJSON1.show()

 

 

sqlfunction_pyspark.png



Happy to help out 🙂

I share #PowerAutomate and #SharePointOnline content on my Blog, Bluesky profile or Youtube Channel

Helpful resources

Announcements
December Fabric Update Carousel

Fabric Monthly Update - December 2025

Check out the December 2025 Fabric Holiday Recap!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.