Share feedback directly with Fabric product managers, participate in targeted research studies and influence the Fabric roadmap.
Sign up nowGet Fabric certified for FREE! Don't miss your chance! Learn more
Hello all. I am trying to learn PySpark from this website:
What am I missing to get the second data frame to show two records, similar to the first?
This can't be that hard. What am I missing?
Thanks in advance.
Proud to be a Super User! | |
Solved! Go to Solution.
Hi @ToddChitt,
Wouldn't it be possible to use a couple of SQL functions like explode and col for this?
I found that suggested approach in this blog: https://medium.com/towards-data-engineering/transforming-json-to-lakehouse-tables-with-microsoft-fab...
Below is an example based on your json code in one of my test notebooks.
# Apply transformation to the dataframe
from pyspark.sql.functions import col, explode
exploded_df = df.select(explode(col("data")).alias("data"))
tf_df = exploded_df.select(
col("data.RecordNumber").alias("RecordNumber"),
col("data.Zipcode").alias("Zipcode")
)
display(tf_df)
dfJSON1 = tf_df.select( col("RecordNumber"), col("Zipcode"))
dfJSON1.show()
@Expiscornovus Thanks for the quick response.
Your sample code worked great. Now it's up to me to figure out how to shred the multi-level nested arrays in my actual JSON documents.
I will check out that blog and try to learn a little more about PySpark.
Thanks
Proud to be a Super User! | |
Glad to know you got some insights over your query. Please continue using Fabric Community on your further queries.
Hi @ToddChitt,
Wouldn't it be possible to use a couple of SQL functions like explode and col for this?
I found that suggested approach in this blog: https://medium.com/towards-data-engineering/transforming-json-to-lakehouse-tables-with-microsoft-fab...
Below is an example based on your json code in one of my test notebooks.
# Apply transformation to the dataframe
from pyspark.sql.functions import col, explode
exploded_df = df.select(explode(col("data")).alias("data"))
tf_df = exploded_df.select(
col("data.RecordNumber").alias("RecordNumber"),
col("data.Zipcode").alias("Zipcode")
)
display(tf_df)
dfJSON1 = tf_df.select( col("RecordNumber"), col("Zipcode"))
dfJSON1.show()
If you love stickers, then you will definitely want to check out our Community Sticker Challenge!
Check out the January 2026 Fabric update to learn about new features.
| User | Count |
|---|---|
| 22 | |
| 14 | |
| 9 | |
| 8 | |
| 7 |
| User | Count |
|---|---|
| 78 | |
| 67 | |
| 48 | |
| 22 | |
| 18 |