Hello all. I am trying to learn PySpark from this website:
What am I missing to get the second data frame to show two records, similar to the first?
This can't be that hard. What am I missing?
Thanks in advance.
Hi @ToddChitt,
Wouldn't it be possible to use a couple of PySpark SQL functions, such as explode and col, for this?
I found that suggested approach in this blog: https://medium.com/towards-data-engineering/transforming-json-to-lakehouse-tables-with-microsoft-fab...
Below is an example based on your JSON code, run in one of my test notebooks.
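# Apply transformation to the dataframe
# (df is the DataFrame already loaded from the JSON in the question)
from pyspark.sql.functions import col, explode

# explode() emits one output row per element of the "data" array
exploded_df = df.select(explode(col("data")).alias("data"))

# Pull the struct fields out into top-level columns
tf_df = exploded_df.select(
    col("data.RecordNumber").alias("RecordNumber"),
    col("data.Zipcode").alias("Zipcode")
)
display(tf_df)

# Same two columns, shown with the standard DataFrame.show()
dfJSON1 = tf_df.select(col("RecordNumber"), col("Zipcode"))
dfJSON1.show()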
@Expiscornovus Thanks for the quick response.
Your sample code worked great. Now it's up to me to figure out how to shred the multi-level nested arrays in my actual JSON documents.
I will check out that blog and try to learn a little more about PySpark.
Thanks
Glad to hear your question was resolved. Please continue to use the Fabric Community for any further questions.