Hello all. I am trying to learn PySpark from this website:
What am I missing to get the second data frame to show two records, similar to the first?
This can't be that hard. What am I missing?
Thanks in advance.
Hi @ToddChitt,
Wouldn't it be possible to use a couple of SQL functions like explode and col for this?
I found that suggested approach in this blog: https://medium.com/towards-data-engineering/transforming-json-to-lakehouse-tables-with-microsoft-fab...
Below is an example based on your JSON, taken from one of my test notebooks.
# Apply transformation to the dataframe
from pyspark.sql.functions import col, explode

# Explode the "data" array so that each element becomes its own row
exploded_df = df.select(explode(col("data")).alias("data"))

# Promote the fields of each exploded element to top-level columns
tf_df = exploded_df.select(
    col("data.RecordNumber").alias("RecordNumber"),
    col("data.Zipcode").alias("Zipcode"),
)
display(tf_df)

dfJSON1 = tf_df.select(col("RecordNumber"), col("Zipcode"))
dfJSON1.show()
@Expiscornovus Thanks for the quick response.
Your sample code worked great. Now it's up to me to figure out how to shred the multi-level nested arrays in my actual JSON documents.
I will check out that blog and try to learn a little more about PySpark.
Thanks
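For the multi-level nested arrays mentioned above, the usual idea is one explode per array level. The following is only a plain-Python sketch of that row-multiplying behaviour, not PySpark code: `explode_rows` is a hypothetical helper, and the `Addresses` and `City` fields and all sample values are invented for illustration, since the real documents are not shown in this thread.

```python
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]


def explode_rows(rows: List[Row], column: str) -> Iterator[Row]:
    """Yield one output row per element of row[column], similar to Spark's explode.

    Rows whose array is missing, null, or empty produce no output rows,
    which is how explode behaves (explode_outer would keep them instead).
    """
    for row in rows:
        for element in row.get(column) or []:
            new_row = dict(row)           # shallow copy of the other columns
            new_row[column] = element     # replace the array with one element
            yield new_row


# Hypothetical document: "Addresses" and "City" are assumptions for this sketch.
document = {
    "data": [
        {"RecordNumber": 1, "Zipcode": 704,
         "Addresses": [{"City": "A"}, {"City": "B"}]},
        {"RecordNumber": 2, "Zipcode": 709,
         "Addresses": [{"City": "C"}]},
    ]
}

# Level 1: one row per element of the top-level "data" array.
level1 = [row["data"] for row in explode_rows([document], "data")]

# Level 2: one row per element of each record's "Addresses" array.
level2 = list(explode_rows(level1, "Addresses"))

# Flatten the remaining struct fields into plain columns.
flat = [
    {
        "RecordNumber": row["RecordNumber"],
        "Zipcode": row["Zipcode"],
        "City": row["Addresses"]["City"],
    }
    for row in level2
]

for row in flat:
    print(row)
```

In a notebook, each `explode_rows` call would correspond to one `select` step that applies `explode(col(...)).alias(...)` to the next array level, in the same style as the accepted answer.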
Glad to hear you got some help with your question. Please keep using the Fabric Community for any further questions.