Hello all. I am trying to learn PySpark from this website:
What am I missing to get the second data frame to show two records, similar to the first?
This can't be that hard. What am I missing?
Thanks in advance.
Hi @ToddChitt,
Wouldn't it be possible to use a couple of SQL functions like explode and col for this?
I found that suggested approach in this blog: https://medium.com/towards-data-engineering/transforming-json-to-lakehouse-tables-with-microsoft-fab...
Below is an example based on your json code in one of my test notebooks.
from pyspark.sql.functions import col, explode

# Turn each element of the "data" array into its own row
exploded_df = df.select(explode(col("data")).alias("data"))

# Pull the struct fields up into top-level columns
tf_df = exploded_df.select(
    col("data.RecordNumber").alias("RecordNumber"),
    col("data.Zipcode").alias("Zipcode")
)
display(tf_df)

# Select the flattened columns and print them to the notebook output
dfJSON1 = tf_df.select(col("RecordNumber"), col("Zipcode"))
dfJSON1.show()
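For anyone following along, it may help to see why the un-exploded data frame shows only one row. The sketch below is a plain-Python model of what explode does, not Spark itself: the whole JSON document is one row holding an array, and explode emits one row per array element. The helper names `explode_rows` and `select_fields` and the sample values are made up for illustration; only the field names `data`, `RecordNumber`, and `Zipcode` come from the code above.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def explode_rows(rows: List[Row], array_key: str) -> List[Row]:
    """Model of explode(col(array_key)): one output row per array element.

    As with Spark's explode, a null or empty array produces no output rows.
    """
    out: List[Row] = []
    for row in rows:
        for element in row.get(array_key) or []:
            out.append({array_key: element})
    return out


def select_fields(rows: List[Row], struct_key: str, fields: List[str]) -> List[Row]:
    """Model of selecting col("struct.field") for each requested field."""
    return [{f: row[struct_key].get(f) for f in fields} for row in rows]


# Hypothetical sample: a single JSON document whose "data" array has two records.
document_rows = [
    {
        "data": [
            {"RecordNumber": 1, "Zipcode": 704},
            {"RecordNumber": 2, "Zipcode": 709},
        ]
    }
]

exploded = explode_rows(document_rows, "data")
flat = select_fields(exploded, "data", ["RecordNumber", "Zipcode"])

print(len(document_rows))  # 1 row before exploding
print(len(flat))           # 2 rows after exploding
print(flat)
```

The same idea should carry over to deeper nesting: each additional array level would need its own explode step before the inner fields can be selected.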
@Expiscornovus Thanks for the quick response.
Your sample code worked great. Now it's up to me to figure out how to shred the multi-level nested arrays in my actual JSON documents.
I will check out that blog and try to learn a little more about PySpark.
Thanks
Glad to know you got some insight into your query. Please continue using the Fabric Community for any further queries.