I'm extremely new to notebooks and accessing data within JSON files that have been imported into a Lakehouse. I have the following JSON schema in a file (from notebook df.printSchema()):
The Modifiers array element doesn't have a name, it just looks like this:
"Modifiers": [
"US"
]
I can't figure out how to reference that using a notebook or SQL's OPENROWSET. Any ideas? Thanks!
Hi @Richtpt ,
Thanks for reaching out to the Microsoft Fabric Community forum.
If you just want to see the data in the same nested format as it appears in the JSON, you can run:
df.select("PRCCollection.Modifiers").show()
Since both PRCCollection and Modifiers are arrays, Spark will display the result as something like [[US]]. That simply means there is an array inside another array.
If you’d prefer to flatten the structure and extract the actual value, you can use explode() to expand each array level:
from pyspark.sql.functions import explode
df2 = df.withColumn("prc", explode("PRCCollection")) \
        .withColumn("Modifier", explode("prc.Modifiers"))
df2.select("Modifier").show()
Here, the first explode() opens up the PRCCollection array, and the second one expands the Modifiers array inside it. The output will then show the value directly (for example US) instead of the nested [[US]] structure.
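If it helps to see the same logic outside Spark, here is a plain-Python sketch of what the two explode() calls do, using an illustrative record shaped like the schema above (the field names here are just placeholders, not your actual data):

```python
# Plain-Python illustration of the two-level flatten performed by the
# two explode() calls. The record shape mirrors the schema discussed above.
record = {"PRCCollection": [{"Modifiers": ["US"]}, {"Modifiers": ["UQ"]}]}

# First "explode": one row per element of PRCCollection.
# Second "explode": one row per element of each prc["Modifiers"].
modifiers = [m for prc in record["PRCCollection"] for m in prc["Modifiers"]]
print(modifiers)  # ['US', 'UQ']
```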
For reference, this is the output of df.schema() for my data:
I hope this information helps. Please do let us know if you have any further queries.
Thank you
That code helped a lot, but I still wasn't getting what I wanted, probably because I'm extremely new to PySpark (this is my first time using it). I finally asked Grok how to flatten this and got back several responses, including this one that worked perfectly. Wanted to share to close this up. Thanks much for the help!
from pyspark.sql.functions import col, explode_outer

df = spark.read.json("Files/test.json")

df_deep = (
    df
    # One row per element of the PRC0Collection array
    .select("*", explode_outer("PRC0Collection").alias("PR"))
    # Promote the struct's fields to top-level columns and
    # explode the inner Modifiers array
    .select(
        "*",
        col("PR.*"),
        explode_outer(col("PR.Modifiers")).alias("Modifier")
    )
    # Drop the nested source columns
    .drop("PRC0Collection", "PR", "Modifiers")
)
display(df_deep)
btw, I forgot to post my JSON earlier. Here it is:
{
  "AGAG_ID": "PHKETCHMC",
  "AGIP_CAP_IND": "",
  "CSPI_HIOS_ID_NVL": "73836AK09999",
  "CSPI_ID": "AMT01234",
  "GRGR_CK": 12345,
  "IPCD_ID": "R9999",
  "PRC0Collection": [
    {
      "Cache_ID": "1462da8b",
      "Exp_Date": "12/31/9999 00:00:00.000",
      "Modifiers": [
        "US"
      ],
      "PSCD_ID": "11",
      "Price": 16336.44,
      "Type": "negotiated"
    },
    {
      "Cache_ID": "156eab34",
      "Exp_Date": "12/31/9999 00:00:00.000",
      "Modifiers": [
        "UQ"
      ],
      "PSCD_ID": "11",
      "Price": 24499.76,
      "Type": "negotiated"
    }
  ],
  "PRPR_ID": "P00000099999",
  "SEDF_CAP_IND": "N"
}
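For anyone reading along without a Spark session handy, the flatten that the notebook code performs can be mimicked in plain Python against this exact JSON. This is only a sketch of the resulting row shape (using a trimmed copy of the document above), not the PySpark output format:

```python
import json

# A trimmed copy of the JSON posted above, keeping only the fields
# needed to show the flatten.
raw = '''{"AGAG_ID": "PHKETCHMC",
  "PRC0Collection": [
    {"Modifiers": ["US"], "Price": 16336.44, "Type": "negotiated"},
    {"Modifiers": ["UQ"], "Price": 24499.76, "Type": "negotiated"}]}'''

doc = json.loads(raw)

# One output row per (PRC0Collection element, Modifier) pair,
# mirroring the two explode_outer() calls in the notebook code.
rows = [
    {"Modifier": m, "Price": prc["Price"], "Type": prc["Type"]}
    for prc in doc["PRC0Collection"]
    for m in prc["Modifiers"]
]
for row in rows:
    print(row)
```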
Hi @Richtpt ,
We really appreciate your efforts, and thank you for letting us know the update on the issue.
Please continue using the Fabric Community forum for any further assistance.
Thank you
Thanks, that helps a lot and I am able to see the Modifiers values.
Modifiers is an array, so the values inside it are accessed by index number, not by name. The first value is at index [0].
To get the value, try:
df.select("PRC0Collection.Modifiers[0]").show()
Maybe this can help.
Thanks, but this isn't working for me. I have:
df = spark.read.json("file1.json")
df.select("PRC0Collection.Modifiers[0]").show()
I get this error: [FIELD_NOT_FOUND] No such struct field `Modifiers[0]` in `Cache_ID`, `Exp_Date`, `Modifiers`, `PSCD_ID`, `Price`, `Type`.
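The error likely happens because, inside a string selector, Spark treats `Modifiers[0]` as a literal field name rather than an index, so it goes looking for a struct field called `Modifiers[0]`. In PySpark the usual workarounds (hedged; verify against your Spark version) are `col("PRC0Collection.Modifiers").getItem(0)` or `df.selectExpr("PRC0Collection.Modifiers[0]")`, which routes the string through the SQL expression parser. The distinction is the same as this plain-Python sketch:

```python
data = {"Modifiers": ["US"]}

# Indexing has to be applied to the list object itself...
first = data["Modifiers"][0]
print(first)  # US

# ...whereas looking up the literal key "Modifiers[0]" fails,
# which is roughly what the FIELD_NOT_FOUND error reports.
try:
    data["Modifiers[0]"]
except KeyError as e:
    print("no such field:", e)
```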