I'm extremely new to notebooks and accessing data within JSON files that have been imported into a Lakehouse. I have the following JSON schema in a file (from notebook df.printSchema()):
The Modifiers array element doesn't have a name, it just looks like this:
"Modifiers": [
"US"
]
I can't figure out how to reference that using a notebook or SQL's OPENROWSET. Any ideas? Thanks!
Hi @Richtpt ,
Thanks for reaching out to the Microsoft Fabric Community forum.
If you just want to see the data in the same nested format as it appears in the JSON, you can run:
df.select("PRCCollection.Modifiers").show()
Since both PRCCollection and Modifiers are arrays, Spark will display the result as something like [[US]]. That simply means there is an array inside another array.
If you’d prefer to flatten the structure and extract the actual value, you can use explode() to expand each array level:
from pyspark.sql.functions import explode
df2 = df.withColumn("prc", explode("PRCCollection")) \
    .withColumn("Modifier", explode("prc.Modifiers"))
df2.select("Modifier").show()
Here, the first explode() opens up the PRCCollection array, and the second one expands the Modifiers array inside it. The output will then show the value directly (for example US) instead of the nested [[US]] structure.
For reference, this is the output of df.schema for my data:
I hope this information helps. Please do let us know if you have any further queries.
Thank you
That code helped a lot, but I still wasn't getting what I wanted, probably because I'm extremely new to PySpark (as in this is my first time using it). I finally asked Grok how to flatten this and got back several responses, including this one, which worked perfectly. Wanted to share to close this up. Thanks much for the help!
from pyspark.sql.functions import col, explode_outer, explode
df = spark.read.json("Files/test.json")
df_deep = (
    df
    .select("*", explode_outer("PRC0Collection").alias("PR"))
    .select(
        "*",
        col("PR.*"),
        explode_outer(col("PR.Modifiers")).alias("Modifier")
    )
    .drop("PRC0Collection", "PR", "Modifiers")
)
display(df_deep)
btw, I forgot to post my json earlier. Here it is:
{
  "AGAG_ID": "PHKETCHMC",
  "AGIP_CAP_IND": "",
  "CSPI_HIOS_ID_NVL": "73836AK09999",
  "CSPI_ID": "AMT01234",
  "GRGR_CK": 12345,
  "IPCD_ID": "R9999",
  "PRC0Collection": [
    {
      "Cache_ID": "1462da8b",
      "Exp_Date": "12/31/9999 00:00:00.000",
      "Modifiers": [
        "US"
      ],
      "PSCD_ID": "11",
      "Price": 16336.44,
      "Type": "negotiated"
    },
    {
      "Cache_ID": "156eab34",
      "Exp_Date": "12/31/9999 00:00:00.000",
      "Modifiers": [
        "UQ"
      ],
      "PSCD_ID": "11",
      "Price": 24499.76,
      "Type": "negotiated"
    }
  ],
  "PRPR_ID": "P00000099999",
  "SEDF_CAP_IND": "N"
}
Hi @Richtpt ,
We really appreciate your efforts and thank you for sharing the update on the issue.
Please continue to use the Fabric Community forum for any further assistance.
Thank you
Thanks, that helps a lot and I am able to see the Modifiers values.
Modifiers is an array, so the values inside it are accessed by index number, not by name. The first value is at index [0].
To get the value, use: df.select("PRC0Collection.Modifiers[0]").show()
Maybe this can help.
Thanks but this isn't working for me. I have
df = spark.read.json("file1.json")
df.select("PRC0Collection.Modifiers[0]").show()
I get this error: [FIELD_NOT_FOUND] No such struct field `Modifiers[0]` in `Cache_ID`, `Exp_Date`, `Modifiers`, `PSCD_ID`, `Price`, `Type`.