Richtpt
Helper I

How to reference a JSON array element with no name using a Notebook or OPENROWSET?

I'm extremely new to notebooks and accessing data within JSON files that have been imported into a Lakehouse.  I have the following JSON schema in a file (from notebook df.printSchema()):

Richtpt_0-1773270458132.png

The Modifiers array element doesn't have a name, it just looks like this:

 

"Modifiers": [

       "US"

]

 

I can't figure out how to reference that using a notebook or SQL's OPENROWSET.  Any ideas?  Thanks!

1 ACCEPTED SOLUTION
v-nmadadi-msft
Community Support

Hi @Richtpt  ,
Thanks for reaching out to the Microsoft Fabric Community forum.

If you just want to see the data in the same nested format as it appears in the JSON, you can run:

df.select("PRCCollection.Modifiers").show()

Since both PRCCollection and Modifiers are arrays, Spark will display the result as something like [[US]]. That simply means there is an array inside another array.
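As a plain-Python analogy (not Spark, and the sample records below are invented to match the schema discussed in this thread), selecting `PRCCollection.Modifiers` is like collecting each record's Modifiers list, which keeps one list per record and therefore yields a list of lists:

```python
# Plain-Python analogy (not Spark): why "PRCCollection.Modifiers" comes back nested.
# These records are made up to match the schema in this thread.
prc_collection = [
    {"Modifiers": ["US"], "Price": 16336.44},
    {"Modifiers": ["UQ"], "Price": 24499.76},
]

# Selecting the Modifiers field across the array keeps one list per record,
# so the result is a list of lists, analogous to Spark showing [[US], [UQ]].
modifiers = [rec["Modifiers"] for rec in prc_collection]
print(modifiers)  # [['US'], ['UQ']]
```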

vnmadadimsft_0-1773308672555.png

 

If you’d prefer to flatten the structure and extract the actual value, you can use explode() to expand each array level:

from pyspark.sql.functions import explode



df2 = df.withColumn("prc", explode("PRCCollection")) \
        .withColumn("Modifier", explode("prc.Modifiers"))



df2.select("Modifier").show()

Here, the first explode() opens up the PRCCollection array, and the second one expands the Modifiers array inside it. The output will then show the value directly (for example US) instead of the nested [[US]] structure.
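The two explode() calls correspond to flattening two levels of nesting. In plain Python (again just an analogy, not Spark, with invented records matching the schema in this thread), the same flattening is a nested comprehension:

```python
# Plain-Python analogy for the two explode() calls: flatten two array levels.
# The records are made up to match the schema in this thread.
prc_collection = [
    {"Modifiers": ["US"]},
    {"Modifiers": ["UQ"]},
]

# Outer loop ~ explode("PRCCollection"); inner loop ~ explode("prc.Modifiers").
flat = [m for rec in prc_collection for m in rec["Modifiers"]]
print(flat)  # ['US', 'UQ']
```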

vnmadadimsft_1-1773308681197.png

 



For reference, this is the output of df.printSchema() for my data:

vnmadadimsft_2-1773308690662.png

 

 


I hope this information helps. Please do let us know if you have any further queries.
Thank you


6 REPLIES


That code helped a lot, but I still wasn't getting what I wanted, probably because I'm extremely new to PySpark (this is my first time using it). I finally asked Grok how to flatten this and got back several responses, including this one that worked perfectly. Wanted to share it to close this up. Thanks much for the help!

 

from pyspark.sql.functions import col, explode_outer

df = spark.read.json("Files/test.json")

df_deep = (
    df
    # one row per element of the PRC0Collection array
    .select("*", explode_outer("PRC0Collection").alias("PR"))
    .select(
        "*",
        col("PR.*"),  # promote the struct fields to top-level columns
        explode_outer(col("PR.Modifiers")).alias("Modifier")  # one row per modifier
    )
    # drop the nested columns that have been flattened out
    .drop("PRC0Collection", "PR", "Modifiers")
)

display(df_deep)
 

btw, I forgot to post my JSON earlier. Here it is:

{
    "AGAG_ID": "PHKETCHMC",
    "AGIP_CAP_IND": "",
    "CSPI_HIOS_ID_NVL": "73836AK09999",
    "CSPI_ID": "AMT01234",
    "GRGR_CK": 12345,
    "IPCD_ID": "R9999",
    "PRC0Collection": [
        {
            "Cache_ID": "1462da8b",
            "Exp_Date": "12/31/9999 00:00:00.000",
            "Modifiers": [
                "US"
            ],
            "PSCD_ID": "11",
            "Price": 16336.44,
            "Type": "negotiated"
        },
        {
            "Cache_ID": "156eab34",
            "Exp_Date": "12/31/9999 00:00:00.000",
            "Modifiers": [
                "UQ"
            ],
            "PSCD_ID": "11",
            "Price": 24499.76,
            "Type": "negotiated"
        }
    ],
    "PRPR_ID": "P00000099999",
    "SEDF_CAP_IND": "N"
}
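For anyone who wants to sanity-check the flattened result without a Spark session, here is a stdlib-only sketch that applies the same explode-then-drop logic to the JSON above (abbreviated to the fields that matter; the loop mirrors the df_deep pipeline, producing one output row per modifier):

```python
import json

# The JSON posted above, abbreviated to the fields that matter for flattening.
doc = json.loads("""
{
  "CSPI_ID": "AMT01234",
  "PRC0Collection": [
    {"Cache_ID": "1462da8b", "Modifiers": ["US"], "Price": 16336.44},
    {"Cache_ID": "156eab34", "Modifiers": ["UQ"], "Price": 24499.76}
  ]
}
""")

# Mirror the df_deep pipeline: one output row per (PRC0Collection element, Modifier).
rows = []
for prc in doc["PRC0Collection"]:
    for modifier in prc["Modifiers"]:
        row = {k: v for k, v in doc.items() if k != "PRC0Collection"}   # top-level fields
        row.update({k: v for k, v in prc.items() if k != "Modifiers"})  # struct fields, like col("PR.*")
        row["Modifier"] = modifier                                      # exploded array value
        rows.append(row)

for r in rows:
    print(r)
# Two rows: Modifier 'US' with Price 16336.44, Modifier 'UQ' with Price 24499.76
```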

 

 

Hi @Richtpt ,
We really appreciate your efforts and thank you for letting us know the update on the issue.

Please continue using the Fabric Community forum for any further assistance.
Thank you

 

Thanks, that helps a lot and I am able to see the Modifiers values.

trivedisunita
Advocate II

Modifiers is an array, so the value inside it is accessed by index number, not by name. The first value is at index [0].

To get the value, use: df.select("PRC0Collection.Modifiers[0]").show()

Maybe this can help.

Thanks but this isn't working for me.  I have

 

df = spark.read.json("file1.json")

df.select("PRC0Collection.Modifiers[0]").show()

 

I get this error:  [FIELD_NOT_FOUND] No such struct field `Modifiers[0]` in `Cache_ID`, `Exp_Date`, `Modifiers`, `PSCD_ID`, `Price`, `Type`.
