Re: JSON parsing

kirah2128 · ‎02-02-2024

Dear Everyone,

"{'manufacturerCodeValue': 'XXXX', 'specific': '1', 'toolNumber': 'SCSTU'}, {'manufacturerCodeValue': 'XXX', 'specific': '1', 'toolNumber': 'SCST1AK35U'}, {'specific': '0', 'toolNumber': 'No Specific'}")

I kindly need your help, I'm not good at JSON formatting and I tried a lot of options. I have this kind of data in a data frame and I want only to extract the value of the toolNumber which is SCSTU and SCST1AK35U.

My expected output is SCSTU, SCST1AK35U

The code below only captures SCSTU.

result_df = result_df.withColumn(
    "SUPP_PART_REQ_PN_2",
    expr(
        "concat("
        "get_json_object(SUPP_PART_REQ_PN, '$.toolNumber')"
        ")"
    )
)

Thank you so much for your help in advance.

dphugo · ‎02-02-2024

Hi @kirah2128,

I must confess, I leaned heavily on ChatGPT, as I'm not an expert in JSON or PySpark.

The JSON string has a dangling ")" at the end, and you refer to an existing DataFrame, so I made a few assumptions:

1) The ")" isn't part of the JSON string.

2) The remaining quotes around the JSON string is only to specify that it is a string.

If that's the case, then it might be invalid JSON. Maybe paste the JSON string you have here [https://jsonlint.com/] for vaidation check.

However, herewith the PySpark code where I've made the JSON string an array object, with some other assumptions. There might be a more elegant way of parsing the string to your requirement, but hopefully this gives you some ideas of how to adjust your code.

I've also pasted your JSON string in a comment, and running that through only returns 1 value: "SCSTU"

from pyspark.sql.functions import col, explode, expr

# Input data
#gwason = '{"manufacturerCodeValue": "XXXX", "specific": "1", "toolNumber": "SCSTU"}, {"manufacturerCodeValue": "XXX", "specific": "1", "toolNumber": "SCST1AK35U"}, {"specific": "0", "toolNumber": "No Specific"}'
gwason = '[{"manufacturerCodeValue": "XXXX", "specific": "1", "toolNumber": "SCSTU"}, {"manufacturerCodeValue": "XXX", "specific": "1", "toolNumber": "SCST1AK35U"}, {"specific": "0", "toolNumber": "No Specific"}]'

# Create DataFrame
data = [(1, gwason)]
df = spark.createDataFrame(data, ["id", "gwason"])
#display(df)

# Create array of structs directly, as the input for 'explode' needs to be an array or map data type
df_parsed = df.withColumn("parsed", expr("from_json(gwason, 'array<struct<manufacturerCodeValue:string,specific:string,toolNumber:string>>')"))
#display(df_parsed)

# Explode the array of structs
df_exploded = df_parsed.select(explode("parsed").alias("exploded"))
#display(df_exploded)

# Filter and collect toolNumber values
result_df = df_exploded.filter(col("exploded.toolNumber") != "No Specific").select("exploded.toolNumber")
display(result_df)

# Convert DataFrame to list and print
result_list = result_df.rdd.map(lambda row: row[0]).collect()
print(str(result_list))

kirah2128 · ‎02-03-2024

Dear dphugo,

It's a great help 🙂 Thank you so much. If you don't mind, I'm having a problem collecting it back to my dataframe. I got [Object Object] as a result

df_parsed = result_df.withColumn("parsed", expr("from_json(SUPP_PART_REQ_PN, 'array<struct<manufacturerCodeValue:string,specific:string,toolNumber:string>>')"))
df_exploded = df_parsed.select("FileName", F.explode("parsed").alias("exploded_value"))
result_df = df_exploded.groupBy("FileName").agg(collect_list("exploded_value").alias("SUPP_PART_REQ_PN3"))

I'm using a display() to see the results.

Anonymous · ‎02-04-2024

Hi @kirah2128 ,

Thanks for using Fabric Community.

I understand that you're encountering an issue when trying to collect the data back into your DataFrame and seeing [Object Object] in your display. Here are a few things to check:

1. Data size:

The collect() method in Spark is used to retrieve all elements of a DataFrame to the driver node. Try using df.collect(). This can be problematic for large datasets as it might not fit in memory, leading to errors.

2. Data format:

You mentioned using display() to see the results. This function might not display complex data structures like DataFrames properly, leading to the [Object Object] output. Try using print(result_df.toPandas()) or result_df.show() to view the DataFrame contents in a more structured format.

3. Specific error message:

If you're encountering an error beyond just [Object Object], please provide the exact error message. This will help pinpoint the specific issue and suggest more tailored solutions.

Hope this is helpful. Please let me know incase of further queries.

Anonymous · ‎02-06-2024

Hello @kirah2128 ,

We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet .
In case if you have any resolution please do share that same with the community as it can be helpful to others .
Otherwise, will respond back with the more details and we will try to help .

Anonymous · ‎02-07-2024

Hello @kirah2128 ,

We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet .
In case if you have any resolution please do share that same with the community as it can be helpful to others .
If you have any question relating to the current thread, please do let us know and we will try out best to help you.
In case if you have any other question on a different issue, we request you to open a new thread .

JSON parsing

Helpful resources

Join our Fabric User Panel

Fabric Monthly Update - June 2025

Fabric Community Update - June 2025

Party with Power BI’s own Guy in a Cube

JSON parsing

Helpful resources

Join our Fabric User Panel

Fabric Monthly Update - June 2025

Fabric Community Update - June 2025