<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: JSON parsing in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/JSON-parsing/m-p/3682544#M335</link>
    <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/643638"&gt;@kirah2128&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet .&lt;BR /&gt;In case if you have any resolution please do share that same with the community as it can be helpful to others . &lt;BR /&gt;Otherwise, will respond back with the more details and we will try to help .&lt;/P&gt;</description>
    <pubDate>Tue, 06 Feb 2024 09:24:22 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2024-02-06T09:24:22Z</dc:date>
    <item>
      <title>JSON parsing</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/JSON-parsing/m-p/3676056#M331</link>
      <description>&lt;P&gt;Dear Everyone,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;"{'manufacturerCodeValue': 'XXXX', 'specific': '1', 'toolNumber': 'SCSTU'}, {'manufacturerCodeValue': 'XXX', 'specific': '1', 'toolNumber': 'SCST1AK35U'}, {'specific': '0', 'toolNumber': 'No Specific'}")&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I kindly need your help, I'm not good at JSON formatting and I tried a lot of options. I have this kind of data in a data frame and I want only to extract the value of the toolNumber which is&amp;nbsp;&lt;SPAN class=""&gt;SCSTU and&amp;nbsp;SCST1AK35U.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;My expected output is&amp;nbsp;SCSTU,&amp;nbsp;SCST1AK35U&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;&lt;BR /&gt;The code below only captures SCSTU.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;result_df = result_df.withColumn(
    "SUPP_PART_REQ_PN_2",
    expr(
        "concat("
        "get_json_object(SUPP_PART_REQ_PN, '$.toolNumber')"
        ")"
    )
)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you so much for your help in advance.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 02 Feb 2024 16:34:04 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/JSON-parsing/m-p/3676056#M331</guid>
      <dc:creator>kirah2128</dc:creator>
      <dc:date>2024-02-02T16:34:04Z</dc:date>
    </item>
    <item>
      <title>Re: JSON parsing</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/JSON-parsing/m-p/3676422#M332</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/643638"&gt;@kirah2128&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I must confess, I leaned heavily on ChatGPT, as I'm not an expert in JSON or PySpark.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The JSON string has a dangling ")" at the end, and you refer to an existing DataFrame, so I made a few assumptions:&lt;/P&gt;&lt;P&gt;1) The ")" isn't part of the JSON string.&lt;/P&gt;&lt;P&gt;2) The remaining quotes&amp;nbsp; around the JSON string is only to specify that it is a string.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If that's the case, then it might be invalid JSON. Maybe paste the JSON string you have here [&lt;A href="https://jsonlint.com/" target="_blank"&gt;https://jsonlint.com/&lt;/A&gt;] for vaidation check.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, herewith the PySpark code where I've made the JSON string an array object, with some other assumptions. There might be a more elegant way of parsing the string to your requirement, but hopefully this gives you some ideas of how to adjust your code.&lt;/P&gt;&lt;P&gt;I've also pasted your JSON string in a comment, and running that through only returns 1 value: "&lt;SPAN&gt;SCSTU"&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql.functions import col, explode, expr

# Input data
#gwason = '{"manufacturerCodeValue": "XXXX", "specific": "1", "toolNumber": "SCSTU"}, {"manufacturerCodeValue": "XXX", "specific": "1", "toolNumber": "SCST1AK35U"}, {"specific": "0", "toolNumber": "No Specific"}'
gwason = '[{"manufacturerCodeValue": "XXXX", "specific": "1", "toolNumber": "SCSTU"}, {"manufacturerCodeValue": "XXX", "specific": "1", "toolNumber": "SCST1AK35U"}, {"specific": "0", "toolNumber": "No Specific"}]'

# Create DataFrame
data = [(1, gwason)]
df = spark.createDataFrame(data, ["id", "gwason"])
#display(df)

# Create array of structs directly, as the input for 'explode' needs to be an array or map data type
df_parsed = df.withColumn("parsed", expr("from_json(gwason, 'array&amp;lt;struct&amp;lt;manufacturerCodeValue:string,specific:string,toolNumber:string&amp;gt;&amp;gt;')"))
#display(df_parsed)

# Explode the array of structs
df_exploded = df_parsed.select(explode("parsed").alias("exploded"))
#display(df_exploded)

# Filter and collect toolNumber values
result_df = df_exploded.filter(col("exploded.toolNumber") != "No Specific").select("exploded.toolNumber")
display(result_df)

# Convert DataFrame to list and print
result_list = result_df.rdd.map(lambda row: row[0]).collect()
print(str(result_list))&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 02 Feb 2024 20:32:55 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/JSON-parsing/m-p/3676422#M332</guid>
      <dc:creator>dphugo</dc:creator>
      <dc:date>2024-02-02T20:32:55Z</dc:date>
    </item>
    <item>
      <title>Re: JSON parsing</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/JSON-parsing/m-p/3677915#M333</link>
      <description>&lt;P&gt;Dear dphugo,&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It's a great help &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; Thank you so much. If you don't mind, I'm having a problem collecting it back to my dataframe. I got [Object Object] as a result&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df_parsed = result_df.withColumn("parsed", expr("from_json(SUPP_PART_REQ_PN, 'array&amp;lt;struct&amp;lt;manufacturerCodeValue:string,specific:string,toolNumber:string&amp;gt;&amp;gt;')"))
df_exploded = df_parsed.select("FileName", F.explode("parsed").alias("exploded_value"))
result_df = df_exploded.groupBy("FileName").agg(collect_list("exploded_value").alias("SUPP_PART_REQ_PN3"))&lt;/LI-CODE&gt;&lt;P&gt;&lt;BR /&gt;I'm using a display() to see the results.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screen Shot 2024-02-03 at 9.08.17 PM.png" style="width: 210px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1036800iE44378565157A8D6/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screen Shot 2024-02-03 at 9.08.17 PM.png" alt="Screen Shot 2024-02-03 at 9.08.17 PM.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 03 Feb 2024 17:10:34 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/JSON-parsing/m-p/3677915#M333</guid>
      <dc:creator>kirah2128</dc:creator>
      <dc:date>2024-02-03T17:10:34Z</dc:date>
    </item>
    <item>
      <title>Re: JSON parsing</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/JSON-parsing/m-p/3679477#M334</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/643638"&gt;@kirah2128&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;Thanks for using Fabric Community.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;I understand that you're encountering an issue when trying to collect the data back into your DataFrame and seeing [Object Object] in your display. Here are a few things to check:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;1. &lt;STRONG&gt;Data size:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The&amp;nbsp;collect()&amp;nbsp;method in Spark is used to retrieve all elements of a DataFrame to the driver node. Try using &lt;STRONG&gt;df.collect()&lt;/STRONG&gt;.&amp;nbsp;This can be problematic for large datasets as it might not fit in memory,&amp;nbsp;leading to errors.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;2. &lt;STRONG&gt;Data format:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;You mentioned using&amp;nbsp;display()&amp;nbsp;to see the results.&amp;nbsp;This function might not display complex data structures like DataFrames properly,&amp;nbsp;leading to the&amp;nbsp;[Object Object]&amp;nbsp;output.&amp;nbsp;Try using&amp;nbsp;&lt;STRONG&gt;print(result_df.toPandas())&lt;/STRONG&gt;&amp;nbsp;or&amp;nbsp;&lt;STRONG&gt;result_df.show()&lt;/STRONG&gt;&amp;nbsp;to view the DataFrame contents in a more structured format.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;3. &lt;STRONG&gt;Specific error message:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;If you're encountering an error beyond just&amp;nbsp;[Object Object],&amp;nbsp;please provide the exact error message.&amp;nbsp;This will help pinpoint the specific issue and suggest more tailored solutions.&lt;BR /&gt;&lt;BR /&gt;Hope this is helpful. Please let me know incase of further queries.&lt;/P&gt;</description>
      <pubDate>Mon, 05 Feb 2024 07:16:01 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/JSON-parsing/m-p/3679477#M334</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-02-05T07:16:01Z</dc:date>
    </item>
    <item>
      <title>Re: JSON parsing</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/JSON-parsing/m-p/3682544#M335</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/643638"&gt;@kirah2128&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet .&lt;BR /&gt;In case if you have any resolution please do share that same with the community as it can be helpful to others . &lt;BR /&gt;Otherwise, will respond back with the more details and we will try to help .&lt;/P&gt;</description>
      <pubDate>Tue, 06 Feb 2024 09:24:22 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/JSON-parsing/m-p/3682544#M335</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-02-06T09:24:22Z</dc:date>
    </item>
    <item>
      <title>Re: JSON parsing</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/JSON-parsing/m-p/3685279#M336</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/643638"&gt;@kirah2128&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet .&lt;BR /&gt;In case if you have any resolution please do share that same with the community as it can be helpful to others .&lt;BR /&gt;If you have any question relating to the current thread, please do let us know and we will try out best to help you.&lt;BR /&gt;In case if you have any other question on a different issue, we request you to open a new thread .&lt;/P&gt;</description>
      <pubDate>Wed, 07 Feb 2024 09:20:12 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/JSON-parsing/m-p/3685279#M336</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-02-07T09:20:12Z</dc:date>
    </item>
  </channel>
</rss>

