<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: PySpark Notebook to process complex JSON in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-Notebook-to-process-complex-JSON/m-p/3889151#M1878</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/270"&gt;@ToddChitt&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;Glad to know that your issue got resolved. Please continue using Fabric Community on your further queries.&lt;/P&gt;</description>
    <pubDate>Fri, 03 May 2024 13:02:47 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2024-05-03T13:02:47Z</dc:date>
    <item>
      <title>PySpark Notebook to process complex JSON</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-Notebook-to-process-complex-JSON/m-p/3880282#M1869</link>
      <description>&lt;P&gt;Hello. I am using a PySpark notebook in Fabric to process incoming JSON files. The Notebook reads the JSON file into a base dataframe, then from there parse it out into two other dataframes that get dumped into Lakehouse tables.&lt;/P&gt;
&lt;P&gt;The JSON is complex and sometimes some elements are missing.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For example, most times I get JSON data structured like this:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ToddChitt_0-1714497753233.png" style="width: 400px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1089086i5DCFB12EEFF39782/image-size/medium?v=v2&amp;amp;px=400" role="button" title="ToddChitt_0-1714497753233.png" alt="ToddChitt_0-1714497753233.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;But every once in a while I get something like this:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ToddChitt_1-1714497820187.png" style="width: 400px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1089087i1784C09DB7AF688D/image-size/medium?v=v2&amp;amp;px=400" role="button" title="ToddChitt_1-1714497820187.png" alt="ToddChitt_1-1714497820187.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have tried the PySpark When/Otherwise syntax like this in my dataframe select statement:&lt;/P&gt;
&lt;DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN&gt;when(col(&lt;SPAN&gt;"positionData.manager"&lt;SPAN&gt;).isNull(),&lt;SPAN&gt;None&lt;SPAN&gt;).otherwise(col(&lt;SPAN&gt;"positionData.manager.id"&lt;SPAN&gt;)).alias(&lt;SPAN&gt;"ManagerId"&lt;SPAN&gt;),&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;DIV&gt;&amp;nbsp;
&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN&gt;The trouble is that both paths are evaluated, even if one is not needed. For example, if the data is like the lower screenshot, with&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN&gt;"manager": null&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN&gt;I still get an exception:&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN&gt;&lt;SPAN class=""&gt;AnalysisException: [INVALID_EXTRACT_BASE_FIELD_TYPE] Can't extract a value from "manager". Need a complex type [STRUCT, ARRAY, MAP] but got "STRING".&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;DIV&gt;&amp;nbsp;
&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN&gt;I also tried defining a function to handle this, but it exhibits the same issue:&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;DIV&gt;
&lt;DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN&gt;# Function to handle null values in various fields&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN&gt;def &lt;SPAN&gt;handle_nulls&lt;SPAN&gt;(&lt;SPAN&gt;entity_name&lt;SPAN&gt;, &lt;SPAN&gt;field_name)&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &lt;SPAN&gt;return&lt;SPAN&gt; when(col(&lt;SPAN&gt;f&lt;SPAN&gt;"&lt;SPAN&gt;{&lt;SPAN&gt;entity_name&lt;SPAN&gt;}&lt;SPAN&gt;"&lt;SPAN&gt;).isNull(), &lt;SPAN&gt;None&lt;SPAN&gt;).otherwise(col(&lt;SPAN&gt;f&lt;SPAN&gt;"&lt;SPAN&gt;{&lt;SPAN&gt;entity_name&lt;SPAN&gt;}&lt;SPAN&gt;.&lt;SPAN&gt;{&lt;SPAN&gt;field_name&lt;SPAN&gt;}&lt;SPAN&gt;"&lt;SPAN&gt;))&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;DIV&gt;&amp;nbsp;
&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN&gt;I also tried wrapping the return statement in a try/except block, but that still fails with the same error.&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;DIV&gt;&amp;nbsp;
&lt;DIV&gt;
&lt;DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN&gt;# Function to handle null values in various fields&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN&gt;def &lt;SPAN&gt;handle_nulls&lt;SPAN&gt;(&lt;SPAN&gt;entity_name&lt;SPAN&gt;, &lt;SPAN&gt;field_name)&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &lt;SPAN&gt;try&lt;SPAN&gt;:&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;SPAN&gt;return&lt;SPAN&gt; when(col(&lt;SPAN&gt;f&lt;SPAN&gt;"&lt;SPAN&gt;{&lt;SPAN&gt;entity_name&lt;SPAN&gt;}&lt;SPAN&gt;"&lt;SPAN&gt;).isNull(), &lt;SPAN&gt;None&lt;SPAN&gt;).otherwise(col(&lt;SPAN&gt;f&lt;SPAN&gt;"&lt;SPAN&gt;{&lt;SPAN&gt;entity_name&lt;SPAN&gt;}&lt;SPAN&gt;.&lt;SPAN&gt;{&lt;SPAN&gt;field_name&lt;SPAN&gt;}&lt;SPAN&gt;"&lt;SPAN&gt;))&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &lt;SPAN&gt;except&lt;SPAN&gt;:&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;SPAN&gt;return &lt;SPAN&gt;None&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm fairly new at PySpark. Does anyone have any suggestions?&lt;/P&gt;
&lt;P&gt;Thanks in advance.&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Thu, 02 May 2024 16:07:14 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-Notebook-to-process-complex-JSON/m-p/3880282#M1869</guid>
      <dc:creator>ToddChitt</dc:creator>
      <dc:date>2024-05-02T16:07:14Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark Notebook to process complex JSON</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-Notebook-to-process-complex-JSON/m-p/3885141#M1870</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/270"&gt;@ToddChitt&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;Thanks for using Fabric Community.&lt;/P&gt;
&lt;P&gt;PySpark provides a function called coalesce that allows you to specify a sequence of columns. The first non-null value in the sequence is returned. You can use this function to handle null values in your when expression like this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;from pyspark.sql.functions import col, coalesce

df = df.withColumn("ManagerId", coalesce(col("positionData.manager.id"), lit(None)))&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In this example, coalesce will first try to access the value of the column "positionData.manager.id". If it's null, it will return None instead.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Can you please check this -&amp;nbsp;&lt;A href="https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.functions.coalesce.html" target="_blank"&gt;pyspark.sql.functions.coalesce — PySpark 3.1.1 documentation (apache.org)&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Hope this is helpful. Please let me know incase of further queries.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 02 May 2024 07:09:21 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-Notebook-to-process-complex-JSON/m-p/3885141#M1870</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-05-02T07:09:21Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark Notebook to process complex JSON</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-Notebook-to-process-complex-JSON/m-p/3886499#M1871</link>
      <description>&lt;P&gt;While I have not tried it, I don't think it is going to work, as is:&lt;/P&gt;
&lt;P&gt;&amp;gt;&amp;gt;&lt;SPAN&gt;In this example, coalesce will first try to access the value of the column "positionData.manager.id". If it's null, it will return None instead.&amp;lt;&amp;lt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;This issue is NOT that there is a NULL value in column "positionData.manager.id" it is that the column does not exist, cannot be found. If it is there at all, it is part of some nested JSON structure.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;If I have data like this picture from the original post:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ToddChitt_0-1714659900779.png" style="width: 400px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1090507i40CD3BBC29973C11/image-size/medium?v=v2&amp;amp;px=400" role="button" title="ToddChitt_0-1714659900779.png" alt="ToddChitt_0-1714659900779.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;and I try to reference&amp;nbsp;&lt;FONT face="courier new,courier"&gt;&lt;SPAN&gt;col(&lt;/SPAN&gt;&lt;SPAN&gt;"positionData.manager.id"&lt;/SPAN&gt;&lt;/FONT&gt;&lt;SPAN&gt;&lt;FONT face="courier new,courier"&gt;)&lt;/FONT&gt; then I get this error:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN&gt;AnalysisException: [INVALID_EXTRACT_BASE_FIELD_TYPE] Can't extract a value from "manager". Need a complex type [STRUCT, ARRAY, MAP] but got "STRING".&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It might be possible to use nested COALESCE statements and / or WHEN/OTHERWISE functions.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm going to have to experiment. Thanks for the tip.&lt;/P&gt;</description>
      <pubDate>Thu, 02 May 2024 14:50:52 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-Notebook-to-process-complex-JSON/m-p/3886499#M1871</guid>
      <dc:creator>ToddChitt</dc:creator>
      <dc:date>2024-05-02T14:50:52Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark Notebook to process complex JSON</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-Notebook-to-process-complex-JSON/m-p/3886680#M1872</link>
      <description>&lt;P&gt;@Anonymous&lt;/a&gt;&amp;nbsp;Unfortunately, the COALESCE function does the same thing as the WHEN / EXCEPT function: It evaluates all paths offered, even though it is only going to take ONE of those paths. In my case, one of the paths will result in an error as shown above, and even though the logic of the function is such that it will not return a certain element, it still needs to evaluate it.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;a bit of 'airware' example:&lt;/P&gt;
&lt;P&gt;COALESCE ( NULL, "some string not null", 1/0)&lt;/P&gt;
&lt;P&gt;This will error out on the 1 devided by zero path even though the logic is to return "some string not null".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Any other suggestions?&lt;/P&gt;</description>
      <pubDate>Thu, 02 May 2024 16:13:06 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-Notebook-to-process-complex-JSON/m-p/3886680#M1872</guid>
      <dc:creator>ToddChitt</dc:creator>
      <dc:date>2024-05-02T16:13:06Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark Notebook to process complex JSON</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-Notebook-to-process-complex-JSON/m-p/3886784#M1873</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/270"&gt;@ToddChitt&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;Can you please share the output json, so I can try it at my end and may suggest you?&lt;/P&gt;</description>
      <pubDate>Thu, 02 May 2024 17:13:50 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-Notebook-to-process-complex-JSON/m-p/3886784#M1873</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-05-02T17:13:50Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark Notebook to process complex JSON</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-Notebook-to-process-complex-JSON/m-p/3886880#M1874</link>
      <description>&lt;P&gt;I had to strip out some of the other stuff that is not relevent to the topic.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Case 1: the positionData.manager object has an id and employeeNumber field that I need to grab:&lt;/P&gt;
&lt;P&gt;{&lt;BR /&gt;"id": "00000001-0000-0000-0000-000000000000",&lt;BR /&gt;"positionData": {&lt;BR /&gt;"manager": {&lt;BR /&gt;"id": "00000002-0000-0000-0000-000000000000",&lt;BR /&gt;"employeeNumber": "1234"&lt;BR /&gt;}&lt;BR /&gt;}&lt;BR /&gt;}&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Case 2: the manager is simply null:&lt;/P&gt;
&lt;P&gt;{&lt;BR /&gt;"id": "00000001-0000-0000-0000-000000000000",&lt;BR /&gt;"positionData": {&lt;BR /&gt;"manager": null&lt;BR /&gt;}&lt;BR /&gt;}&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In Case 2, we cannot navigate down to col("positionData.manager.id") becuase it doesn't exist. Hence the error. This seems to happen regarless of the function used ( when/otherwise or coalesce )&lt;/P&gt;
&lt;P&gt;I have no control over the incoming JSON structure.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Any suggestion would be appreciated.&lt;BR /&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Thu, 02 May 2024 18:34:13 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-Notebook-to-process-complex-JSON/m-p/3886880#M1874</guid>
      <dc:creator>ToddChitt</dc:creator>
      <dc:date>2024-05-02T18:34:13Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark Notebook to process complex JSON</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-Notebook-to-process-complex-JSON/m-p/3886914#M1875</link>
      <description>&lt;P&gt;I have also tried defining a function like this:&lt;/P&gt;
&lt;DIV&gt;
&lt;DIV&gt;&lt;FONT face="courier new,courier"&gt;&lt;SPAN&gt;# Function to handle null values in various fields&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT face="courier new,courier"&gt;&lt;SPAN&gt;def&lt;/SPAN&gt; &lt;SPAN&gt;handle_nulls&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;entity_name&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;field_name&lt;/SPAN&gt;&lt;SPAN&gt;&lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT face="courier new,courier"&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;try&lt;/SPAN&gt;&lt;SPAN&gt;:&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT face="courier new,courier"&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;#return when(col(f"{entity_name}").isNull(), None).otherwise(col(f"{entity_name}.{field_name}")) &lt;/SPAN&gt;&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT face="courier new,courier"&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;return&lt;/SPAN&gt;&lt;SPAN&gt; &amp;nbsp;coalesce(when(col(&lt;/SPAN&gt;&lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;entity_name&lt;/SPAN&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;).isNull(), lit(&lt;/SPAN&gt;&lt;SPAN&gt;"NULL Value string"&lt;/SPAN&gt;&lt;SPAN&gt;)).otherwise(&lt;/SPAN&gt;&lt;SPAN&gt;None&lt;/SPAN&gt;&lt;SPAN&gt;), col(&lt;/SPAN&gt;&lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;entity_name&lt;/SPAN&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;field_name&lt;/SPAN&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;))&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT face="courier new,courier"&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;except&lt;/SPAN&gt;&lt;SPAN&gt;: &lt;/SPAN&gt;&lt;SPAN&gt;return&lt;/SPAN&gt; &lt;SPAN&gt;None&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;And then call that function for a column like this:&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;
&lt;DIV&gt;
&lt;DIV&gt;&lt;FONT face="courier new,courier"&gt;&lt;SPAN&gt;handle_nulls(&lt;/SPAN&gt;&lt;SPAN&gt;"positionData.manager"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"id"&lt;/SPAN&gt;&lt;SPAN&gt;).alias(&lt;/SPAN&gt;&lt;SPAN&gt;"ManagerId"&lt;/SPAN&gt;&lt;SPAN&gt;),&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV&gt;But that generates the same error, which I don't really understand because of the try/except block. Maybe I'm not structuring that portion properly. I would think that if there is an error inside the TRY, then the EXCEPT takes over.&amp;nbsp;&lt;/DIV&gt;
&lt;DIV&gt;What am I missing on this one?&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Thu, 02 May 2024 18:43:35 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-Notebook-to-process-complex-JSON/m-p/3886914#M1875</guid>
      <dc:creator>ToddChitt</dc:creator>
      <dc:date>2024-05-02T18:43:35Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark Notebook to process complex JSON</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-Notebook-to-process-complex-JSON/m-p/3889085#M1876</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/270"&gt;@ToddChitt&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;I tried to do some repro around your case, it is working perfectly fine.&lt;BR /&gt;Can you please find the code below,&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="vgchennamsft_0-1714739455024.png" style="width: 400px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1091161iE41270F054F591A4/image-size/medium?v=v2&amp;amp;px=400" role="button" title="vgchennamsft_0-1714739455024.png" alt="vgchennamsft_0-1714739455024.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="vgchennamsft_1-1714739467722.png" style="width: 400px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1091162iC94180DA8877F891/image-size/medium?v=v2&amp;amp;px=400" role="button" title="vgchennamsft_1-1714739467722.png" alt="vgchennamsft_1-1714739467722.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="vgchennamsft_2-1714739481531.png" style="width: 400px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1091163i92D689AF102D19C1/image-size/medium?v=v2&amp;amp;px=400" role="button" title="vgchennamsft_2-1714739481531.png" alt="vgchennamsft_2-1714739481531.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Sample Json:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;[
  {
    "id": "00000001-0000-0000-0000-000000000000",
    "positionData": {
      "manager": {
        "id": "00000002-0000-0000-0000-000000000000",
        "employeeNumber": "1234"
      }
    }
  },
  {
    "id": "00000001-0000-0000-0000-000000000000",
    "positionData": {
      "manager": null
    }
  }
]&lt;/LI-CODE&gt;
&lt;P&gt;&lt;BR /&gt;&lt;BR /&gt;Sample Code:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;from pyspark.sql.types import StructType, StructField, StringType, ArrayType
from pyspark.sql.functions import col, coalesce, lit

# Define the schema for the nested objects
schema = StructType([
    StructField("id", StringType(), True),
    StructField("positionData", StructType([
        StructField("manager", StructType([
            StructField("id", StringType(), True),
            StructField("employeeNumber", StringType(), True)
        ]), True)
    ]), True)
])

# Read JSON data with multiline option and schema
df = spark.read.option("multiline", "true").json("Files/testing.json", schema=schema)

df = df.withColumn("ManagerId", coalesce(col("positionData.manager.id"), lit(None)))

display(df)

&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;BR /&gt;Please try this and let me know if you have further queries.&lt;/P&gt;</description>
      <pubDate>Fri, 03 May 2024 12:33:03 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-Notebook-to-process-complex-JSON/m-p/3889085#M1876</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-05-03T12:33:03Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark Notebook to process complex JSON</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-Notebook-to-process-complex-JSON/m-p/3889140#M1877</link>
      <description>&lt;P&gt;This is wonderful, thank you.&lt;BR /&gt;I guess now I need to learn about defining my JSON schema ahead of time, and building my dataframes using the "withColumn" syntax instead of what I was doing, which was a straight df.select(...).&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;More stuff to learn, but that's OK.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks again.&lt;/P&gt;</description>
      <pubDate>Fri, 03 May 2024 12:58:38 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-Notebook-to-process-complex-JSON/m-p/3889140#M1877</guid>
      <dc:creator>ToddChitt</dc:creator>
      <dc:date>2024-05-03T12:58:38Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark Notebook to process complex JSON</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-Notebook-to-process-complex-JSON/m-p/3889151#M1878</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/270"&gt;@ToddChitt&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;Glad to know that your issue got resolved. Please continue using Fabric Community on your further queries.&lt;/P&gt;</description>
      <pubDate>Fri, 03 May 2024 13:02:47 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-Notebook-to-process-complex-JSON/m-p/3889151#M1878</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-05-03T13:02:47Z</dc:date>
    </item>
  </channel>
</rss>

