<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic PySpark to rename a column in a dataframe in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-to-rename-a-column-in-a-dataframe/m-p/3758735#M1213</link>
    <description>&lt;P&gt;Hello. I am quite new to Spark Notebooks. I am using one to extract JSON data to save to tables in a Lakehouse. It works, but there are some slight issues. The data, being JSON, has nested objects. I have included a screenshot here to highlight my issues.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ToddChitt_0-1710266541471.png" style="width: 400px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1058746i2BA0A65FC53DD8E5/image-size/medium?v=v2&amp;amp;px=400" role="button" title="ToddChitt_0-1710266541471.png" alt="ToddChitt_0-1710266541471.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I start with a data frame that reads the entire JSON file, but some of its fields contain nested objects. So I have&amp;nbsp;a second data frame that selects elements from the first using something like this:&lt;/P&gt;&lt;P&gt;df2 = df1.select( "Id", "EmployeeNumber",..."PositionData.Manager.Id"..."WorkLocation.Id"...)&lt;/P&gt;&lt;P&gt;But the second and third of these columns also come out named "Id", so I now have THREE columns named "Id" in the data frame. I want to rename the second and third ones to "Manager_Id" and "WorkLocation_Id", respectively.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to flatten the entire JSON file (there are no nested arrays, just nested objects) such that I have the original Id (for the Employee) and Manager Id and Work Location Id.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried withColumnRenamed on the data frame, but it renames every column named Id.&lt;/P&gt;&lt;P&gt;If this were SQL I could write it as: select..."PositionData.Manager.Id" AS [Manager_Id]...&lt;/P&gt;&lt;P&gt;Is there a way to rename a column inline in a dataframe select operation?
Or is there another/better option?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 12 Mar 2024 18:21:36 GMT</pubDate>
    <dc:creator>ToddChitt</dc:creator>
    <dc:date>2024-03-12T18:21:36Z</dc:date>
    <item>
      <title>PySpark to rename a column in a dataframe</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-to-rename-a-column-in-a-dataframe/m-p/3758735#M1213</link>
      <description>&lt;P&gt;Hello. I am quite new to Spark Notebooks. I am using one to extract JSON data to save to tables in a Lakehouse. It works, but there are some slight issues. The data, being JSON, has nested objects. I have included a screenshot here to highlight my issues.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ToddChitt_0-1710266541471.png" style="width: 400px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1058746i2BA0A65FC53DD8E5/image-size/medium?v=v2&amp;amp;px=400" role="button" title="ToddChitt_0-1710266541471.png" alt="ToddChitt_0-1710266541471.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I start with a data frame that reads the entire JSON file, but some of its fields contain nested objects. So I have&amp;nbsp;a second data frame that selects elements from the first using something like this:&lt;/P&gt;&lt;P&gt;df2 = df1.select( "Id", "EmployeeNumber",..."PositionData.Manager.Id"..."WorkLocation.Id"...)&lt;/P&gt;&lt;P&gt;But the second and third of these columns also come out named "Id", so I now have THREE columns named "Id" in the data frame. I want to rename the second and third ones to "Manager_Id" and "WorkLocation_Id", respectively.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to flatten the entire JSON file (there are no nested arrays, just nested objects) such that I have the original Id (for the Employee) and Manager Id and Work Location Id.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried withColumnRenamed on the data frame, but it renames every column named Id.&lt;/P&gt;&lt;P&gt;If this were SQL I could write it as: select..."PositionData.Manager.Id" AS [Manager_Id]...&lt;/P&gt;&lt;P&gt;Is there a way to rename a column inline in a dataframe select operation?
Or is there another/better option?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 12 Mar 2024 18:21:36 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-to-rename-a-column-in-a-dataframe/m-p/3758735#M1213</guid>
      <dc:creator>ToddChitt</dc:creator>
      <dc:date>2024-03-12T18:21:36Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark to rename a column in a dataframe</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-to-rename-a-column-in-a-dataframe/m-p/3758995#M1214</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/270"&gt;@ToddChitt&lt;/a&gt;, can you try aliasing the column when using .select?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;df_renamed = df.select(col(&lt;SPAN class=""&gt;"Name"&lt;/SPAN&gt;).&lt;SPAN class=""&gt;alias&lt;/SPAN&gt;(&lt;SPAN class=""&gt;"EmployeeName"&lt;/SPAN&gt;), col(&lt;SPAN class=""&gt;"Department"&lt;/SPAN&gt;).&lt;SPAN class=""&gt;alias&lt;/SPAN&gt;(&lt;SPAN class=""&gt;"Dept"&lt;/SPAN&gt;))
df_renamed.show()&lt;/PRE&gt;</description>
      <pubDate>Tue, 12 Mar 2024 22:13:18 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-to-rename-a-column-in-a-dataframe/m-p/3758995#M1214</guid>
      <dc:creator>AndyDDC</dc:creator>
      <dc:date>2024-03-12T22:13:18Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark to rename a column in a dataframe</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-to-rename-a-column-in-a-dataframe/m-p/3760586#M1215</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/265587"&gt;@AndyDDC&lt;/a&gt;&amp;nbsp;and thank you for the reply.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried your suggestion but it generated an error: &lt;FONT face="courier new,courier"&gt;...name 'col' is not defined.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;But from this website,&amp;nbsp;&lt;A href="https://sparkbyexamples.com/pyspark/pyspark-alias-column-examples/" target="_blank"&gt;PySpark alias() Column &amp;amp; DataFrame Examples - Spark By {Examples} (sparkbyexamples.com)&lt;/A&gt; (which I think is about to become my new best friend &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; ), I added this line of code at the top of the block:&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;FONT face="courier new,courier"&gt;&lt;SPAN&gt;from&lt;/SPAN&gt;&lt;SPAN&gt; pyspark.sql.functions &lt;/SPAN&gt;&lt;SPAN&gt;import&lt;/SPAN&gt;&lt;SPAN&gt; col&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;And that fixed it. &lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Another example from the site shows that this syntax works without the import statement above:&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="courier new,courier"&gt;&lt;SPAN&gt;df.select(df.Id.alias("Employee_Id"),...&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Thanks for your help.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 13 Mar 2024 12:06:02 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-to-rename-a-column-in-a-dataframe/m-p/3760586#M1215</guid>
      <dc:creator>ToddChitt</dc:creator>
      <dc:date>2024-03-13T12:06:02Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark to rename a column in a dataframe</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-to-rename-a-column-in-a-dataframe/m-p/3760995#M1216</link>
      <description>&lt;P&gt;Great to hear. And yes, the Spark By {Examples} website is awesome!&lt;/P&gt;</description>
      <pubDate>Wed, 13 Mar 2024 14:39:57 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/PySpark-to-rename-a-column-in-a-dataframe/m-p/3760995#M1216</guid>
      <dc:creator>AndyDDC</dc:creator>
      <dc:date>2024-03-13T14:39:57Z</dc:date>
    </item>
  </channel>
</rss>

