ToddChitt
Super User

PySpark to rename a column in a dataframe

Hello. I am quite new to Spark Notebooks. I am using one to extract JSON data and save it to tables in a Lakehouse. It works, but there are some slight issues. The data, being JSON, has nested objects. I have included a screenshot here to highlight my issues.

 

(screenshot: ToddChitt_0-1710266541471.png)

I am starting with a data frame that reads the entire JSON file, but some of its fields contain nested objects. So I have a second data frame that selects elements from the first using something like this:

df2 = df1.select("Id", "EmployeeNumber", ..., "PositionData.Manager.Id", ..., "WorkLocation.Id", ...)

But the nested Id fields come out named simply "Id", so I now have THREE columns named "Id" in the data frame. I want to rename the second and third ones to "Manager_Id" and "WorkLocation_Id", respectively.
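
To make the problem concrete, here is a small sketch of what I see (the column list is abbreviated, so treat it as an illustration of the duplicate names rather than my exact code):

# After the select above, the nested Id fields lose their parent prefix,
# so the flattened frame ends up with several columns all named "Id".
print(df2.columns)
# ['Id', 'EmployeeNumber', ..., 'Id', ..., 'Id', ...]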

 

I want to flatten the entire JSON file (there are no nested arrays, just nested objects) so that I end up with the original Id (for the Employee), the Manager Id, and the Work Location Id.

 

I tried DataFrame.withColumnRenamed, but it renames every column named Id.
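
In other words, a sketch like the one below (using the df2 frame from above) renames all three Id columns at once, which is not what I want:

# withColumnRenamed matches columns by name, so with three columns
# named "Id" it renames all of them, not just the manager one.
df3 = df2.withColumnRenamed("Id", "Manager_Id")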

If this were SQL I could write it as: select ... "PositionData.Manager.Id" AS [Manager_Id] ...
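
(For comparison, one Spark SQL route would be to register the frame as a temporary view and alias the nested fields there; the view name below is made up for illustration.)

# Hypothetical Spark SQL equivalent: register df1 as a view and alias the nested fields.
df1.createOrReplaceTempView("employees")
flat = spark.sql("""
    SELECT Id,
           EmployeeNumber,
           PositionData.Manager.Id AS Manager_Id,
           WorkLocation.Id AS WorkLocation_Id
    FROM employees
""")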

Is there a way to rename a column inline in a dataframe select operation? Or is there another/better option?

 

Thanks in advance

 

 




1 ACCEPTED SOLUTION
AndyDDC
Super User

Hi @ToddChitt can you try aliasing the column when using .select

 

df_renamed = df.select(col("Name").alias("EmployeeName"), col("Department").alias("Dept")) 
df_renamed.show() 


3 REPLIES
ToddChitt
Super User

Hello @AndyDDC and thank you for the reply. 

I tried your suggestion but it generated an error: ...name 'col' is not defined.

But from this website, PySpark alias() Column & DataFrame Examples - Spark By {Examples} (sparkbyexamples.com), which I think is about to become my new best friend 🙂, I added this line of code at the top of the block:

from pyspark.sql.functions import col

And that fixed it.

Another example from the site shows this syntax will work without the import statement above:

df.select(df.Id.alias("Employee_Id"), ...
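
Putting it together for the columns in my original question, the select ends up looking roughly like this (a sketch; the real column list is longer):

from pyspark.sql.functions import col

# Alias the nested Id fields so the flattened data frame has unique column names.
df2 = df1.select(
    col("Id"),
    col("EmployeeNumber"),
    col("PositionData.Manager.Id").alias("Manager_Id"),
    col("WorkLocation.Id").alias("WorkLocation_Id"),
)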
 
Thanks for your help.
 



Great to hear. And yes, the Spark By Examples website is awesome!

