ToddChitt
Super User

PySpark to rename a column in a dataframe

Hello. I am quite new to Spark Notebooks. I am using one to extract JSON data and save it to tables in a Lakehouse. It works, but there are some slight issues. The data, being JSON, has nested objects. I have included a screenshot here to highlight my issues.

 

[Screenshot: ToddChitt_0-1710266541471.png]

I am starting with a data frame that reads the entire JSON file. But the nested fields themselves contain nested objects, so I have a second data frame that selects elements from the first using something like this:

df2 = df1.select( "Id", "EmployeeNumber",..."PositionData.Manager.Id"..."WorkLocation.Id"...)

But the second and third columns that are "Id" come out named "Id". I now have THREE columns named "Id" in the data frame. I want to rename the second and third ones to "Manager_Id" and "WorkLocation_Id", respectively.

 

I want to flatten the entire JSON file (there are no nested arrays, just nested objects) such that I have the original Id (for the Employee), the Manager Id, and the Work Location Id.

 

I tried withColumnRenamed on the data frame, but it renames all columns named Id.

If this were SQL, I could write it as: select..."PositionData.Manager.Id" AS [Manager_Id]...

Is there a way to rename a column inline in a dataframe select operation? Or is there another/better option?

 

Thanks in advance

 

 




Did I answer your question? If so, mark my post as a solution. Also consider helping someone else in the forums!

Proud to be a Super User!





1 ACCEPTED SOLUTION
AndyDDC
Super User

Hi @ToddChitt, can you try aliasing the column when using .select?

 

df_renamed = df.select(col("Name").alias("EmployeeName"), col("Department").alias("Dept")) 
df_renamed.show() 
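
Applied to the nested fields from the question, the same idea might look like the sketch below. Note that `col` has to be imported from `pyspark.sql.functions`. The helpers `alias_for` and `select_flat` are hypothetical names, and the naming rule (last two path segments joined with an underscore) is only an assumption chosen to match the `Manager_Id` and `WorkLocation_Id` names asked for.

```python
def alias_for(path: str) -> str:
    """Build a flat column name from a dotted path (hypothetical naming rule).

    Top-level columns keep their name; nested ones use the last two
    segments, e.g. "PositionData.Manager.Id" -> "Manager_Id".
    """
    parts = path.split(".")
    return parts[0] if len(parts) == 1 else "_".join(parts[-2:])


def select_flat(df, paths):
    """Select each dotted path from df and alias it to a flat name."""
    # Imported here so alias_for can be tried without Spark installed.
    from pyspark.sql.functions import col

    return df.select(*[col(p).alias(alias_for(p)) for p in paths])


# Paths taken from the question; the columns elided there are left out.
paths = ["Id", "EmployeeNumber", "PositionData.Manager.Id", "WorkLocation.Id"]
print([alias_for(p) for p in paths])
# ['Id', 'EmployeeNumber', 'Manager_Id', 'WorkLocation_Id']
```

In a notebook this would be called as `df2 = select_flat(df1, paths)`. If the SQL form is more comfortable, `df1.selectExpr("PositionData.Manager.Id AS Manager_Id")` should be a close equivalent of the `AS` syntax in the question.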


3 REPLIES
ToddChitt
Super User

Hello @AndyDDC and thank you for the reply. 

I tried your suggestion but it generated an error: ...name 'col' is not defined.

But from this website, PySpark alias() Column & DataFrame Examples - Spark By {Examples} (sparkbyexamples.com) (which I think is about to become my new best friend 🙂), I added this line of code at the top of the block:

from pyspark.sql.functions import col
And that fixed it.
Another example from the site shows this syntax will work without the import statement above:
df.select(df.Id.alias("Employee_Id"), ...
 
Thanks for your help.
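
For the broader goal in the original post (flattening every nested object), the list of dotted paths could be derived from the schema instead of typed by hand. This is a rough sketch, assuming there are no arrays, as stated in the question. `leaf_paths` is a hypothetical helper that walks a dictionary shaped like the output of `df1.schema.jsonValue()`, and `sample_schema` is a hand-written stand-in whose field types are guesses.

```python
def leaf_paths(schema_json, prefix=""):
    """Return the dotted paths of all non-struct fields in a schema dict.

    schema_json is expected to look like the result of
    StructType.jsonValue(): {"type": "struct", "fields": [...]}.
    """
    paths = []
    for field in schema_json["fields"]:
        path = prefix + field["name"]
        ftype = field["type"]
        if isinstance(ftype, dict) and ftype.get("type") == "struct":
            # Nested object: recurse and extend the dotted prefix.
            paths.extend(leaf_paths(ftype, path + "."))
        else:
            # Simple type (or anything that is not a struct): keep the path.
            paths.append(path)
    return paths


# Stand-in for df1.schema.jsonValue(); the "long" types are assumptions.
sample_schema = {
    "type": "struct",
    "fields": [
        {"name": "Id", "type": "long"},
        {"name": "PositionData", "type": {
            "type": "struct",
            "fields": [
                {"name": "Manager", "type": {
                    "type": "struct",
                    "fields": [{"name": "Id", "type": "long"}],
                }},
            ],
        }},
        {"name": "WorkLocation", "type": {
            "type": "struct",
            "fields": [{"name": "Id", "type": "long"}],
        }},
    ],
}

paths = leaf_paths(sample_schema)
print(paths)
# ['Id', 'PositionData.Manager.Id', 'WorkLocation.Id']
```

With a real data frame, `leaf_paths(df1.schema.jsonValue())` would supply the paths, and each one could then be selected with `col(p).alias(p.replace(".", "_"))` so that every flattened column gets a unique name. Field names that themselves contain dots would need backtick quoting, which this sketch does not handle.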
 








Great to hear. And yes, the Spark By {Examples} website is awesome!

