Hello. I am quite new to Spark Notebooks. I am using one to extract JSON data and save it to tables in a Lakehouse. It works, but there are some slight issues. The data, being JSON, has nested objects. I have included a screenshot here to highlight my issues.
I am starting with a data frame that reads the entire JSON file, but some of its fields contain nested objects. So I have a second data frame that selects elements from the first using something like this:
df2 = df1.select( "Id", "EmployeeNumber",..."PositionData.Manager.Id"..."WorkLocation.Id"...)
But the second and third "Id" columns (from PositionData.Manager.Id and WorkLocation.Id) also come out named just "Id", so I now have THREE columns named "Id" in the data frame. I want to rename the second and third ones to "Manager_Id" and "WorkLocation_Id", respectively.
I want to flatten the entire JSON file (there are no nested arrays, just nested objects) so that I end up with the original Id (for the employee), the Manager Id, and the Work Location Id.
I tried the DataFrame withColumnRenamed method, but it renames every column named Id.
If this were SQL, I could write it as: select..."PositionData.Manager.Id" AS [Manager_Id]...
Is there a way to rename a column inline in a dataframe select operation? Or is there another/better option?
Thanks in advance
Hi @ToddChitt, can you try aliasing the column when using .select?
df_renamed = df.select(col("Name").alias("EmployeeName"), col("Department").alias("Dept"))
df_renamed.show()
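Applied to the columns from the question, the same idea might look like this minimal sketch. The dotted paths come from the question; the mapping name COLUMNS and the helpers select_exprs and select_aliased are hypothetical. It also shows DataFrame.selectExpr, which accepts SQL-style "path AS name" strings, close to the SQL form in the question. Note that col has to be imported from pyspark.sql.functions, which the one-liner above leaves out.

```python
# Columns to keep: dotted path in the nested JSON -> output column name.
# Paths are taken from the question; "Id" and "EmployeeNumber" keep their names.
COLUMNS = {
    "Id": "Id",
    "EmployeeNumber": "EmployeeNumber",
    "PositionData.Manager.Id": "Manager_Id",
    "WorkLocation.Id": "WorkLocation_Id",
}


def select_exprs(columns):
    """Build SQL-style strings for DataFrame.selectExpr, e.g. 'a.b AS c'."""
    return [path if path == name else f"{path} AS {name}"
            for path, name in columns.items()]


def select_aliased(df, columns):
    """Same projection using col(...).alias(...), as in the reply above."""
    # col is not a builtin; it must be imported from pyspark.sql.functions.
    from pyspark.sql.functions import col
    return df.select([col(path).alias(name) for path, name in columns.items()])


# Usage in a notebook, where df1 is the DataFrame read from the JSON file:
#   df2 = df1.selectExpr(*select_exprs(COLUMNS))
#   df2 = select_aliased(df1, COLUMNS)  # equivalent projection
```

Either form gives each nested Id its own name at select time, so there is no later rename that has to tell three identical "Id" columns apart.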
Hello @AndyDDC and thank you for the reply.
I tried your suggestion, but it generated an error: ...name 'col' is not defined.
But from this website, PySpark alias() Column & DataFrame Examples - Spark By {Examples} (sparkbyexamples.com), which I think is about to become my new best friend 🙂, I added this line of code at the top of the block:
Great to hear. And yes, the Spark By {Examples} website is awesome!
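For the broader goal in the question (flattening the whole file when it has nested objects but no arrays), one option is to walk the schema and alias every leaf field with its full path. This is a sketch under stated assumptions: it reads the schema as a dict from df.schema.jsonValue(), assumes field names contain no dots, and does not handle arrays. The helper names flatten_paths and flatten_df are hypothetical.

```python
def flatten_paths(struct, prefix=""):
    """Walk a schema dict (as returned by df.schema.jsonValue()) and return
    (dotted_path, alias) pairs for every leaf field that is not a struct.

    Assumes nested objects only (no arrays) and no dots in field names.
    """
    pairs = []
    for field in struct["fields"]:
        path = f"{prefix}.{field['name']}" if prefix else field["name"]
        ftype = field["type"]
        if isinstance(ftype, dict) and ftype.get("type") == "struct":
            # Nested object: recurse, carrying the path so far.
            pairs.extend(flatten_paths(ftype, path))
        else:
            # Leaf field: join the path with underscores for a unique name.
            pairs.append((path, path.replace(".", "_")))
    return pairs


def flatten_df(df):
    """Select every leaf column, named e.g. PositionData_Manager_Id."""
    from pyspark.sql.functions import col
    pairs = flatten_paths(df.schema.jsonValue())
    return df.select([col(path).alias(alias) for path, alias in pairs])


# Usage in a notebook, where df1 is the DataFrame read from the JSON file:
#   df_flat = flatten_df(df1)
```

Joining the full path guarantees unique names (PositionData_Manager_Id rather than Manager_Id); if shorter names are preferred, the generated pairs can be edited before the select.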