Hello. I am quite new to Spark Notebooks. I am using one to extract JSON data and save it to tables in a Lakehouse. It works, but there are some minor issues. The data, being JSON, has nested objects. I have included a screenshot to highlight my issues.
I start with a data frame that reads the entire JSON file, but some of its fields contain nested objects, which in turn contain further nested objects. So I have a second data frame that selects elements from the first, using something like this:
df2 = df1.select("Id", "EmployeeNumber", ..., "PositionData.Manager.Id", ..., "WorkLocation.Id", ...)
But the second and third "Id" columns (the manager's and the work location's) also come out named just "Id", so I now have THREE columns named "Id" in the data frame. I want to rename the second and third ones to "Manager_Id" and "WorkLocation_Id", respectively.
I want to flatten the entire JSON file (there are no nested arrays, just nested objects) so that I have the original Id (for the Employee), plus the Manager Id and the Work Location Id.
I tried the data frame's withColumnRenamed, but it renames every column named Id.
If this were SQL, I could write it as: SELECT ... PositionData.Manager.Id AS [Manager_Id] ...
Is there a way to rename a column inline in a dataframe select operation? Or is there another/better option?
Thanks in advance
Hi @ToddChitt, can you try aliasing the column when using .select?
df_renamed = df.select(col("Name").alias("EmployeeName"), col("Department").alias("Dept"))
df_renamed.show()
Hello @AndyDDC, and thank you for the reply.
I tried your suggestion, but it generated an error: ...name 'col' is not defined.
But following this website, PySpark alias() Column & DataFrame Examples - Spark By {Examples} (sparkbyexamples.com) (which I think is about to become my new best friend 🙂), I added this line of code at the top of the block:
Great to hear. And yes, the Spark By {Examples} website is awesome!