DebbieE
Community Champion

Pyspark: Returning values from two data frames

I can write this really easily in SQL, but I want to convert the code to PySpark and I'm really struggling to find an answer.

 

I have TableA (ID, Name, Metric1, Metric2, Metric3):

 

1, Jake Smith, 54, 22, 23

2, Peter White, 24, 23, 55

3, Jake Smith, 33, 55, 14

 

And TableB (ID, Name):

1, Jake Smith

2, Peter White

 

I want to update the ID in TableA. So essentially:

 

SELECT b.ID, a.Name, a.Metric1, a.Metric2, a.Metric3
FROM TableA a
INNER JOIN TableB b ON a.Name = b.Name

 

1, Jake Smith, 54, 22, 23

2, Peter White, 24, 23, 55

1, Jake Smith, 33, 55, 14

 

If anyone knows of any way to convert from SQL to PySpark, please share. What we need is some kind of SQL-to-PySpark conversion generator to help all of us SQL people get our bearings with PySpark.

1 ACCEPTED SOLUTION
v-gchenna-msft
Community Support

Hi @DebbieE ,

Thanks for using Fabric Community.

# Read TableA and TableB data into DataFrames
tableA_df = spark.read.format("csv").option("header", True).load("path/to/tableA.csv")
tableB_df = spark.read.format("csv").option("header", True).load("path/to/tableB.csv")

# Alias the DataFrames so columns can be referenced as "a.*" and "b.*"
a = tableA_df.alias("a")
b = tableB_df.alias("b")

# Join TableA and TableB on the Name column
joined_df = a.join(b, a.Name == b.Name, how="inner")

# Select the ID from TableB and the remaining columns from TableA
result_df = joined_df.select("b.ID", "a.Name", "a.Metric1", "a.Metric2", "a.Metric3")

# Display the result (optional)
result_df.show()
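
If you prefer to stay closer to SQL, another option (a minimal sketch, assuming the tableA_df and tableB_df DataFrames loaded above) is to register both DataFrames as temporary views and run your original query unchanged with spark.sql:

# Alternative: register temporary views and reuse the SQL query as-is
tableA_df.createOrReplaceTempView("TableA")
tableB_df.createOrReplaceTempView("TableB")

result_df = spark.sql("""
    SELECT b.ID, a.Name, a.Metric1, a.Metric2, a.Metric3
    FROM TableA a
    INNER JOIN TableB b ON a.Name = b.Name
""")

result_df.show()

Both approaches produce the same result; the join is just expressed either through the DataFrame API or through Spark SQL.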


Docs to refer to:
PySpark Join Types - Join Two DataFrames - GeeksforGeeks

Hope this is helpful. Please let me know in case of further queries.

 


3 REPLIES

Hello @DebbieE ,

We haven't heard from you since the last response and were just checking back to see if your query was answered.
If not, please reply with more details and we will try to help.

Hi @DebbieE ,

Glad to know that your query got resolved. Please continue to use the Fabric Community for any further queries.
