Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
I can write this really easily in SQL But I want to transform the code to pyspark and Im really truggling to find an answer
I have TableA ID, Name, Metric1,Metric2, Metric3
1, Jake Smith, 54, 22, 23
2, Peter White, 24, 23, 55
3, Jake Smith, 33, 55, 14
And TableB ID, Name
1, Jake Smith
2, Peter White
I want to update the ID in Table A. So essentially
Select b.ID, a.Name, a.Metric1,a.Metric2,a. Metric3
FROM TABLEA a
INNER JOIN TABLEB b ON a.Name = b.Name
1, Jake Smith, 54, 22, 23
2, Peter White, 24, 23, 55
1, Jake Smith, 33, 55, 14
If anyone knows of any conversion from SQL to pyspark. What we need is some kind of SQL to pyspark conversion generator to help all us SQL People get our bearings with Pyspark
Solved! Go to Solution.
Hi @DebbieE ,
Thanks for using Fabric Community.
# Read TableA and TableB data into DataFrames
tableA_df = spark.read.format("csv").option("header", True).load("path/to/tableA.csv")
tableB_df = spark.read.format("csv").option("header", True).load("path/to/tableB.csv")
# Join TableA and TableB on the Name column
joined_df = tableA_df.join(tableB_df, tableA_df.Name == tableB_df.Name, how="inner")
# Select desired columns
result_df = joined_df.select("b.ID", "a.Name", "a.Metric1", "a.Metric2", "a.Metric3")
# Display the result (optional)
result_df.show()
Docs to refer -
PySpark Join Types - Join Two DataFrames - GeeksforGeeks
Hope this is helpful. Please let me know incase of further queries.
Hi @DebbieE ,
Thanks for using Fabric Community.
# Read TableA and TableB data into DataFrames
tableA_df = spark.read.format("csv").option("header", True).load("path/to/tableA.csv")
tableB_df = spark.read.format("csv").option("header", True).load("path/to/tableB.csv")
# Join TableA and TableB on the Name column
joined_df = tableA_df.join(tableB_df, tableA_df.Name == tableB_df.Name, how="inner")
# Select desired columns
result_df = joined_df.select("b.ID", "a.Name", "a.Metric1", "a.Metric2", "a.Metric3")
# Display the result (optional)
result_df.show()
Docs to refer -
PySpark Join Types - Join Two DataFrames - GeeksforGeeks
Hope this is helpful. Please let me know incase of further queries.
Hello @DebbieE ,
We haven’t heard from you on the last response and was just checking back to see if your query was answered.
Otherwise, will respond back with the more details and we will try to help.
Hi @DebbieE ,
Glad to know that your query got resolved. Please continue using Fabric Community on your further queries.
Check out the April 2024 Fabric update to learn about new features.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
User | Count |
---|---|
10 | |
10 | |
8 | |
5 | |
4 |