March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
In Qlik you have this function called ApplyMap:
MapTableName:
Mapping LOAD
    ValueIn,
    ValueOut
Resident Dates;
You can now use this mapping table in a regular load script like this:
FactTable:
ApplyMap('MapTableName', DateF, Null()) as Week
How can I easily do this in Spark without using a join?
I was thinking of first loading a DataFrame with the ValueIn/ValueOut pairs,
then loading the FactTable into a second DataFrame.
How do I then add the new column with the "VLOOKUP"-style mapping? I don't want to use a join.
Assuming you're referring to doing this in a PySpark notebook (as opposed to a data pipeline).
A not-necessarily-best-practice* solution, if you don't want to use a join with a broadcast lookup table, is to use a UDF (user-defined function).
Step 1 - create a function that does the mapping, then wrap it in a user-defined function.
Step 2 - use the UDF in a .withColumn statement, e.g. df2 = df.withColumn('column name', UDF('column name'))
* UDFs can have performance implications, since the mapping runs row by row in Python rather than in Spark's optimized engine.