In Qlik there is a function called ApplyMap. You first define a mapping table:
MapTableName:
Mapping LOAD
    ValueIn,
    ValueOut
Resident Dates;
You can then use this mapping table in a regular load script like this:
FactTable:
LOAD
    ApplyMap('MapTableName', DateF, Null()) as Week
How can I easily do this in Spark without using a join?
I was thinking about first loading a dataframe with the ValueIn/ValueOut pairs, then loading the FactTable into a second dataframe.
But how do I then add the new column with the "VLOOKUP"? I don't want to use a join.
Assuming you're referring to doing this in a PySpark notebook (as opposed to a data pipeline):
A not-necessarily-best-practice* solution, if you don't want to use a join against a broadcast lookup table, is a UDF (user-defined function).
Step 1 - create a function that does the mapping, then wrap it in a user-defined function.
Step 2 - use the UDF in a .withColumn call, e.g. df2 = df.withColumn('column name', my_udf('column name')); see the sketch below.
* UDFs can have performance implications: Python UDFs are opaque to Spark's optimizer and add serialization overhead.
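
Here's a minimal sketch of those two steps, assuming the mapping table is small enough to collect to the driver. The sample rows and the lookup_df / apply_map names are made up for illustration; the ValueIn, ValueOut, DateF and Week names are reused from the question.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

# Lookup dataframe standing in for Qlik's ValueIn/ValueOut mapping table
# (made-up sample rows)
lookup_df = spark.createDataFrame(
    [("2024-01-01", "W01"), ("2024-01-08", "W02")],
    ["ValueIn", "ValueOut"],
)

# Step 1: pull the (small) lookup table back to the driver as a plain dict
# and wrap the lookup in a UDF; dict.get returns None for missing keys,
# which plays the role of ApplyMap's Null() default.
mapping = {r["ValueIn"]: r["ValueOut"] for r in lookup_df.collect()}

@udf(StringType())
def apply_map(value):
    return mapping.get(value)

# Step 2: add the mapped column with .withColumn,
# the equivalent of ApplyMap('MapTableName', DateF, Null()) as Week
fact_df = spark.createDataFrame([("2024-01-01",), ("2024-02-01",)], ["DateF"])
fact_df = fact_df.withColumn("Week", apply_map(col("DateF")))
fact_df.show()

Because the dict travels to each executor inside the UDF's closure, this only scales while the mapping stays small; beyond that, the broadcast join mentioned above is usually the better option.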