In Qlik you have this function called ApplyMap:
MapTableName:
Mapping LOAD
    ValueIn,
    ValueOut
Resident Dates;
You can then use this mapping table in a regular load script like this:
FactTable:
ApplyMap('MapTableName',DateF,Null()) as Week
How can I easily do this in Spark without using a join?
I was thinking about first loading a DataFrame with the ValueIn/ValueOut pairs, then loading the FactTable into a second DataFrame.
But how do I then add the new column with the "VLOOKUP"? I don't want to use a join.
Assuming you're referring to doing this in a PySpark notebook (as opposed to a data pipeline).
A not-necessarily-best-practice* solution, if you don't want to use a join with a broadcast lookup table, would be to use a UDF (user-defined function).
Step 1 - create a function that does the mapping, then wrap it in a UDF.
Step 2 - use the UDF in a .withColumn call, e.g. df2 = df.withColumn('new column', my_udf('source column')) - see the sketch below.
* UDFs can have performance implications, since the lookup runs row by row in Python instead of in Spark's optimized engine.
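A minimal sketch of that approach, assuming the mapping table fits in driver memory. The DataFrame names, sample rows, and the apply_map function are illustrative; only ValueIn/ValueOut, DateF, and Week come from the original question:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

# Small lookup table playing the role of Qlik's mapping table
# (illustrative data; ValueIn/ValueOut columns as in the question).
map_df = spark.createDataFrame(
    [("2024-01-01", "W01"), ("2024-01-08", "W02")],
    ["ValueIn", "ValueOut"],
)

# Step 1 - collect the mapping to the driver as a plain dict and
# broadcast it so each executor gets a single copy, then wrap the
# lookup in a UDF. dict.get returns None for missing keys, which
# mirrors the Null() default argument of ApplyMap.
mapping = spark.sparkContext.broadcast(
    {row["ValueIn"]: row["ValueOut"] for row in map_df.collect()}
)

@F.udf(returnType=StringType())
def apply_map(value):
    return mapping.value.get(value)

# Step 2 - use the UDF in a .withColumn call on the fact table.
fact_df = spark.createDataFrame([("2024-01-01", 100)], ["DateF", "Amount"])
result = fact_df.withColumn("Week", apply_map(F.col("DateF")))
result.show()
```

If the mapping stays small this behaves much like ApplyMap; for larger lookups, a broadcast join is usually the better-performing option, as the footnote above suggests.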