silly_bird
Regular Visitor

Help extracting value from dict in a column

 

Hi all

 

I'm working on an API integration in a PySpark notebook. The response has a contactMethods column with email and phone entries, an array of name/value pairs in no particular order:

 

 

 

contactMethods = [{'name': 'Email', 'value': 'email.com'}, {'name': 'Mobile', 'value': '1234'}]

df = spark.createDataFrame(
    [(1, contactMethods)],
    ("key", "contactMethods")
)
display(df)

 

 

 

 

I need to translate it into "email" and "mobile" columns but can't figure out how to do this.

I have tried several samples from similar cases, but they all raise errors in the notebook.

Even the PySpark documentation samples don't work for me, for example the "filter" sample: pyspark.sql.functions.filter — PySpark 3.5.3 documentation

I'm stuck ;(

 

Please help!

1 ACCEPTED SOLUTION
silly_bird
Regular Visitor

It looks like I managed to figure out one solution.

I don't know whether it's good or bad, but it's the only one I have.

 

from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

# Return the 'value' of the first entry whose 'name' matches, or "" if none.
extract_email = udf(lambda cell: str(next(filter(lambda t: t["name"] == "Email", cell), {}).get("value", "")), StringType())
extract_mobile = udf(lambda cell: str(next(filter(lambda t: t["name"] == "Mobile", cell), {}).get("value", "")), StringType())

df = df.withColumn('email', extract_email(col("contactMethods"))) \
       .withColumn('mobile', extract_mobile(col("contactMethods")))
display(df)

 

Pros, please advise!
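For what it's worth, the core of this UDF is plain Python and can be checked outside Spark. `first_value` below is just a hypothetical name for the lambda's logic:

```python
# `methods` mirrors one cell of the contactMethods column (a list of dicts).
methods = [{'name': 'Email', 'value': 'email.com'},
           {'name': 'Mobile', 'value': '1234'}]

def first_value(methods, name):
    # next() returns the first matching dict, or {} when there is none,
    # so .get() falls back to "" instead of raising StopIteration.
    return str(next(filter(lambda t: t["name"] == name, methods), {}).get("value", ""))

print(first_value(methods, "Email"))  # -> "email.com"
print(first_value(methods, "Fax"))    # -> "" (no matching entry)
```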


4 REPLIES
Anonymous
Not applicable

Hi @silly_bird ,

 

It looks like you have found a solution. Could you please mark this helpful post as “Answered”?

 

This will help others in the community to easily find a solution if they are experiencing the same problem as you.

 

Thank you for your cooperation!

 

Best Regards,
Yang
Community Support Team

 

If any post helps, please consider accepting it as the solution to help other members find it more quickly.
If I have misunderstood your needs or you still have problems, please feel free to let us know. Thanks a lot!

silly_bird
Regular Visitor

Update

 

If we read the data from JSON, like this:

 

df = spark.read.json(spark.sparkContext.parallelize([response.json()])).head(1)

 

... then the cell is an array of Row objects, not an array of dicts.

 

I managed to work around it using the asDict method on the Row:

 

 

from pyspark.sql import Row
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Fall back to Row(value=None) when no entry matches, then convert to a dict.
extract_email = udf(lambda cell: None if cell is None else next(filter(lambda t: t["name"] == "Email", cell), Row(value=None)).asDict().get("value", None), StringType())

 

 

 

 


silly_bird
Regular Visitor

Also to add: the contact methods list can contain anywhere from zero to many methods, and I'm only interested in "email" and "phone".
