
SofieW
Regular Visitor

Converting from pandas to spark df in notebook

Hi

I have a problem converting a pandas DataFrame to Spark. I'm still learning, and when I want to clean my data I use the Data Wrangler. The Wrangler converts my df to pandas, and when I add the generated code back to my notebook, it doesn't convert it back to Spark (although it says it will do so).

 

So I tried it myself, using a schema:

 

from pyspark.sql.types import (
    StringType, FloatType, StructType, StructField, DateType, IntegerType
)

schema = StructType([
    StructField("RESPONDE", IntegerType(), True),
    StructField("SUBMITDA", DateType(), True),
    StructField("q11a1", DateType(), True),
    StructField("q11a2", DateType(), True),
    StructField("Weging_herkomstlogies", FloatType(), True),
    StructField("WegingTOTAAL", FloatType(), True),
    StructField("Question", StringType(), True),
    StructField("Answer", StringType(), True),
    StructField("Gewogen_totaal", FloatType(), True),
])
df_Silver_clean = spark.createDataFrame(pandas_df_Silver_clean, schema=schema)
display(df_Silver_clean)
 
I always get this error:

/opt/spark/python/lib/pyspark.zip/pyspark/sql/pandas/conversion.py:428: UserWarning: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.pyspark.enabled' is set to true; however, failed by the reason below:
  Expected bytes, got a 'int' object
Attempting non-optimization as 'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true.
  warn(msg)
 
When I look at the preview in my notebook, I see numbers in the columns I defined as numeric. But when I write the DataFrame to a table, the numbers change into NaN.
I already looked for a solution in other forum questions, but people just post code without explaining what it is or what it is supposed to do. I'd like to know why I'm getting this error and what I'm supposed to do now.
Thank you
1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @SofieW ,
Thank you for reaching out to us on Microsoft Fabric Community Forum!

The error happens because Spark uses Apache Arrow to speed up converting pandas DataFrames to Spark, and Arrow is strict about matching data types exactly. If the pandas DataFrame's column types (e.g., integers or dates) do not exactly match the Spark schema you defined, you may get this error, and the fallback conversion can produce NaNs when writing to a table. Please follow the steps below:

1. Check your DataFrame's column types (e.g., with pandas_df_Silver_clean.dtypes) and make sure they match your schema.
2. Try df_Silver_clean = spark.createDataFrame(pandas_df_Silver_clean) without a schema to see if that resolves the issue.
3. Add spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false") before the conversion to bypass Arrow's strict checks.
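To make steps 1-3 concrete, here is a minimal, self-contained sketch. The miniature DataFrame and its values are made up for illustration; in your notebook the real columns come from pandas_df_Silver_clean and the schema you defined (e.g., "Question" as StringType, "SUBMITDA" as DateType):

```python
import pandas as pd

# Hypothetical miniature of the DataFrame, reusing two column names from the schema.
pdf = pd.DataFrame({
    "Question": [11, 12],                      # ints, but the schema says StringType -> Arrow error
    "SUBMITDA": ["2024-01-05", "2024-02-10"],  # strings, but the schema says DateType
})

# Step 1: inspect the pandas dtypes and compare them with the Spark schema.
print(pdf.dtypes)

# Cast each mismatched column so it matches its StructField before converting:
pdf["Question"] = pdf["Question"].astype(str)
pdf["SUBMITDA"] = pd.to_datetime(pdf["SUBMITDA"]).dt.date

# Step 3 (only if mismatches remain): bypass Arrow's strict conversion, then convert.
# spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false")
# df = spark.createDataFrame(pdf, schema=schema)
```

Disabling Arrow (step 3) hides the error rather than fixing the underlying type mismatch, so aligning the dtypes first is the more robust fix when you can manage it.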

These steps might help to fix the error. Feel free to let us know if you have any issues.

Hope this resolves your query. If so, please give us kudos and consider accepting it as the solution.

Regards,
Pallavi G.



Hi

I tried your first and second suggestions before adding the schema, and they didn't solve the issue. Your third option did help me. Thank you.
