Hi
I have a problem converting a pandas DataFrame to Spark. I'm still learning, and when I want to clean my data, I use the Data Wrangler. The Wrangler converts my df to pandas, and when I add the code back to my notebook, it doesn't convert it back to Spark (although it says it will do so).
So I tried it myself, using a schema:
Solved!
Hi @SofieW ,
Thank you for reaching out to us on Microsoft Fabric Community Forum!
The error happens because Spark uses Apache Arrow to speed up converting pandas DataFrames to Spark, and Arrow is strict about matching data types exactly. If the pandas DataFrame's column types (e.g., integers or dates) do not perfectly match the Spark schema you defined, you may get this error, and it can lead to NaNs when writing to a table. Please try the steps below:
1. Check your DataFrame’s column types. Ensure they match your schema.
2. Try df_Silver_clean = spark.createDataFrame(pandas_df_Silver_clean) without a schema to see if it resolves the issue.
3. Add spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false") before the conversion to bypass Arrow’s strict checks.
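A minimal sketch of step 1, aligning pandas dtypes with the intended Spark schema before conversion. The DataFrame name `pandas_df_Silver_clean` comes from the thread, but the columns and data here are hypothetical stand-ins, not the original data:

```python
import pandas as pd

# Hypothetical data standing in for pandas_df_Silver_clean
pandas_df_Silver_clean = pd.DataFrame({
    "id": ["1", "2", "3"],           # strings, but the Spark schema expects integers
    "amount": [10.5, None, 7.25],    # None becomes NaN, stored as float64
})

# Step 1: inspect the dtypes and compare them with the Spark schema
# (IntegerType/LongType -> int32/int64, DoubleType -> float64, StringType -> object)
print(pandas_df_Silver_clean.dtypes)

# Cast any mismatched columns so they line up with the schema
pandas_df_Silver_clean["id"] = pandas_df_Silver_clean["id"].astype("int64")

# With matching dtypes, the Arrow-backed conversion should succeed:
# df_Silver_clean = spark.createDataFrame(pandas_df_Silver_clean, schema=my_schema)
#
# If it still fails, step 3 disables Arrow for the conversion:
# spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false")
```

The Spark calls are left as comments because they require a running Spark session; `my_schema` is a placeholder for whatever StructType was defined in the notebook.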
These steps might help to fix the error. Feel free to let us know if you have any issues.
Hope this resolves your query. If so, please give us kudos and consider accepting it as the solution.
Regards,
Pallavi G.
Hi
I tried your first and second suggestions before adding the schema, and they didn't solve the issue. Your third option did help me. Thank you.