JoergNeulist
Regular Visitor

Conversion of int column from Pandas to Spark fails

For a data transformation task on Microsoft Fabric, I am using Pandas DataFrames (because of some missing features in the Spark version).

When trying to push the data to tables, I have to convert to Spark, which fails. The following code highlights the problem:

Python
import numpy
import pandas as pd

df = pd.DataFrame(['id'] + [numpy.int64(i) for i in range(100)])
print(df.dtypes)
display(df)

The result is:

/opt/spark/python/lib/pyspark.zip/pyspark/sql/pandas/conversion.py:428: UserWarning: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.pyspark.enabled' is set to true; however, failed by the reason below: Expected bytes, got a 'numpy.int64' object Attempting non-optimization as 'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true.

The code fails. If I remove the cast to int64, the error still appears, but the code is able to recover.
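For reference, here is a pandas-only sketch (no Spark needed) of what the snippet above builds. Passing `['id'] + [...]` as the data produces a single column labelled `0` whose first value is the string 'id', so the column ends up with dtype `object`. That may be what the "Expected bytes" message is reacting to, although this is not confirmed. The `columns=['id']` variant is shown for comparison; the names `values`, `mixed` and `typed` are illustrative only.

```python
import numpy as np
import pandas as pd

values = [np.int64(i) for i in range(100)]

# Construction from the snippet above: 'id' becomes the first data value.
mixed = pd.DataFrame(['id'] + values)

# Passing the name through `columns=` keeps the data purely numeric.
typed = pd.DataFrame(values, columns=['id'])

# Compare shapes and column dtypes of the two frames.
print(mixed.shape, mixed.dtypes.tolist())
print(typed.shape, typed.dtypes.tolist())
```

The first frame has one more row than the second, because the intended header is stored as data.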

I found an older instance of the same bug here: https://learn.microsoft.com/en-us/answers/questions/852534/arrow-optimization-in-python-notebook-fai...

The accepted resolution in that thread does not resolve the problem for me. Any suggestions?

6 REPLIES
v-cboorla-msft
Microsoft Employee

Hi @JoergNeulist 

 

Thanks for using Microsoft Fabric Community.

I tried to repro the above scenario with the below code:

 

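import numpy
import pandas as pd

# Build the column from numpy.int64 values and name it via `columns=`
data = [numpy.int64(i) for i in range(100)]
pandas_df = pd.DataFrame(data, columns=['id'])
print(pandas_df.dtypes)

# `spark` is the SparkSession provided by the notebook
spark_df = spark.createDataFrame(pandas_df)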

# Print schema (data types)
print(spark_df.dtypes)

# Display DataFrame (depends on your notebook environment)
display(spark_df)


Output:

[Screenshot of the notebook output: vcboorlamsft_0-1713522558173.png]

 

Please try the above code and let me know if the issue still persists.

 

Hope this helps.

 

Thank you.

Thank you for the support! I have built a workaround now and haven't had time to check back yet.

The above solution is interesting! My actual code looks a bit different, though, because the data is read from a CSV file. But your version highlights that not all DataFrames are created alike. I'll look into it when I run into the problem again!
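For the CSV case, one possible way to check whether a column that should be numeric contains stray text is sketched below. The CSV content, the column names and the variable names are made up for illustration; the real file is not shown in this thread, so whether this applies to it is an assumption.

```python
import io
import pandas as pd

# Hypothetical CSV content: one non-numeric entry in an otherwise integer column.
csv_text = "id,value\n1,10\n2,unknown\n3,30\n"
df = pd.read_csv(io.StringIO(csv_text))

# 'value' is not parsed as a number because of the stray text entry.
print(df.dtypes)

# Coerce to numbers; entries that cannot be parsed become NaN.
coerced = pd.to_numeric(df["value"], errors="coerce")

# Rows that held a value but failed numeric parsing.
bad_rows = df[coerced.isna() & df["value"].notna()]
print(bad_rows)
```

Rows listed in `bad_rows` would be candidates for cleaning before the frame is handed to `spark.createDataFrame`.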

Hi @JoergNeulist 

 

We haven't heard back from you since the last response and are just checking whether you have found a resolution yet. If you have, please share it with the community, as it can be helpful to others.
Otherwise, please respond with more details and we will try to help.

 

Thanks.

Hi @JoergNeulist 

 

We haven't heard back from you since the last response and are just checking whether you have found a resolution yet. If you have, please share it with the community, as it can be helpful to others.
If you have any questions relating to the current thread, please let us know and we will try our best to help you.
If you have a question on a different issue, we request that you open a new thread.

 

Thanks.

Expiscornovus
Super User

Hi @JoergNeulist,

 

Just to double-check: can't you use the astype conversion method for this?

https://pandas.pydata.org/docs/reference/frame.html#conversion

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html
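As a rough illustration of that suggestion, the sketch below applies `astype` to an object-dtype pandas column. The sample data and variable names are assumptions for demonstration, not taken from the original notebook. It also shows that the call raises when the column contains a non-numeric string, so it only helps if the values are already clean.

```python
import numpy as np
import pandas as pd

# An object-dtype column that happens to hold only integers.
s = pd.Series([np.int64(i) for i in range(5)], dtype=object, name="id")
converted = s.astype("int64")
print(s.dtype, "->", converted.dtype)

# With a stray string in the column, the same call raises instead.
mixed = pd.Series(["id", np.int64(0), np.int64(1)], dtype=object)
try:
    mixed.astype("int64")
    astype_failed = False
except ValueError as exc:
    astype_failed = True
    print("astype failed:", exc)
```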

 

 



Happy to help out 🙂


The underlying task here is converting a Pandas DF to a Spark DF. I'm not trying to convert column data types. The cast is there purely to highlight the problem.

The symptom seems to be that Spark is trying to use pyarrow to optimize the conversion, but there's something wrong with the Java dependencies.
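One way to narrow down whether the data itself, rather than the Java dependencies, triggers the Arrow fallback is to look for object columns that mix Python types before converting. The sketch below is pandas-only; `mixed_type_columns` is a hypothetical helper written for this post, not part of pandas or PySpark, and the sample frame is invented.

```python
import numpy as np
import pandas as pd

def mixed_type_columns(df: pd.DataFrame) -> dict:
    """Return {column: sorted type names} for object columns holding more than one Python type."""
    report = {}
    for col in df.columns:
        # Only object-dtype columns can hold arbitrary Python values.
        if df[col].dtype != object:
            continue
        type_names = sorted({type(v).__name__ for v in df[col].dropna()})
        if len(type_names) > 1:
            report[col] = type_names
    return report

# Invented sample: one clean int64 column, one column mixing a string with integers.
sample = pd.DataFrame({
    "ok": np.arange(3, dtype="int64"),
    "suspect": pd.Series(["id", np.int64(1), np.int64(2)], dtype=object),
})
print(mixed_type_columns(sample))
```

If the helper reports nothing for the real data, the mixed-type explanation becomes less likely and the environment would be the next thing to examine.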
