For a data transformation task on Microsoft Fabric, I am using Pandas DataFrames (because of some missing features in the Spark version).
When trying to push the data to tables, I have to convert to Spark, which fails. The following code highlights the problem:
import numpy
import pandas as pd
df = pd.DataFrame(['id'] + [numpy.int64(i) for i in range(100)])
print(df.dtypes)
display(df)
The result is:
/opt/spark/python/lib/pyspark.zip/pyspark/sql/pandas/conversion.py:428: UserWarning: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.pyspark.enabled' is set to true; however, failed by the reason below: Expected bytes, got a 'numpy.int64' object Attempting non-optimization as 'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true.
The code fails. If I remove the cast to int64, the error still appears, but the code is able to recover.
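For reference, the root cause can be illustrated with plain pandas, no Spark needed: prepending the string 'id' to the list of integers forces the single column to object dtype, which Arrow cannot serialize as one typed column. A minimal sketch:

```python
import numpy
import pandas as pd

# Mixed column: the string 'id' followed by integers -> object dtype,
# which Arrow cannot map to a single Arrow type.
mixed = pd.DataFrame(['id'] + [numpy.int64(i) for i in range(5)])
print(mixed.dtypes)

# Homogeneous column with 'id' as the column *name* -> int64 dtype,
# which Arrow handles cleanly.
clean = pd.DataFrame({'id': [numpy.int64(i) for i in range(5)]})
print(clean.dtypes)
```

This suggests the 'id' string was meant as a column name, not as the first value of the column.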
I found an older instance of the same bug here: https://learn.microsoft.com/en-us/answers/questions/852534/arrow-optimization-in-python-notebook-fai...
The accepted resolution in that thread does not resolve the problem for me. Any suggestions?
Thanks for using Microsoft Fabric Community.
I tried to repro the above scenario with the below code:
import numpy
import pandas as pd
data = [numpy.int64(i) for i in range(100)]
pandas_df = pd.DataFrame(data, columns=['id'])
print(pandas_df.dtypes)
spark_df = spark.createDataFrame(pandas_df)
# Print schema (data types)
print(spark_df.dtypes)
# Display DataFrame (depends on your notebook environment)
display(spark_df)
Output:
Please try the above code and let me know if the issue still persists.
Hope this helps.
Thank you.
Thank you for the support! I have built a workaround for now and haven't had time to check back yet.
The above solution is interesting! My actual code looks a bit different, though, because the data is read from a CSV file. But your version highlights that not all DataFrames are created alike. I'll look into it when I run into the issue again!
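Since the data comes from a CSV, one way to guard against object-dtype columns is to pass explicit dtypes to `read_csv` and verify them before handing the frame to Spark. A sketch under assumed column names (the file contents and names here are made up for illustration):

```python
import io
import pandas as pd

# Stand-in for the real file; the column names are hypothetical.
csv_text = "id,value\n1,a\n2,b\n3,c\n"

# Force 'id' to int64 at read time so a stray header row or blank line
# cannot silently turn the column into object dtype.
df = pd.read_csv(io.StringIO(csv_text), dtype={'id': 'int64', 'value': 'string'})
print(df.dtypes)

# Fail fast before the Spark conversion instead of inside Arrow.
assert not any(dt == object for dt in df.dtypes), "object column would break Arrow"
```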
We haven't heard back from you on the last response and wanted to check whether you have found a resolution yet. If you have, please share it with the community, as it may help others.
Otherwise, please reply with more details and we will try to help.
Thanks.
We haven't heard back from you on the last response and wanted to check whether you have found a resolution yet. If you have, please share it with the community, as it may help others.
If you have any question relating to the current thread, please let us know and we will try our best to help.
If you have a question on a different issue, we request that you open a new thread.
Thanks.
Hi @JoergNeulist,
Just to double check: can't you use the astype conversion method for this?
https://pandas.pydata.org/docs/reference/frame.html#conversion
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html
The underlying task here is converting a Pandas DF to a Spark DF. I'm not trying to convert column data types. The cast is there purely to highlight the problem.
The symptom seems to be that Spark is trying to use pyarrow to optimize the conversion, but there's something wrong with the Java dependencies.
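If the problematic column cannot be cleaned up upstream, the Arrow optimization can also be turned off explicitly, so the conversion goes straight down the slower non-Arrow path instead of warning and falling back. This is a configuration sketch for a Spark notebook session (not verified in Fabric here):

```python
# Disable Arrow for pandas <-> Spark conversion; createDataFrame then uses
# the slower non-Arrow path without emitting the fallback warning.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false")
spark_df = spark.createDataFrame(pandas_df)
```

This trades performance for predictability, so it is best used as a diagnostic step rather than a permanent fix.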