Hi all,
I want to test how to use Pandas API on Spark.
I'm reading a delta table and I want to save it to a single CSV file.
I already know how to do this in regular pandas, but I'd like to try it with the Pandas API on Spark.
I have a default lakehouse attached to my Notebook, and it works if I am using regular pandas.
import pyspark.pandas as ps
df = spark.read.load("abfss://Fabric@onelake.dfs.fabric.microsoft.com/TestLakehouse.Lakehouse/Tables/RandomNumbers")
df = df.limit(100)
df_pandasOnSpark = df.pandas_api()
df_pandasOnSpark.to_csv('/lakehouse/default/Files/RandomNumbers_PandasOnSpark.csv', header=True, index = False)
I'm getting the following error:
Py4JJavaError: An error occurred while calling o6290.save. : Operation failed: "Bad Request", 400, HEAD, http://onelake.dfs.fabric.microsoft.com/<workspaceGUID>/lakehouse/default/Files/RandomNumbers_PandasOnSpark.csv?upn=false&action=getStatus&timeout=90
Does anyone know what I'm doing wrong and how I could get this to work?
Thanks in advance!
Hi @frithjof_v ,
Using the sample data as an example, try writing the results back to the ABFS path, like this:
import pyspark.pandas as ps
df = ps.read_parquet("abfss://xxxxxx@onelake.dfs.fabric.microsoft.com/LH_002.Lakehouse/Tables/publicholidays")
df = df.head(10)
df.to_csv('abfss://xxxxxx@onelake.dfs.fabric.microsoft.com/LH_002.Lakehouse/Files/sample_datasets/testname.csv', index=False)
Here, Spark treats testname.csv as a folder rather than a file. I did some searching, and it seems you currently can't specify the name of the generated file when writing through Spark.
azure synapse - How to write .csv File in ADLS Using Pyspark - Stack Overflow
pyspark.pandas.read_parquet — PySpark 3.5.2 documentation (apache.org)
pyspark.pandas.DataFrame.to_csv — PySpark 3.5.2 documentation (apache.org)
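Since Spark writes CSV output as a folder containing one or more part files, a common workaround is to write the folder first and then move the single part file to the name you want. The helper below is a minimal sketch of that rename step in plain Python, assuming the output lands on a locally mounted path such as /lakehouse/default/Files (the function name is mine, not a Fabric API; for abfss:// paths you would use the notebook's file utilities such as mssparkutils.fs instead):

```python
import glob
import os
import shutil

def promote_single_part_file(folder: str, target_path: str) -> str:
    """Find the single part-*.csv file Spark wrote inside `folder`,
    move it to `target_path`, and remove the now-empty folder."""
    parts = glob.glob(os.path.join(folder, "part-*.csv"))
    if len(parts) != 1:
        raise ValueError(f"expected exactly one part file, found {len(parts)}")
    shutil.move(parts[0], target_path)
    shutil.rmtree(folder)  # discard the folder and any _SUCCESS markers
    return target_path
```

This only works reliably when the DataFrame is written with a single partition (e.g. after coalesce(1) on the underlying Spark DataFrame); otherwise Spark emits multiple part files and the check above raises.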
Best Regards,
Gao
Community Support Team
If any post helped, please consider accepting it as the solution to help other members find it more quickly.
If I misunderstood your needs or you still have problems, please feel free to let us know. Thanks a lot!
How to get your questions answered quickly -- How to provide sample data in the Power BI Forum
Thank you @Anonymous ,
So, in order to specify the name of a single CSV file, I guess we cannot use the Pandas API on Spark?
Instead, we would need to use regular pandas https://learn.microsoft.com/en-us/fabric/data-science/read-write-pandas or Polars. I have tested both regular pandas and Polars; both can write a single CSV file with a specific file name.
I was just curious whether we could achieve the same with the Pandas API on Spark. However, it seems we need to use regular pandas or Polars for this. Thanks!
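For reference, the regular-pandas route mentioned above can be sketched like this. The helper name is mine, and the commented lakehouse path follows the mount used earlier in the thread; unlike Spark, pandas writes exactly one file at the path you give it:

```python
import pandas as pd

def write_named_csv(df: pd.DataFrame, path: str) -> str:
    """Write a DataFrame to exactly one CSV file with the chosen name.
    pandas writes a single file, not a folder of part files."""
    df.to_csv(path, index=False)
    return path

# In a Fabric notebook (hypothetical usage), after converting the Spark
# DataFrame with toPandas(), this writes one file to the mounted lakehouse:
# write_named_csv(spark_df.limit(100).toPandas(),
#                 "/lakehouse/default/Files/RandomNumbers.csv")
```

Note that toPandas() pulls the full result onto the driver, so this approach fits small-to-medium outputs like the 100-row sample in the original post.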