<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Pandas API on Spark in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Pandas-API-on-Spark/m-p/4106341#M3665</link>
    <description>&lt;P&gt;Thank you @Anonymous,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So in order to specify the name of a single CSV file, I guess we cannot use the Pandas API on Spark?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Instead we would need to use regular Pandas &lt;A href="https://learn.microsoft.com/en-us/fabric/data-science/read-write-pandas" target="_blank" rel="noopener"&gt;https://learn.microsoft.com/en-us/fabric/data-science/read-write-pandas&lt;/A&gt;&amp;nbsp;or Polars. I have tested regular Pandas and Polars; both can write a single CSV file with a specific file name.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I was just curious whether we could achieve the same with the Pandas API on Spark. However, it seems we need to use regular Pandas or Polars for this. Thanks!&lt;/P&gt;</description>
    <pubDate>Mon, 19 Aug 2024 07:32:43 GMT</pubDate>
    <dc:creator>frithjof_v</dc:creator>
    <dc:date>2024-08-19T07:32:43Z</dc:date>
    <item>
      <title>Pandas API on Spark</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Pandas-API-on-Spark/m-p/4105605#M3653</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to test how to use Pandas API on Spark.&lt;/P&gt;&lt;P&gt;I'm reading a delta table and I want to save it to a single CSV file.&lt;/P&gt;&lt;P&gt;I already know how to do this in regular pandas, but I'd like to try it with the Pandas API on Spark.&lt;/P&gt;&lt;P&gt;I have a default lakehouse attached to my Notebook, and it works if I am using regular pandas.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;import pyspark.pandas as ps

df = spark.read.load("abfss://Fabric@onelake.dfs.fabric.microsoft.com/TestLakehouse.Lakehouse/Tables/RandomNumbers")
df = df.limit(100)

df_pandasOnSpark = df.pandas_api()

df_pandasOnSpark.to_csv('/lakehouse/default/Files/RandomNumbers_PandasOnSpark.csv', header=True, index=False)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm getting the following error:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;SPAN class=""&gt;Py4JJavaError&lt;/SPAN&gt;: An error occurred while calling o6290.save. : Operation failed: "Bad Request", 400, HEAD, &lt;A href="http://onelake.dfs.fabric.microsoft.com/" target="_blank" rel="noopener"&gt;http://onelake.dfs.fabric.microsoft.com/&lt;/A&gt;&amp;lt;workspaceGUID&amp;gt;/lakehouse/default/Files/RandomNumbers_PandasOnSpark.csv?upn=false&amp;amp;action=getStatus&amp;amp;timeout=90&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Does anyone know what I'm doing wrong &amp;amp; how I could get this to work?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2024 09:14:37 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Pandas-API-on-Spark/m-p/4105605#M3653</guid>
      <dc:creator>frithjof_v</dc:creator>
      <dc:date>2024-08-18T09:14:37Z</dc:date>
    </item>
    <item>
      <title>Re: Pandas API on Spark</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Pandas-API-on-Spark/m-p/4106278#M3664</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/437984"&gt;@frithjof_v&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Using the sample data as an example, try writing the results back to the ABFS path, like this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;import pyspark.pandas as ps


df = ps.read_parquet("abfss://xxxxxx@onelake.dfs.fabric.microsoft.com/LH_002.Lakehouse/Tables/publicholidays")

df = df.head(10)

df.to_csv('abfss://xxxxxx@onelake.dfs.fabric.microsoft.com/LH_002.Lakehouse/Files/sample_datasets/testname.csv', index=False)&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="vcgaomsft_1-1724048742275.png" style="width: 999px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1152988iB40C93A96555A197/image-size/large?v=v2&amp;amp;px=999" role="button" title="vcgaomsft_1-1724048742275.png" alt="vcgaomsft_1-1724048742275.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;Here Spark treats testname.csv as a folder. I did some searching, and it seems you can't currently specify the generated file name when writing through Spark.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://stackoverflow.com/questions/74342180/how-to-write-csv-file-in-adls-using-pyspark" target="_blank"&gt;azure synapse - How to write .csv File in ADLS Using Pyspark - Stack Overflow&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="vcgaomsft_0-1724048706757.png" style="width: 999px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1152985iC759A40BFA797730/image-size/large?v=v2&amp;amp;px=999" role="button" title="vcgaomsft_0-1724048706757.png" alt="vcgaomsft_0-1724048706757.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://spark.apache.org/docs/latest/api/python/reference/pyspark.pandas/api/pyspark.pandas.read_parquet.html" target="_blank"&gt;pyspark.pandas.read_parquet — PySpark 3.5.2 documentation (apache.org)&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://spark.apache.org/docs/latest/api/python/reference/pyspark.pandas/api/pyspark.pandas.DataFrame.to_csv.html" target="_blank"&gt;pyspark.pandas.DataFrame.to_csv — PySpark 3.5.2 documentation (apache.org)&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT size="2"&gt;Best Regards,&lt;BR /&gt;Gao&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;Community Support Team&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT size="2"&gt;If there is any post&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;FONT size="3" color="#008080"&gt;&lt;EM&gt;&lt;STRONG&gt;helps&lt;/STRONG&gt;&lt;/EM&gt;&lt;/FONT&gt;, then please consider&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;FONT size="3"&gt;&lt;FONT color="#008080"&gt;&lt;EM&gt;&lt;STRONG&gt;Accept it as the solution&lt;/STRONG&gt;&lt;/EM&gt;&lt;/FONT&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;/FONT&gt;to help the other members find it more quickly.&lt;BR /&gt;If I misunderstand your needs or you still have problems on it, please feel free to let us know.&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;FONT color="#008080"&gt;&lt;EM&gt;&lt;STRONG&gt;&lt;FONT size="3"&gt;Thanks a lot!&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/EM&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;&lt;FONT size="2" color="#FF0000"&gt;&lt;A href="https://community.powerbi.com/t5/Community-Blog/How-to-Get-Your-Question-Answered-Quickly/ba-p/38490" target="_blank" rel="noopener nofollow noreferrer"&gt;How to get your questions answered quickly&lt;/A&gt;&amp;nbsp;--&amp;nbsp;&lt;A href="https://community.powerbi.com/t5/Community-Blog/How-to-provide-sample-data-in-the-Power-BI-Forum/ba-p/963216" target="_blank" rel="noopener nofollow noreferrer"&gt;&amp;nbsp;How to provide sample data in the Power BI Forum&lt;/A&gt;&lt;/FONT&gt;&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Aug 2024 06:31:01 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Pandas-API-on-Spark/m-p/4106278#M3664</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-08-19T06:31:01Z</dc:date>
    </item>
    <item>
      <title>Re: Pandas API on Spark</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Pandas-API-on-Spark/m-p/4106341#M3665</link>
      <description>&lt;P&gt;Thank you @Anonymous,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So in order to specify the name of a single CSV file, I guess we cannot use the Pandas API on Spark?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Instead we would need to use regular Pandas &lt;A href="https://learn.microsoft.com/en-us/fabric/data-science/read-write-pandas" target="_blank" rel="noopener"&gt;https://learn.microsoft.com/en-us/fabric/data-science/read-write-pandas&lt;/A&gt;&amp;nbsp;or Polars. I have tested regular Pandas and Polars; both can write a single CSV file with a specific file name.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I was just curious whether we could achieve the same with the Pandas API on Spark. However, it seems we need to use regular Pandas or Polars for this. Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 19 Aug 2024 07:32:43 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Pandas-API-on-Spark/m-p/4106341#M3665</guid>
      <dc:creator>frithjof_v</dc:creator>
      <dc:date>2024-08-19T07:32:43Z</dc:date>
    </item>
  </channel>
</rss>

