<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: GCS Export Issue with PySpark Notebook in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/GCS-Export-Issue-with-PySpark-Notebook/m-p/4792463#M11618</link>
    <description>&lt;P&gt;&lt;SPAN&gt;Hello &lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1295153"&gt;@Harsha_k_111&lt;/a&gt;,&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Could you please confirm if your query has been resolved by the provided solutions? This would be helpful for other members who may encounter similar issues.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thank you for being part of the Microsoft Fabric Community.&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 10 Aug 2025 20:33:34 GMT</pubDate>
    <dc:creator>v-ssriganesh</dc:creator>
    <dc:date>2025-08-10T20:33:34Z</dc:date>
    <item>
      <title>GCS Export Issue with PySpark Notebook</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/GCS-Export-Issue-with-PySpark-Notebook/m-p/4784374#M11416</link>
      <description>&lt;P&gt;I'm currently trying to run a script in an MS Fabric Notebook environment that is attached to a Lakehouse using a table shortcut.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;spark.conf.set("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
spark.conf.set("spark.hadoop.fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
spark.conf.set("spark.hadoop.google.cloud.auth.service.account.enable", "true")
spark.conf.set("spark.hadoop.google.cloud.auth.service.account.json.keyfile", "/lakehouse/default/Files/gcs_key.json")
spark.conf.set("spark.hadoop.google.cloud.auth.null.enable", "false")
spark.conf.set("google.cloud.auth.service.account.enable", "true")
spark.conf.set("google.cloud.auth.service.account.json.keyfile", "/lakehouse/default/Files/gcs_key.json")

query = "SELECT * FROM dbo_Geography_shortcut"
max_records_per_file = 50000
mode = "append"
format = "json"
gcs_path = "gs://qa_dmgr_audience-prod-replica-1_eu_6bf6/15070/raw_test_5/"

df = spark.sql(query)
df.write.option("maxRecordsPerFile", max_records_per_file).mode(mode).format(format).save(gcs_path)&lt;/LI-CODE&gt;&lt;P&gt;This script creates staging files in the GCS path it uploads to and tries to delete them after the actual files are uploaded. The service account doesn't have any delete permission assigned to it, so the Spark job fails. I can't grant delete permission because my company restricts it.&lt;BR /&gt;Error:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Py4JJavaError: An error occurred while calling o5174.save.
: org.apache.spark.SparkException: Job aborted due to stage failure: Authorized committer (attemptNumber=0, stage=27, partition=0) failed; but task commit success, data duplication may happen. reason=ExceptionFailure(org.apache.spark.SparkException,[TASK_WRITE_FAILED] Task failed while writing rows to gs://qa_dmgr_audience-prod-replica-1_eu_6bf6/15070/raw_test.,[Ljava.lang.StackTraceElement;@7e3f1bb3,org.apache.spark.SparkException: [TASK_WRITE_FAILED] Task failed while writing rows to gs://qa_dmgr_audience-prod-replica-1_eu_6bf6/15070/raw_test.
	at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:776)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:499) ..........
.........
Caused by: java.io.IOException: Error deleting 'gs://qa_dmgr_audience-prod-replica-1_eu_6bf6/15070/raw_test/_temporary/0/_temporary/attempt_202507291107556685858892966191353_0027_m_000000_129/part-00000-bb5e042b-8f4d-4e6d-84a5-91149d3429ad-c000.json', stage 2 with generation 1753787276139795&lt;/LI-CODE&gt;&lt;P&gt;I need to avoid creating these staging files in the first place and upload directly to the GCS path using the same script.&lt;BR /&gt;&lt;BR /&gt;Please help me with this!&lt;BR /&gt;Thanks in advance &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 01 Aug 2025 11:51:45 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/GCS-Export-Issue-with-PySpark-Notebook/m-p/4784374#M11416</guid>
      <dc:creator>Harsha_k_111</dc:creator>
      <dc:date>2025-08-01T11:51:45Z</dc:date>
    </item>
    <item>
      <title>Re: GCS Export Issue with PySpark Notebook</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/GCS-Export-Issue-with-PySpark-Notebook/m-p/4785203#M11428</link>
      <description>&lt;P&gt;Hello &lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1295153"&gt;@Harsha_k_111&lt;/a&gt;, &lt;BR /&gt;Thank you for reaching out to the Microsoft Fabric Community Forum.&lt;BR /&gt;&lt;BR /&gt;We understand that the script is failing because it tries to delete temporary staging files in the GCS path, but your service account lacks delete permissions due to company restrictions.&lt;/P&gt;
&lt;P&gt;To resolve this, you can configure the script to write directly to the GCS path without creating temporary files. This can be achieved by adjusting the Spark configurations to use a direct write mode, which avoids the need for delete permissions. Specifically, you can set the output stream type to bypass the default behavior of creating staging files. This should allow the job to complete successfully while adhering to your permission constraints.&lt;/P&gt;
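&lt;P&gt;As a minimal sketch of the configuration step described above, the helper below gathers the connector settings from the original script into one dictionary and adds an output stream type. The property name fs.gs.outputstream.type and its BASIC value are assumptions about the setting meant here, and gcs_write_conf and apply_conf are hypothetical helper names. Spark's file committer may still stage output under _temporary, so the result should be checked against the restricted service account.&lt;/P&gt;

```python
# Hypothetical helper: collects the settings in one place so they can be
# reviewed before being applied. The keys come from the original script,
# except fs.gs.outputstream.type, which is an assumption.
def gcs_write_conf(keyfile_path, output_stream_type="BASIC"):
    """Return GCS connector settings as a dict of Spark conf keys to values."""
    return {
        "spark.hadoop.fs.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
        "spark.hadoop.fs.AbstractFileSystem.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
        "spark.hadoop.google.cloud.auth.service.account.enable": "true",
        "spark.hadoop.google.cloud.auth.service.account.json.keyfile": keyfile_path,
        # Assumed GCS Hadoop connector property; confirm it against the
        # connector version bundled with the Fabric runtime.
        "spark.hadoop.fs.gs.outputstream.type": output_stream_type,
    }


def apply_conf(spark, settings):
    """Apply each setting to an existing SparkSession via spark.conf.set."""
    for key, value in settings.items():
        spark.conf.set(key, value)


settings = gcs_write_conf("/lakehouse/default/Files/gcs_key.json")
# In the notebook, where `spark` is already defined:
# apply_conf(spark, settings)
```

&lt;P&gt;Keeping the settings in a dictionary makes it easy to print and compare them before the write runs. If the job still fails on the delete step, this setting alone is not enough for the committer in use.&lt;/P&gt;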
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Best regards,&lt;BR /&gt;Ganesh Singamshetty&lt;/P&gt;</description>
      <pubDate>Sat, 02 Aug 2025 10:23:40 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/GCS-Export-Issue-with-PySpark-Notebook/m-p/4785203#M11428</guid>
      <dc:creator>v-ssriganesh</dc:creator>
      <dc:date>2025-08-02T10:23:40Z</dc:date>
    </item>
    <item>
      <title>Re: GCS Export Issue with PySpark Notebook</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/GCS-Export-Issue-with-PySpark-Notebook/m-p/4786791#M11476</link>
      <description>&lt;P&gt;Hello &lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1295153"&gt;@Harsha_k_111&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;We hope you're doing well. Could you please confirm whether your issue has been resolved or if you're still facing challenges? Your update will be valuable to the community and may assist others with similar concerns.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Mon, 04 Aug 2025 13:35:18 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/GCS-Export-Issue-with-PySpark-Notebook/m-p/4786791#M11476</guid>
      <dc:creator>v-ssriganesh</dc:creator>
      <dc:date>2025-08-04T13:35:18Z</dc:date>
    </item>
    <item>
      <title>Re: GCS Export Issue with PySpark Notebook</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/GCS-Export-Issue-with-PySpark-Notebook/m-p/4790546#M11578</link>
      <description>&lt;P&gt;Hello &lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1295153"&gt;@Harsha_k_111&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Hope everything’s going great with you. Just checking in: has the issue been resolved, or are you still running into problems? Sharing an update can really help others facing the same thing.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thank you.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 07 Aug 2025 17:13:53 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/GCS-Export-Issue-with-PySpark-Notebook/m-p/4790546#M11578</guid>
      <dc:creator>v-ssriganesh</dc:creator>
      <dc:date>2025-08-07T17:13:53Z</dc:date>
    </item>
    <item>
      <title>Re: GCS Export Issue with PySpark Notebook</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/GCS-Export-Issue-with-PySpark-Notebook/m-p/4792463#M11618</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Hello &lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1295153"&gt;@Harsha_k_111&lt;/a&gt;,&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Could you please confirm if your query has been resolved by the provided solutions? This would be helpful for other members who may encounter similar issues.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thank you for being part of the Microsoft Fabric Community.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 10 Aug 2025 20:33:34 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/GCS-Export-Issue-with-PySpark-Notebook/m-p/4792463#M11618</guid>
      <dc:creator>v-ssriganesh</dc:creator>
      <dc:date>2025-08-10T20:33:34Z</dc:date>
    </item>
  </channel>
</rss>

