Harsha_k_111
New Member

GCS Export Issue with PySpark Notebook

I'm currently trying to run a script in the MS Fabric Notebook environment, attached to a Lakehouse that uses a table shortcut.

# Register the GCS connector as the handler for gs:// paths
spark.conf.set("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
spark.conf.set("spark.hadoop.fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
# Authenticate with a service-account key stored in the Lakehouse Files area
spark.conf.set("spark.hadoop.google.cloud.auth.service.account.enable", "true")
spark.conf.set("spark.hadoop.google.cloud.auth.service.account.json.keyfile", "/lakehouse/default/Files/gcs_key.json")
spark.conf.set("spark.hadoop.google.cloud.auth.null.enable", "false")
# Same auth settings without the spark.hadoop prefix
spark.conf.set("google.cloud.auth.service.account.enable", "true")
spark.conf.set("google.cloud.auth.service.account.json.keyfile", "/lakehouse/default/Files/gcs_key.json")

query = "SELECT * FROM dbo_Geography_shortcut"
max_records_per_file = 50000
mode = "append"
format = "json"
gcs_path = "gs://qa_dmgr_audience-prod-replica-1_eu_6bf6/15070/raw_test_5/"

df = spark.sql(query)
df.write.option("maxRecordsPerFile", max_records_per_file).mode(mode).format(format).save(gcs_path)

This script creates staging files in the GCS path while uploading and then tries to delete them after the final files are written. The service account doesn't have delete permission assigned to it, so the Spark job fails. I can't grant delete permission, as my company restricts it.
Error:

Py4JJavaError: An error occurred while calling o5174.save.
: org.apache.spark.SparkException: Job aborted due to stage failure: Authorized committer (attemptNumber=0, stage=27, partition=0) failed; but task commit success, data duplication may happen. reason=ExceptionFailure(org.apache.spark.SparkException,[TASK_WRITE_FAILED] Task failed while writing rows to gs://qa_dmgr_audience-prod-replica-1_eu_6bf6/15070/raw_test.,[Ljava.lang.StackTraceElement;@7e3f1bb3,org.apache.spark.SparkException: [TASK_WRITE_FAILED] Task failed while writing rows to gs://qa_dmgr_audience-prod-replica-1_eu_6bf6/15070/raw_test.
	at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:776)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:499) ..........
.........
Caused by: java.io.IOException: Error deleting 'gs://qa_dmgr_audience-prod-replica-1_eu_6bf6/15070/raw_test/_temporary/0/_temporary/attempt_202507291107556685858892966191353_0027_m_000000_129/part-00000-bb5e042b-8f4d-4e6d-84a5-91149d3429ad-c000.json', stage 2 with generation 1753787276139795

I need to avoid the creation of staging files in the first place and upload directly to the GCS path using the same script.

Please help me with this!
Thanks in advance 🙂

4 REPLIES
v-ssriganesh
Community Support

Hello @Harsha_k_111,

Could you please confirm if your query has been resolved by the provided solutions? This would be helpful for other members who may encounter similar issues.

Thank you for being part of the Microsoft Fabric Community.

v-ssriganesh
Community Support

Hello @Harsha_k_111,

Hope everything's going great with you. Just checking in: has the issue been resolved, or are you still running into problems? Sharing an update can really help others facing the same thing.

Thank you.

v-ssriganesh
Community Support

Hello @Harsha_k_111,

We hope you're doing well. Could you please confirm whether your issue has been resolved or if you're still facing challenges? Your update will be valuable to the community and may assist others with similar concerns.

Thank you.

v-ssriganesh
Community Support

Hello @Harsha_k_111,
Thank you for reaching out to the Microsoft Fabric Community Forum.

We understand that the script is failing because it tries to delete temporary staging files in the GCS path, but your service account lacks delete permissions due to company restrictions.

To resolve this, you can configure the script to write directly to the GCS path without creating temporary files. The idea is to adjust the Spark configuration so the connector uses a direct write mode rather than the default commit protocol, which stages output under a _temporary folder (visible in your stack trace) and deletes it after the commit. This should allow the job to complete successfully while adhering to your permission constraints.
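For illustration, since the exact connector setting can vary by version, here is a minimal sketch of one workaround that sidesteps the GCS committer entirely: write the JSON output to the default Lakehouse first, then upload the part files to GCS with the google-cloud-storage Python client, which only needs object-create permission on the bucket. The staging folder name is hypothetical, and this assumes the google-cloud-storage package is available in the notebook environment:

import os
from google.cloud import storage

# Step 1: write the JSON part files to the default Lakehouse instead of GCS,
# so the committer's create/delete cycle runs on storage we fully control.
df = spark.sql("SELECT * FROM dbo_Geography_shortcut")
df.write.option("maxRecordsPerFile", 50000).mode("append").format("json").save("Files/raw_test_staging")

# Step 2: upload each part file with the GCS client; upload_from_filename
# only needs create permission (storage.objects.create) on the bucket.
client = storage.Client.from_service_account_json("/lakehouse/default/Files/gcs_key.json")
bucket = client.bucket("qa_dmgr_audience-prod-replica-1_eu_6bf6")
local_dir = "/lakehouse/default/Files/raw_test_staging"  # local mount of the folder written above
for name in os.listdir(local_dir):
    if name.startswith("part-") and name.endswith(".json"):
        bucket.blob(f"15070/raw_test_5/{name}").upload_from_filename(os.path.join(local_dir, name))

The trade-off is an extra copy through the Lakehouse, but no delete permission is ever needed on the GCS side, and the staging folder can be cleaned up afterwards with your own Fabric credentials.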

Best regards,
Ganesh Singamshetty
