Error: Py4JJavaError: An error occurred while calling o29386.save.
: org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:322)
    at org.apache.spark.sql.execution.OptimizeWriteExchangeExec.doExecute(OptimizeWriteExchangeExec.scala:74)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:231)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:282)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:279)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:227)
    at org.apache.spark.sql.delta.constraints.DeltaInvariantCheckerExec.doExecute(DeltaInvariantCheckerExec.scala:72)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:231)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:282)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:279)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:227)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeWrite$1(FileFormatWriter.scala:255)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.writeAndCommit(FileFormatWriter.scala:297)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeWrite(FileFormatWriter.scala:238)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:219)
    at org.apache.spark.sql.delta.files.TransactionalWrite.$anonfun$writeFiles$1(TransactionalWrite.scala:425)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:214)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:100)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:67)
    at org.apache.spark.sql.delta.files.TransactionalWrite.writeFiles(TransactionalWrite.scala:389)
    at org.apache.spark.sql.delta.files.TransactionalWrite.writeFiles$(TransactionalWrite.scala:349)
    at org.apache.spark.sql.delta.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:139)
    at org.apache.spark.sql.delta.files.TransactionalWrite.writeFiles(TransactionalWrite.scala:223)
    at org.apache.spark.sql.delta.files.TransactionalWrite.writeFiles$(TransactionalWrite.scala:220)
    at org.apache.spark.sql.delta.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:139)
    at org.apache.spark.sql.delta.commands.WriteIntoDelta.write(WriteIntoDelta.scala:335)
    at org.apache.spark.sql.delta.commands.WriteIntoDelta.$anonfun$run$1(WriteIntoDelta.scala:98)
    at org.apache.spark.sql.delta.commands.WriteIntoDelta.$anonfun$run$1$adapted(WriteIntoDelta.scala:93)
    at org.apache.spark.sql.delta.DeltaLog.withNewTransaction(DeltaLog.scala:237)
    at org.apache.spark.sql.delta.commands.WriteIntoDelta.run(WriteIntoDelta.scala:93)
    at org.apache.spark.sql.delta.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:180)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:152)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:214)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:100)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:67)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:152)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:145)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
    at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:145)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:129)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:123)
    at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:200)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:897)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:412)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:322)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:241)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 106 in stage 2171.0 failed 4 times, most recent failure: Lost task 106.3 in stage 2171.0 (TID 54829) (vm-0c596022 executor 6): ExecutorLostFailure (executor 6 exited caused by one of the running tasks) Reason: Container from a bad node: container_1718905903170_0001_01_000007 on host: vm-0c596022. Exit status: 50. Diagnostics: [2024-06-20 18:02:59.775]Exception from container-launch.
Container id: container_1718905903170_0001_01_000007
Exit code: 50
It looks like your Spark job is failing with an ExecutorLostFailure: one of the Spark executors exited unexpectedly while running a task. This usually happens because of insufficient resources, node failures, or permission issues.
Here are a few things you can check to resolve the issue:
Check Executor Logs – The error message indicates that an executor exited unexpectedly. Reviewing the executor and container logs can provide more detail about the cause.
Resource Allocation – The issue might be due to insufficient memory or CPU resources. Increasing the available resources for the executors could help.
Schema Mismatch – Since you're using schema merging, make sure there are no conflicts between the existing Delta table schema and the new data being written.
Cluster Stability – The error indicates a problem with a specific node. Restarting the cluster or checking for any failing nodes may resolve the issue.
Software Compatibility – Ensure you are using the latest compatible versions of Delta Lake and Apache Spark, as outdated versions can sometimes cause unexpected failures.
If the problem persists, try writing a small test dataset to see if the issue is with the data or the system itself. Let me know if you need more help!
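For instance, here is a minimal sketch of such a test write (the destination "Tables/delta_write_test" is a hypothetical test location rather than your real table, and spark is the pre-defined session in a Fabric notebook):

    # Write a tiny DataFrame to a separate Delta location to check whether the
    # write path itself works, independent of your real data.
    test_df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

    (test_df.write
        .format("delta")
        .mode("overwrite")
        .save("Tables/delta_write_test"))  # hypothetical test destination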
Thanks for sharing this support link
If your notebook fails to write data to a Delta table, the issue may stem from incorrect configuration, missing permissions, or connectivity problems. First, confirm that the Delta table exists, that the notebook has access to the appropriate database and storage location, and that you have the necessary write permissions. Make sure your runtime has a compatible Delta Lake version and dependencies, and verify that the data you are writing aligns with the Delta Lake specification and the existing table schema. If the issue persists, review the error logs for specific messages or conflicts and consult the documentation or community forums for further troubleshooting steps.
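As a quick sanity check, a sketch along these lines (the path is a placeholder for your actual table location) confirms the target is a readable Delta table before you attempt the write:

    from delta.tables import DeltaTable

    # Placeholder path; replace with the location used in the failing write.
    delta_table_path = "Tables/my_table"

    if DeltaTable.isDeltaTable(spark, delta_table_path):
        existing = spark.read.format("delta").load(delta_table_path)
        existing.printSchema()           # compare with the schema of the DataFrame you write
        print(existing.count(), "rows")  # also confirms read access to the storage
    else:
        print("No Delta table found at", delta_table_path)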
Hi @Naveen-004,
Are you able to share all of the code in your notebook? Spark is lazily evaluated, so an issue in an earlier command may only surface when you try to write the DataFrame.
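For example, a quick check (a minimal sketch, assuming df is the DataFrame being written):

    # Force the upstream transformations to run before the Delta write, so any
    # problem in an earlier command fails here rather than inside the write.
    df.cache()
    print("Row count:", df.count())  # triggers a full evaluation of df
    df.printSchema()                 # confirm the schema you expect to land in the table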
Proud to be a Super User!
Thanks for your help in finding the issue.
Hi @Naveen-004
Glad that your query got resolved.
Please continue using Fabric Community for further queries.
Thank you.
Hi @Naveen-004
Thanks for using Microsoft Fabric Community.
As I understand it, you are trying to write data from a DataFrame (df) into a Delta table at delta_table_path using PySpark, but the write operation fails with an error.
The error message indicates that the Spark job failed due to a stage failure, specifically task 106 in stage 2171.0. The task failed 4 times, and the most recent failure was due to an executor loss. The error message also mentions that the container from a bad node (vm-0c596022) exited with a status code of 50. This suggests that there was an issue with the Spark executor running on that node.
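The failing call is presumably something along these lines (a reconstruction only; your exact code isn't shown in this thread, and the mergeSchema option is assumed from the mention of schema merging above):

    # Reconstruction of the presumed failing write (exact code not shown in the thread).
    (df.write
        .format("delta")
        .mode("append")                 # or "overwrite", depending on your pipeline
        .option("mergeSchema", "true")  # assumed, since schema merging was mentioned
        .save(delta_table_path))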
Here are some possible reasons that might be causing the error:
Node failure: The Spark executor node (vm-0c596022) might have failed or become unavailable, causing the task to fail. Look at the Spark UI to see if there are any errors or warnings related to the failed task or executor.
Resource constraints: The node might not have had sufficient memory or CPU to execute the task, leading to a failure. Try increasing the resources available to the Spark executor nodes.
Network issues: There might have been network connectivity problems between the Spark driver and the executor node, causing the task to fail. Please try again after some time.
Spark configuration: The Spark configuration might be incorrect or suboptimal, leading to executor losses or task failures. Review the Spark configuration to ensure it is correct and optimal for your workload.
Retry the Write Operation: Sometimes transient issues cause task failures. Consider retrying the write to see if it succeeds on a second attempt; a minimal retry sketch follows below.
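A hypothetical retry sketch (df and delta_table_path are assumed to be the DataFrame and destination from your notebook; the write mode and options are placeholders to match your pipeline):

    import time

    def write_with_retry(df, path, attempts=3, delay_seconds=60):
        # Retry the Delta write a few times to ride out transient executor losses.
        for attempt in range(1, attempts + 1):
            try:
                df.write.format("delta").mode("append").option("mergeSchema", "true").save(path)
                print(f"Write succeeded on attempt {attempt}")
                return
            except Exception as exc:
                print(f"Attempt {attempt} failed: {exc}")
                if attempt == attempts:
                    raise
                time.sleep(delay_seconds)

    write_with_retry(df, delta_table_path)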
I hope this information helps.
Thank you.
I checked the Spark history and found that one of the stages was skipped, leading to a failure. Can you assist with troubleshooting the error?
I don't believe resource constraints are the issue. The compute we are using has 8 Spark driver cores, 56 GB of driver memory, 8 Spark executor cores, and 56 GB of executor memory.
Dynamically allocate executors: Enabled
Spark executor instances: 9
Spark Properties:
I have set the first two properties based on the community forum thread below:
https://community.fabric.microsoft.com/t5/Data-Engineering/Writing-dataframe-to-delta-table-fails-wi...
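For reference, a sketch of how session-level Spark properties can be inspected or set from a notebook (the property names below are illustrative placeholders, since the exact properties are not listed in this post):

    # Illustrative placeholders only; substitute the properties you actually set.
    for key in ("spark.dynamicAllocation.enabled",
                "spark.executor.instances",
                "spark.executor.memory"):
        print(key, "=", spark.conf.get(key, "<not set>"))

    # Runtime-changeable properties can also be adjusted per session, e.g.:
    spark.conf.set("spark.sql.shuffle.partitions", "400")  # example value only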
I retried the write operation, but it failed with the same error.
Hi @Naveen-004
Apologies for the inconvenience.
Please reach out to our support team to gain deeper insights and explore potential solutions; their expertise will be invaluable in suggesting the most appropriate approach.
Please go ahead and raise a support ticket to reach our support team:
https://support.fabric.microsoft.com/support
After creating a support ticket, please share the ticket number with us, as it will help us track the issue and follow up with more information.
Thank you.