<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Notebook delta table merge operation error: Cannot seek after EOF in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Notebook-delta-table-merge-operation-error-Cannot-seek-after-EOF/m-p/4349329#M5884</link>
    <description>&lt;P style="margin: 0in; font-family: tahoma; font-size: 11.0pt;"&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/732249"&gt;@ntimmerman&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P style="margin: 0in; font-family: tahoma; font-size: 11.0pt;"&gt;Did the above suggestions help with your scenario? If so, please consider giving Kudos to or Accepting the helpful suggestions as solutions, to help others who face similar requirements.&lt;/P&gt;
&lt;P style="margin: 0in; font-family: tahoma; font-size: 11.0pt;"&gt;Regards,&lt;/P&gt;
&lt;P style="margin: 0in; font-family: tahoma; font-size: 11.0pt;"&gt;Xiaoxin Sheng&lt;/P&gt;</description>
    <pubDate>Fri, 03 Jan 2025 06:27:01 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2025-01-03T06:27:01Z</dc:date>
    <item>
      <title>Notebook delta table merge operation error: Cannot seek after EOF</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Notebook-delta-table-merge-operation-error-Cannot-seek-after-EOF/m-p/4292048#M5259</link>
      <description>&lt;P&gt;Hello Community,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For one of our data ingestion jobs, I am using a notebook to read MySQL data and I am executing a merge command to merge the source data with already existing data in the delta table. For some reason, I sometimes get the error message below.&lt;/P&gt;&lt;P&gt;I figured out that if I perform table maintenance on that table, the problem is solved. However, since I have quite a few tables, this is difficult to do. Also, it requires manual intervention almost every day, which is not ideal.&lt;/P&gt;&lt;P&gt;Any clue what's causing this error?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;Notebook execution failed at Notebook service with http status code - '200', please check the Run logs on Notebook, additional details - 'Error name - Py4JJavaError, Error value - An error occurred while calling o6742.execute.&lt;/DIV&gt;&lt;DIV&gt;: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 49.0 failed 4 times, most recent failure: Lost task 1.3 in stage 49.0 (TID 411) (vm-0fc91660 executor 1): java.io.EOFException: Cannot seek after EOF&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream.seek(ChecksumFileSystem.java:354)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.parquet.hadoop.util.H1SeekableInputStream.seek(H1SeekableInputStream.java:46)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:555)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.parquet.hadoop.ParquetFileReader.&amp;lt;init&amp;gt;(ParquetFileReader.java:799)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:666)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:85)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:71)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:66)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:263)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:252)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:316)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:162)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:614)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at 
org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:393)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:900)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:900)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:57)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:368)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.rdd.RDD.iterator(RDD.scala:332)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.scheduler.Task.run(Task.scala:141)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at java.base/java.lang.Thread.run(Thread.java:829)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Driver stacktrace:&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2935)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2871)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2870)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2870)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1264)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1264)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at scala.Option.foreach(Option.scala:407)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1264)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3139)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3073)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3062)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1000)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.SparkContext.runJob(SparkContext.scala:2563)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.SparkContext.runJob(SparkContext.scala:2584)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.SparkContext.runJob(SparkContext.scala:2603)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.SparkContext.runJob(SparkContext.scala:2628)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1056)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.rdd.RDD.withScope(RDD.scala:411)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.rdd.RDD.collect(RDD.scala:1055)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.execution.SparkPlan.executeCollectIterator(SparkPlan.scala:460)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:140)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$2(SQLExecution.scala:243)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:238)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at java.base/java.lang.Thread.run(Thread.java:829)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;Caused by: java.io.EOFException: Cannot seek after EOF&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream.seek(ChecksumFileSystem.java:354)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.parquet.hadoop.util.H1SeekableInputStream.seek(H1SeekableInputStream.java:46)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:555)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.parquet.hadoop.ParquetFileReader.&amp;lt;init&amp;gt;(ParquetFileReader.java:799)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:666)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:85)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:71)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:66)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:263)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:252)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:316)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:162)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:614)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:393)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:900)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:900)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:57)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:368)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.rdd.RDD.iterator(RDD.scala:332)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.scheduler.Task.run(Task.scala:141)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)&lt;/SPAN&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 20 Nov 2024 00:43:26 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Notebook-delta-table-merge-operation-error-Cannot-seek-after-EOF/m-p/4292048#M5259</guid>
      <dc:creator>ntimmerman</dc:creator>
      <dc:date>2024-11-20T00:43:26Z</dc:date>
    </item>
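Since the failure described above is transient (re-running the same merge after table maintenance succeeds), one pragmatic stopgap, independent of the root cause, is to retry the merge a few times before failing the job. Below is a minimal sketch in plain Python; `operation` stands in for the hypothetical Delta merge call (e.g. `lambda: delta_table.alias('target').merge(...).execute()`), and the attempt count and delay are illustrative, not recommendations:

```python
import time


def retry(operation, attempts=3, delay_s=5.0):
    """Call `operation`, retrying on failure with a fixed delay between tries.

    `operation` is any zero-argument callable; here it would wrap the
    (hypothetical) DeltaTable merge, e.g.
        lambda: delta_table.alias('target').merge(...).execute()
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:  # in practice, narrow this to the transient error type
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_s)
    raise last_error
```

In production you would narrow the `except` clause to the specific transient exception (here, the Py4J-wrapped `EOFException`) rather than catching everything, so genuine data errors still fail fast.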
    <item>
      <title>Re: Notebook delta table merge operation error: Cannot seek after EOF</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Notebook-delta-table-merge-operation-error-Cannot-seek-after-EOF/m-p/4293774#M5285</link>
      <description>&lt;P&gt;I don't know what's causing it, but I had exactly the same issue, and solved it in exactly the same way.&amp;nbsp; Fortunately, it hasn't failed since.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Nov 2024 16:53:29 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Notebook-delta-table-merge-operation-error-Cannot-seek-after-EOF/m-p/4293774#M5285</guid>
      <dc:creator>Liam_McCauley</dc:creator>
      <dc:date>2024-11-20T16:53:29Z</dc:date>
    </item>
    <item>
      <title>Re: Notebook delta table merge operation error: Cannot seek after EOF</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Notebook-delta-table-merge-operation-error-Cannot-seek-after-EOF/m-p/4293826#M5286</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Does it work if you automate the table maintenance?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;E.g. by scheduling a Notebook to run OPTIMIZE on the tables at a pre-defined interval?&lt;/P&gt;</description>
      <pubDate>Wed, 20 Nov 2024 17:35:22 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Notebook-delta-table-merge-operation-error-Cannot-seek-after-EOF/m-p/4293826#M5286</guid>
      <dc:creator>frithjof_v</dc:creator>
      <dc:date>2024-11-20T17:35:22Z</dc:date>
    </item>
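The scheduled-maintenance suggestion above could be sketched as a small helper that builds the `OPTIMIZE` (and, optionally, `VACUUM`) statements for a list of tables. The table names and the 7-day retention below are illustrative only; the actual execution via `spark.sql` would happen inside a notebook scheduled at the chosen interval:

```python
def maintenance_statements(tables, vacuum_retention_hours=168):
    """Build OPTIMIZE and VACUUM statements for each Delta table name given.

    `vacuum_retention_hours=168` (7 days) is an illustrative default, not a
    recommendation; VACUUM is optional and can be dropped if only file
    compaction is wanted.
    """
    stmts = []
    for t in tables:
        stmts.append(f"OPTIMIZE {t}")
        stmts.append(f"VACUUM {t} RETAIN {vacuum_retention_hours} HOURS")
    return stmts


# In a scheduled Fabric notebook you would then run something like
# (table names here are hypothetical):
#
# for stmt in maintenance_statements(["staff_duty_time"]):
#     spark.sql(stmt)
```

Keeping the statement-building separate from execution makes the maintenance list easy to review and to extend as new tables are added.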
    <item>
      <title>Re: Notebook delta table merge operation error: Cannot seek after EOF</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Notebook-delta-table-merge-operation-error-Cannot-seek-after-EOF/m-p/4294213#M5289</link>
      <description>&lt;P style="margin: 0in; font-family: tahoma; font-size: 11.0pt;"&gt;Hi &lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/732249"&gt;@ntimmerman&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P style="margin: 0in; font-family: tahoma; font-size: 11.0pt;"&gt;Could you please share some more detailed information about this issue? It would help us clarify your scenario and set up a test to troubleshoot.&lt;/P&gt;
&lt;P style="margin: 0in; font-family: tahoma; font-size: 11.0pt;"&gt;&lt;A href="http://community.powerbi.com/t5/Community-Blog/How-to-Get-Your-Question-Answered-Quickly/ba-p/38490" target="_blank"&gt;How to Get Your Question Answered Quickly&amp;nbsp;&lt;/A&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; font-family: tahoma; font-size: 11.0pt;"&gt;Regards,&lt;/P&gt;
&lt;P style="margin: 0in; font-family: tahoma; font-size: 11.0pt;"&gt;Xiaoxin Sheng&lt;/P&gt;</description>
      <pubDate>Thu, 21 Nov 2024 01:42:32 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Notebook-delta-table-merge-operation-error-Cannot-seek-after-EOF/m-p/4294213#M5289</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-11-21T01:42:32Z</dc:date>
    </item>
    <item>
      <title>Re: Notebook delta table merge operation error: Cannot seek after EOF</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Notebook-delta-table-merge-operation-error-Cannot-seek-after-EOF/m-p/4298820#M5331</link>
      <description>&lt;P&gt;Hi&amp;nbsp;@Anonymous&amp;nbsp;, Sure! Here's the full error log, which also contains the code that I'm executing.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Actually, it's just a very simple merge operation with only maybe 10 records or so. The problem is that it's very unreliable: sometimes I get this error message, other times it runs fine. Here are the error details. Let me know if you have any other questions:&lt;/P&gt;&lt;P&gt;Cell In[13], line 202, in load_duty_time_data(df_programs)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; 199 if(df_duty_time.count() &amp;gt; 0):&lt;BR /&gt;&amp;nbsp; &amp;nbsp; 200 &amp;nbsp; &amp;nbsp; #merge data frame. don't worry about deleting as deleting is disabled in FS via a permission&lt;BR /&gt;&amp;nbsp; &amp;nbsp; 201 &amp;nbsp; &amp;nbsp; print(f"""merging {df_duty_time.count()} staff_duty_time records into delta table""")&lt;BR /&gt;--&amp;gt; 202 &amp;nbsp; &amp;nbsp; DeltaTable.forPath(spark,'Tables/staff_duty_time').alias('target').merge(df_duty_time.alias('source'),"source.id = target.id and source.program_id = target.program_id").whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()&lt;/P&gt;&lt;P&gt;File /usr/hdp/current/spark3-client/jars/delta-spark_2.12-3.2.0.5.jar/delta/tables.py:1065, in DeltaMergeBuilder.execute(self)&lt;BR /&gt;&amp;nbsp; &amp;nbsp;1058 @since(0.4) &amp;nbsp;# type: ignore[arg-type]&lt;BR /&gt;&amp;nbsp; &amp;nbsp;1059 def execute(self) -&amp;gt; None:&lt;BR /&gt;&amp;nbsp; &amp;nbsp;1060 &amp;nbsp; &amp;nbsp; """&lt;BR /&gt;&amp;nbsp; &amp;nbsp;1061 &amp;nbsp; &amp;nbsp; Execute the merge operation based on the built matched and not matched actions.&lt;BR /&gt;&amp;nbsp; &amp;nbsp;1062&lt;BR /&gt;&amp;nbsp; &amp;nbsp;1063 &amp;nbsp; &amp;nbsp; See :py:class:`~delta.tables.DeltaMergeBuilder` for complete usage details.&lt;BR /&gt;&amp;nbsp; &amp;nbsp;1064 &amp;nbsp; &amp;nbsp; """&lt;BR /&gt;-&amp;gt; 1065 &amp;nbsp; &amp;nbsp; 
self._jbuilder.execute()&lt;/P&gt;&lt;P&gt;File ~/cluster-env/trident_env/lib/python3.11/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)&lt;BR /&gt;&amp;nbsp; &amp;nbsp;1316 command = proto.CALL_COMMAND_NAME +\&lt;BR /&gt;&amp;nbsp; &amp;nbsp;1317 &amp;nbsp; &amp;nbsp; self.command_header +\&lt;BR /&gt;&amp;nbsp; &amp;nbsp;1318 &amp;nbsp; &amp;nbsp; args_command +\&lt;BR /&gt;&amp;nbsp; &amp;nbsp;1319 &amp;nbsp; &amp;nbsp; proto.END_COMMAND_PART&lt;BR /&gt;&amp;nbsp; &amp;nbsp;1321 answer = self.gateway_client.send_command(command)&lt;BR /&gt;-&amp;gt; 1322 return_value = get_return_value(&lt;BR /&gt;&amp;nbsp; &amp;nbsp;1323 &amp;nbsp; &amp;nbsp; answer, self.gateway_client, self.target_id, self.name)&lt;BR /&gt;&amp;nbsp; &amp;nbsp;1325 for temp_arg in temp_args:&lt;BR /&gt;&amp;nbsp; &amp;nbsp;1326 &amp;nbsp; &amp;nbsp; if hasattr(temp_arg, "_detach"):&lt;/P&gt;&lt;P&gt;File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:179, in capture_sql_exception.&amp;lt;locals&amp;gt;.deco(*a, **kw)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; 177 def deco(*a: Any, **kw: Any) -&amp;gt; Any:&lt;BR /&gt;&amp;nbsp; &amp;nbsp; 178 &amp;nbsp; &amp;nbsp; try:&lt;BR /&gt;--&amp;gt; 179 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; return f(*a, **kw)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; 180 &amp;nbsp; &amp;nbsp; except Py4JJavaError as e:&lt;BR /&gt;&amp;nbsp; &amp;nbsp; 181 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; converted = convert_exception(e.java_exception)&lt;/P&gt;&lt;P&gt;File ~/cluster-env/trident_env/lib/python3.11/site-packages/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; 324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; 325 if answer[1] == REFERENCE_TYPE:&lt;BR /&gt;--&amp;gt; 326 &amp;nbsp; &amp;nbsp; raise Py4JJavaError(&lt;BR /&gt;&amp;nbsp; &amp;nbsp; 327 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; "An error 
occurred while calling {0}{1}{2}.\n".&lt;BR /&gt;&amp;nbsp; &amp;nbsp; 328 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; format(target_id, ".", name), value)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; 329 else:&lt;BR /&gt;&amp;nbsp; &amp;nbsp; 330 &amp;nbsp; &amp;nbsp; raise Py4JError(&lt;BR /&gt;&amp;nbsp; &amp;nbsp; 331 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".&lt;BR /&gt;&amp;nbsp; &amp;nbsp; 332 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; format(target_id, ".", name, value))&lt;/P&gt;&lt;P&gt;Py4JJavaError: An error occurred while calling o7542.execute.&lt;BR /&gt;: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 371.0 failed 4 times, most recent failure: Lost task 1.3 in stage 371.0 (TID 2251) (vm-dad93454 executor 1): org.apache.spark.SparkException: Encountered error while reading file abfss://8c604c9b-3d05-46f3-a7a3-9fa42debcf1e@onelake.dfs.fabric.microsoft.com/f073b7a3-2275-4cc7-b1f1-8431a680313f/Tables/staff_duty_time/part-00000-e7ae0d9c-be99-4604-bc51-32401f102d06-c000.snappy.parquet. 
Details:&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotReadFilesError(QueryExecutionErrors.scala:863)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:339)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:162)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:614)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:393)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:900)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:900)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:57)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:368)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.rdd.RDD.iterator(RDD.scala:332)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at 
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.scheduler.Task.run(Task.scala:141)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at java.base/java.lang.Thread.run(Thread.java:829)&lt;BR /&gt;Caused by: java.io.EOFException: Cannot seek after EOF&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream.seek(ChecksumFileSystem.java:354)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.parquet.hadoop.util.H1SeekableInputStream.seek(H1SeekableInputStream.java:46)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:555)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.parquet.hadoop.ParquetFileReader.&amp;lt;init&amp;gt;(ParquetFileReader.java:799)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:666)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:85)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:71)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:66)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:263)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:252)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:316)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:162)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:329)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; ... 
23 more&lt;/P&gt;&lt;P&gt;Driver stacktrace:&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2935)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2871)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2870)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2870)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1264)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1264)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at scala.Option.foreach(Option.scala:407)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1264)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3139)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3073)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3062)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1000)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:2563)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.SparkContext.runJob(SparkContext.scala:2584)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.SparkContext.runJob(SparkContext.scala:2603)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.SparkContext.runJob(SparkContext.scala:2628)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1056)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.rdd.RDD.withScope(RDD.scala:411)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.rdd.RDD.collect(RDD.scala:1055)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.SparkPlan.executeCollectIterator(SparkPlan.scala:460)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:140)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$2(SQLExecution.scala:243)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:238)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at java.base/java.lang.Thread.run(Thread.java:829)&lt;BR /&gt;Caused by: 
org.apache.spark.SparkException: Encountered error while reading file abfss://8c604c9b-3d05-46f3-a7a3-9fa42debcf1e@onelake.dfs.fabric.microsoft.com/f073b7a3-2275-4cc7-b1f1-8431a680313f/Tables/staff_duty_time/part-00000-e7ae0d9c-be99-4604-bc51-32401f102d06-c000.snappy.parquet. Details:&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotReadFilesError(QueryExecutionErrors.scala:863)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:339)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:162)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:614)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:393)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:900)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:900)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:57)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:368)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.rdd.RDD.iterator(RDD.scala:332)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.scheduler.Task.run(Task.scala:141)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; ... 
3 more&lt;BR /&gt;Caused by: java.io.EOFException: Cannot seek after EOF&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream.seek(ChecksumFileSystem.java:354)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.parquet.hadoop.util.H1SeekableInputStream.seek(H1SeekableInputStream.java:46)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:555)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.parquet.hadoop.ParquetFileReader.&amp;lt;init&amp;gt;(ParquetFileReader.java:799)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:666)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:85)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:71)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:66)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:263)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:252)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:316)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:162)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:329)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; ... 
23 more&lt;/P&gt;&lt;P&gt;ERROR StatusConsoleListener Could not close org.apache.log4j.helpers.QuietWriter@71a0293c&lt;BR /&gt;&amp;nbsp;java.io.IOException: java.lang.InterruptedException&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.shrinkWriteOperationQueue(AbfsOutputStream.java:709)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.uploadBlockAsync(AbfsOutputStream.java:361)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.uploadCurrentBlock(AbfsOutputStream.java:287)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.flushInternal(AbfsOutputStream.java:564)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.hflush(AbfsOutputStream.java:485)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:136)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; at org.apache.spark.microsoft.tools.logging.util.SecureStoreOutputStream.flush(SecureStoreOutputStream.java:56)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 25 Nov 2024 00:22:46 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Notebook-delta-table-merge-operation-error-Cannot-seek-after-EOF/m-p/4298820#M5331</guid>
      <dc:creator>ntimmerman</dc:creator>
      <dc:date>2024-11-25T00:22:46Z</dc:date>
    </item>
    <item>
      <title>Re: Notebook delta table merge operation error: Cannot seek after EOF</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Notebook-delta-table-merge-operation-error-Cannot-seek-after-EOF/m-p/4300740#M5345</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/437984"&gt;@frithjof_v&lt;/a&gt;&amp;nbsp;, thanks for the suggestion. Yes, that's a good idea. I've implemented a table maintenance notebook that does this. However, it's difficult to determine when an optimize needs to happen. For now, I run it when the last optimize was more than 7 days ago. However, sometimes I run into this issue less than a day after the previous optimize. So, do I need to schedule the table maintenance twice a day? That sounds like a lot of overhead and just not right. It's pretty frustrating to open the monitor every morning and find that jobs have failed again because of this issue...&lt;/P&gt;</description>
      <pubDate>Tue, 26 Nov 2024 01:22:38 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Notebook-delta-table-merge-operation-error-Cannot-seek-after-EOF/m-p/4300740#M5345</guid>
      <dc:creator>ntimmerman</dc:creator>
      <dc:date>2024-11-26T01:22:38Z</dc:date>
    </item>
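<!-- Editorial sketch: the conditional maintenance logic described in the post
above (run OPTIMIZE only when the last one is older than a threshold) can be
sketched as follows. The scheduling helper is plain Python; the table name
staff_duty_time is taken from the stack trace earlier in the thread, and the
Spark/Delta calls in the trailing comment are an assumption about how a Fabric
notebook would look up the last OPTIMIZE timestamp.

```python
from datetime import datetime, timedelta
from typing import Optional

def needs_optimize(last_optimize: Optional[datetime],
                   now: datetime,
                   max_age: timedelta = timedelta(days=7)) -> bool:
    """Decide whether a Delta table is due for OPTIMIZE."""
    # A table with no OPTIMIZE entry in its history is always due.
    if last_optimize is None:
        return True
    # Otherwise it is due once the last OPTIMIZE is older than max_age.
    return (now - last_optimize) >= max_age

# In a Fabric notebook, last_optimize would come from the Delta history, e.g.:
#   from delta.tables import DeltaTable
#   hist = DeltaTable.forName(spark, "staff_duty_time").history()
#   last_optimize = (hist.filter("operation = 'OPTIMIZE'")
#                        .selectExpr("max(timestamp)").first()[0])
#   if needs_optimize(last_optimize, datetime.utcnow()):
#       spark.sql("OPTIMIZE staff_duty_time")
```

Lowering max_age, or keying the decision on the number of merges since the last
OPTIMIZE, would avoid the fixed twice-daily schedule discussed above.
-->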
    <item>
      <title>Re: Notebook delta table merge operation error: Cannot seek after EOF</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Notebook-delta-table-merge-operation-error-Cannot-seek-after-EOF/m-p/4301340#M5350</link>
      <description>&lt;P&gt;To be honest I don't have too much practical experience with it.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How often is the merge operation running?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;(I'm wondering how often data is added/changed in your delta table)&lt;/P&gt;</description>
      <pubDate>Tue, 26 Nov 2024 06:11:05 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Notebook-delta-table-merge-operation-error-Cannot-seek-after-EOF/m-p/4301340#M5350</guid>
      <dc:creator>frithjof_v</dc:creator>
      <dc:date>2024-11-26T06:11:05Z</dc:date>
    </item>
    <item>
      <title>Re: Notebook delta table merge operation error: Cannot seek after EOF</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Notebook-delta-table-merge-operation-error-Cannot-seek-after-EOF/m-p/4301686#M5356</link>
      <description>&lt;P&gt;Thank you &lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/437984"&gt;@frithjof_v&lt;/a&gt;&amp;nbsp;.&lt;/P&gt;&lt;P&gt;This notebook runs every 3 hours. So, that's 8 times per day.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 26 Nov 2024 08:23:58 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Notebook-delta-table-merge-operation-error-Cannot-seek-after-EOF/m-p/4301686#M5356</guid>
      <dc:creator>ntimmerman</dc:creator>
      <dc:date>2024-11-26T08:23:58Z</dc:date>
    </item>
    <item>
      <title>Re: Notebook delta table merge operation error: Cannot seek after EOF</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Notebook-delta-table-merge-operation-error-Cannot-seek-after-EOF/m-p/4301724#M5358</link>
      <description>&lt;P&gt;I'm surprised you get that error.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In general, the more often you update the data (especially with inserts), the more it makes sense to regularly optimize the table to rearrange (compact) the underlying parquet files.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But if you only update the data 8 times a day, I would assume running optimize once daily would be sufficient to get good performance.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Have you got any measurements of how costly the optimize operation is? If it's cheap, it doesn't hurt to run it more often. But I would expect running optimize once daily to be more than enough when you update data 8 times daily. So I think it's strange to get an error in this case.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Have you got partitions on the table? (This could cause some performance issues, especially if you partition on high-cardinality columns.)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Btw, the error message mentions a stage failure. I have no idea what it means. But would it help to first land the new data in a staging delta table, and then merge the staged data into the destination table? Although that wouldn't explain why running maintenance on the destination table seems to help in your case.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm not sure what the underlying issue here really is.&lt;/P&gt;</description>
      <pubDate>Tue, 26 Nov 2024 08:51:44 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Notebook-delta-table-merge-operation-error-Cannot-seek-after-EOF/m-p/4301724#M5358</guid>
      <dc:creator>frithjof_v</dc:creator>
      <dc:date>2024-11-26T08:51:44Z</dc:date>
    </item>
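<!-- Editorial sketch: the staging-then-merge pattern suggested above amounts
to an upsert keyed on the table's primary key. The helper below mirrors MERGE
semantics (WHEN MATCHED update, WHEN NOT MATCHED insert) on plain Python rows;
the key column "id" and the staged_df name in the trailing comment are
illustrative assumptions, not taken from the thread.

```python
def merge_upsert(target, staged, key="id"):
    """Plain-Python mirror of Delta MERGE: update matched rows, insert new ones."""
    merged = {row[key]: row for row in target}   # index target rows by key
    for row in staged:
        merged[row[key]] = row                   # update-or-insert by key
    return list(merged.values())

# The equivalent Delta Lake operation in a Fabric notebook is roughly:
#   from delta.tables import DeltaTable
#   (DeltaTable.forName(spark, "staff_duty_time").alias("t")
#        .merge(staged_df.alias("s"), "t.id = s.id")
#        .whenMatchedUpdateAll()
#        .whenNotMatchedInsertAll()
#        .execute())
```

Landing the MySQL extract in a staging table first also narrows the window in
which concurrent maintenance could rewrite files that the merge is reading.
-->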
    <item>
      <title>Re: Notebook delta table merge operation error: Cannot seek after EOF</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Notebook-delta-table-merge-operation-error-Cannot-seek-after-EOF/m-p/4349329#M5884</link>
      <description>&lt;P style="margin: 0in; font-family: tahoma; font-size: 11.0pt;"&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/732249"&gt;@ntimmerman&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P style="margin: 0in; font-family: tahoma; font-size: 11.0pt;"&gt;Did the above suggestions help with your scenario? If so, you can consider giving a Kudo to, or accepting, the helpful suggestions to help others who face similar requirements.&lt;/P&gt;
&lt;P style="margin: 0in; font-family: tahoma; font-size: 11.0pt;"&gt;Regards,&lt;/P&gt;
&lt;P style="margin: 0in; font-family: tahoma; font-size: 11.0pt;"&gt;Xiaoxin Sheng&lt;/P&gt;</description>
      <pubDate>Fri, 03 Jan 2025 06:27:01 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Notebook-delta-table-merge-operation-error-Cannot-seek-after-EOF/m-p/4349329#M5884</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2025-01-03T06:27:01Z</dc:date>
    </item>
  </channel>
</rss>

