FelixL
Advocate II

Delta ConcurrentAppendException when loading different tables in Lakehouse, but at the same time

I am facing an issue when running multiple concurrent loads into a Fabric Lakehouse that doesn't really make sense to me. I have developed a PySpark framework for running my loads in parallel, utilizing the notebookutils runMultiple function. This means that I fire off multiple jobs concurrently from the same session. Each job then loads separate tables, using some variations of Merge, Overwrite and Replace functionality.
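For context, the orchestration pattern looks roughly like the sketch below (notebook names, args and the concurrency value are placeholders, not my actual framework):

# Minimal sketch of the runMultiple pattern: one Spark session fires off several
# child notebooks in parallel, each loading its own target table(s).
# notebookutils is available as a built-in in Fabric notebooks.
dag = {
    "activities": [
        {"name": "load_dim_customervisittype", "path": "nb_load_dim_customervisittype", "args": {"mode": "merge"}},
        {"name": "load_dim_planogram", "path": "nb_load_dim_planogram", "args": {"mode": "overwrite"}},
        # ...one activity per target table
    ],
    "concurrency": 3,          # how many child notebooks run at the same time
    "timeoutInSeconds": 7200,  # overall timeout for the whole DAG
}

notebookutils.notebook.runMultiple(dag)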
 
The problem I am facing is that I get "ConcurrentAppendException" errors whenever jobs time it so that they load their respective tables within seconds of each other. Note: they never load the same tables, but they all load to tables within the same lakehouse. 
 
I would assume that "ConcurrentAppendException" should not be raised when I am not running any concurrent appends to the same tables. The error messages I see also point to tables other than the one actually being loaded, which makes me believe that there is something fishy going on.
 
Example below... 
 
Job 1 is loading table: gold_sales_dimension_customervisittypeno (success)
Job 2 is loading table: gold_sales_dimension_planogramno (error)
 
As shown in the error message from job 2 below, it fails due to a concurrent append and refers to the "target table" of job 1.
 
Error message from job 2 (i.e. the job loading table gold_sales_dimension_planogramno):

Error message: [DELTA_CONCURRENT_APPEND] ConcurrentAppendException: Files were added to the root of the table by a concurrent update. Please try the operation again.
 
Conflicting commit: {"timestamp":1739965911476,"operation":"UPDATE","operationParameters":{"predicate":["(TargetTable#642166 = gold_sales_dimension_customervisittypeno)"]},"readVersion":1535,"isolationLevel":"Serializable","isBlindAppend":false,"operationMetrics":{"numRemovedFiles":"1","numRemovedBytes":"1585","numCopiedRows":"0","numDeletionVectorsAdded":"0","numDeletionVectorsRemoved":"0","numAddedChangeFiles":"0","executionTimeMs":"2083","numDeletionVectorsUpdated":"0","scanTimeMs":"1541","numAddedFiles":"1","numUpdatedRows":"1","numAddedBytes":"1585","rewriteTimeMs":"541"},"tags":{"VORDER":"true"},"engineInfo":"Apache-Spark/3.5.1.5.4.20241007.4 Delta-Lake/3.2.0.5","txnId":"dc53e62a-437f-47ef-ac62-dddb5850fdd7"}
Refer to https://docs.delta.io/latest/concurrency-control.html for more details.

The entire framework is running fine in Azure Synapse Analytics, and has been for years. I have never seen issues like these there... And the jobs do not (should not) touch each other's tables, ever. So I don't understand where this comes from.

 

One of my scheduled jobs, loading to some 80 unique tables, easily gets 5-6 failed loads due to concurrent appends like the above.

 

Anyone else seen anything like this?

1 ACCEPTED SOLUTION

I am an idiot. After further debugging I found the issue, and it was all on my side. So Fabric/Lakehouse works as intended in this case. Thank you for helping me, and sorry for wasting your time. :'(

 

(the conflict appears in a shared log table that I did not think of as a potential culprit...)


8 REPLIES
V-yubandi-msft
Community Support

Hello @FelixL ,

We wanted to check in as we haven't heard back from you. Did our solution work for you? If you need any more help, please don't hesitate to ask. Your feedback is very important to us. We hope to hear from you soon.

 

Thank You.

Fabamik
Frequent Visitor

I am getting a similar "ConcurrentAppendException" error when trying to update Delta tables in a lakehouse using parallel operations. The queries from the parallel operations are very simple, like "update table set status = {value} where id = {id}", where the id is different in every operation. But it still fails with ConcurrentAppendException. I have looked through various forums and found suggestions like:

1. Retry mechanism - not really feasible, as I can't add this code to every transaction (though see the sketch at the end of this post).

2. Batch updates - not in scope; these notebooks with update queries run as part of a pipeline that executes them in parallel.

3. Partitioning - the id column is the primary key, so partitioning by it would create a lot of files.

4. Isolation level - I am not sure, but I think Delta tables in a Microsoft Fabric Lakehouse use snapshot isolation by default, and I haven't found any way to change this to Serializable.

5. The delta.io docs suggest using a more specific filter, such as one scoped to a partition, but I don't have much data in the table and no other column to narrow the filter with, since I am filtering directly on the primary key.

 

This seems like a very basic requirement in data engineering, but I am struggling to achieve it in Microsoft Fabric. Is there any solution for this?
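For completeness, this is the kind of retry wrapper I mean (a sketch only; table and column names are examples, and it assumes the delta Python package that ships with the Fabric Spark runtime):

import random
import time

from delta.exceptions import ConcurrentAppendException

def update_status_with_retry(spark, table_name, row_id, value, max_attempts=5):
    # Retry the single-row UPDATE when a concurrent writer wins the commit race.
    for attempt in range(1, max_attempts + 1):
        try:
            spark.sql(f"UPDATE {table_name} SET status = '{value}' WHERE id = {row_id}")
            return
        except ConcurrentAppendException:
            if attempt == max_attempts:
                raise
            # Back off with jitter so parallel notebooks do not retry in lockstep.
            time.sleep(2 ** attempt + random.random())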

V-yubandi-msft
Community Support

Hi @FelixL ,

 

@nilendraFabric has provided an exact solution to your issue. Could you kindly confirm whether your problem has been resolved or if you are still encountering any difficulties? Your feedback is valuable to the community, as it assists others who might face a similar issue.

Thank you, @nilendraFabric, for your valuable insights.

If the issue is resolved, please mark it as the Accepted Solution to help others in the community find the answer more easily.

 

Regards,

Yugandhar.

nilendraFabric
Community Champion

Hello @FelixL 

 

Fabric Lakehouse uses a queuing system where concurrency depends on the capacity SKU (e.g., F2 allows 1 concurrent job; F32 allows 8).

If your capacity tier is too low, concurrent writes to any table in the Lakehouse may trigger conflicts, as jobs compete for limited resources.

For 10 concurrent jobs, use at least F16 (burst factor 3 → 32×3 = 96 cores).

 

Upgrade SKUs to match workload parallelism requirements using Microsoft's formula:

 

Required SKU Tier = Ceil(Total Concurrent Jobs / Burst Factor)

https://learn.microsoft.com/en-us/fabric/data-engineering/spark-job-concurrency-and-queueing
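As a rough illustration of how I read that formula (a sketch only; the burst factor differs per SKU, so validate it against the linked documentation):

import math

def required_sku_tier(total_concurrent_jobs, burst_factor):
    # Straight application of the quoted formula; treat the result as a starting
    # point, not a sizing guarantee.
    return math.ceil(total_concurrent_jobs / burst_factor)

print(required_sku_tier(10, 3))  # -> 4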

 

Try partitioning by natural keys to prevent overlapping file operations, for example:
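A sketch with made-up table and column names; the idea is that a partition column plus a partition-scoped merge predicate keeps concurrent writers on disjoint files, per the Delta concurrency-control docs:

from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Write the table partitioned by a natural key.
df = spark.createDataFrame(
    [(1, "2025-06-01", "open"), (2, "2025-06-02", "open")],
    ["id", "load_date", "status"],
)
(df.write
   .format("delta")
   .partitionBy("load_date")
   .mode("overwrite")
   .save("Tables/demo_partitioned"))

# MERGE with an explicit partition filter so writers targeting other partitions
# do not touch the same files.
updates = spark.createDataFrame([(1, "2025-06-01", "done")], ["id", "load_date", "status"])
target = DeltaTable.forPath(spark, "Tables/demo_partitioned")
(target.alias("t")
   .merge(updates.alias("s"), "t.load_date = '2025-06-01' AND t.id = s.id")
   .whenMatchedUpdateAll()
   .whenNotMatchedInsertAll()
   .execute())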

 

 

 

Thanks for your reply, but I think we are talking about different things here...?

I am running one single Spark session, using 12 vCores, on an F64/P1 capacity. This one small session then writes to 80 tables in my lakehouse, using a runMultiple notebook concurrency of 3.


I should have enormous compute headroom, and I am not seeing any concurrency throttling or job queuing.

What I am seeing are error messages saying that jobs try to write to delta tables they are not actually touching. This either needs a good explanation, or appears to be some bug in the lakehouse-onelake sync layer..? 

Makes sense now, it's not a capacity issue. Let me dig down further. My gut feeling says it is related to the following, but let me come back on this:

"isolationLevel":"Serializable"

 

I am an idiot. After further debugging I found the issue, and it was all on my side. So Fabric/Lakehouse works as intended in this case. Thank you for helping me, and sorry for wasting your time. :'(

 

(the conflict appears in a shared log table that I did not think of as a potential culprit...)
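For anyone hitting the same symptom: the conflicting commit in my error is an UPDATE whose predicate filters on a TargetTable column, which is exactly the kind of write every one of my jobs makes against one shared log table. A minimal sketch of the colliding pattern (names are illustrative):

# Every parallel notebook calls this against the SAME unpartitioned log table.
# The rows differ per job, but the UPDATEs rewrite files in the same table, so
# Delta can raise ConcurrentAppendException even though the jobs' target tables
# never overlap.
def mark_load_complete(spark, target_table_name):
    spark.sql(f"""
        UPDATE load_log
        SET status = 'done', finished_at = current_timestamp()
        WHERE TargetTable = '{target_table_name}'
    """)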

Thanks, @FelixL, for sharing the findings. Happy to help anytime.
