r_Gao
Helper I

Container exited with a non-zero exit code 137

Hi, I have a Delta table with 252,322,508 rows. The data contains some duplicates: I can easily identify them with a query, and there are around 500k duplicate rows. I have a MERGE statement that deletes these duplicates. I have tried liquid clustering and partitioning on year and month columns, but each time I run a merge command along the lines of:

delete_duplicates_sql = f"""
MERGE INTO delta.`{target_table_path}` AS target
USING (
    SELECT * FROM RankedRowsToDelete
) AS source
ON source.{target_id} = target.{target_id}
   AND {watermark_join_on_expression}
   AND COALESCE(CAST(source.{layer}_pipeline_insert_date AS TIMESTAMP), TIMESTAMP '1970-01-01 00:00:00')
     = COALESCE(CAST(target.{layer}_pipeline_insert_date AS TIMESTAMP), TIMESTAMP '1970-01-01 00:00:00')
   AND target.year = source.year
   AND target.month = source.month
WHEN MATCHED THEN DELETE
"""

I get a "Container exited with a non-zero exit code 137" error after about 20 minutes. This error code seems to imply a memory issue: 137 is 128 + 9 (SIGKILL), which typically means the container was killed by the node, most often for exceeding its memory limit.

Py4JJavaError: An error occurred while calling o358.sql.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 44.0 failed 4 times, most recent failure: Lost task 5.3 in stage 44.0 (TID 5891) (vm executor  ExecutorLostFailure (executor 7 exited caused by one of the running tasks) Reason: Container from a bad node: container on host: vm-. Exit status: 137. Diagnostics: [2024-12-29 23:55:12.171]Container killed on request. Exit code is 137
[2024-12-29 23:55:12.203]Container exited with a non-zero exit code 137. 
[2024-12-29 23:55:12.212]Killed by external signal

I've tried modifying the workspace environment, going from 4 small executor nodes to 10 medium executor nodes, but this does not solve the issue either. Does anyone have any recommendations?
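
For completeness, the next thing on my list is splitting the merge up per partition, so each pass only has to shuffle one year/month worth of files and the literal predicates let Delta prune. A rough, untested sketch using the same variables as above (the watermark and timestamp conditions from the full merge are omitted here for brevity):

# Untested sketch: run the duplicate-delete merge one partition at a time.
partitions = spark.sql(
    "SELECT DISTINCT year, month FROM RankedRowsToDelete"
).collect()

for p in partitions:
    spark.sql(f"""
    MERGE INTO delta.`{target_table_path}` AS target
    USING (
        SELECT * FROM RankedRowsToDelete
        WHERE year = {p['year']} AND month = {p['month']}
    ) AS source
    ON source.{target_id} = target.{target_id}
       AND target.year = {p['year']} AND target.month = {p['month']}
    WHEN MATCHED THEN DELETE
    """)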

1 REPLY
FelixL
Advocate II

Were you able to get this issue fixed? I am experiencing the exact same issue: a lot of executors failing with error code 137. I am migrating jobs that currently run fine (daily, never once crashing) from Azure Synapse into Fabric. I am using identical pool sizes, but even so, the Fabric jobs are crashing left and right.

Even when doubling the Spark pool size (going from 3x small nodes to 3x medium nodes), I am seeing similar executor failures. Sometimes the jobs manage to finish; sometimes they pull the Livy session down with them and the entire application fails.

Monitoring the Spark application's memory usage while executing, the executors are only saturated to around 50% memory usage when they die. They do, however, almost always die when fully utilized on CPU...

I have tried everything: disabled persisting of DataFrames, increased overhead memory on executors, ... but no change; Fabric just can't keep my simple jobs alive. And they are simple: reading from Delta, saving to Delta, working with 100 MB-4 GB Delta tables. This could run on a potato, but apparently not in Fabric...
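
For what it's worth, this is roughly how I raised the executor overhead at session level; the value here is just what I tried, so adjust it for your node size:

%%configure -f
{
    "conf": {
        "spark.executor.memoryOverhead": "2048"
    }
}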

Note: I am not using the Native Execution Engine, because that **bleep** brings its own set of issues to the party, with Gluten exploding in my face at every turn... So this should be as close to 1:1 with Azure Synapse as it gets, I would think...
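
In case anyone wants to verify their own setup: as far as I can tell from the Fabric docs, spark.native.enabled is the relevant switch, so this session config should keep the Native Execution Engine off:

%%configure -f
{
    "conf": {
        "spark.native.enabled": "false"
    }
}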
