<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Any Thoughts about Manually Killing Executors in a Fabric Notebook. in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Any-Thoughts-about-Manually-Killing-Executors-in-a-Fabric/m-p/4892163#M13838</link>
    <description>&lt;P&gt;The code I posted earlier was something of a dead end, so I'll post the solution I found.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I discovered that you cannot kill executors that were created automatically through dynamic allocation (spark.dynamicAllocation.enabled).&amp;nbsp; So the first step is to configure the notebook to allocate executors manually.&lt;BR /&gt;Set&amp;nbsp;spark.dynamicAllocation.enabled to false and set spark.executor.instances explicitly:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="dbeavon3_0-1764794393711.png" style="width: 400px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1314466iA1105FE7850D61DD/image-size/medium?v=v2&amp;amp;px=400" role="button" title="dbeavon3_0-1764794393711.png" alt="dbeavon3_0-1764794393711.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When you reach the point in the code where you want a fresh executor, use the Py4J gateway to pass commands from Python to the underlying JVM SparkContext.&amp;nbsp; Then retrieve the executor IDs like so:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# Confirm the configured executor count
spark.conf.get("spark.executor.instances")

# Reach through the Py4J gateway to the JVM SparkContext
my_jsc = spark._jsc.sc()

# Returns a Scala Seq of executor ID strings
v003 = my_jsc.getExecutorIds()&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The IDs are simple strings ('1', '2', '3', or some combination of those).&amp;nbsp; Once you have the list of IDs, you can kill the executors like so:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# Convert the Scala Seq into a Python list
executorIds = [v003.apply(i) for i in range(v003.length())]

for executor_id in executorIds:
    my_jsc.killExecutor(executor_id)&lt;/LI-CODE&gt;&lt;P&gt;&lt;BR /&gt;Finally, you can request fresh executors, which will (hopefully) start with as much free memory as a freshly started notebook:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;my_jsc.requestExecutors(1)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hopefully this is useful.&amp;nbsp; It is unfortunate that it was necessary in the first place, but I found it genuinely hard to get an executor to release memory.&amp;nbsp; (It is hard enough just to inspect the memory used by executors once the 137 errors start.)&amp;nbsp; This approach may be preferable to trial-and-error alternatives like increasing spark.task.maxFailures or bumping node sizes from small to medium to large.&lt;/P&gt;</description>
    <pubDate>Wed, 03 Dec 2025 20:51:49 GMT</pubDate>
    <dc:creator>dbeavon3</dc:creator>
    <dc:date>2025-12-03T20:51:49Z</dc:date>
    <item>
      <title>Any Thoughts about Manually Killing Executors in a Fabric Notebook.</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Any-Thoughts-about-Manually-Killing-Executors-in-a-Fabric/m-p/4892083#M13835</link>
      <description>&lt;P&gt;I have a series of cells that are executed in a Fabric notebook.&amp;nbsp; At the very end there are some cells that need to use a lot more memory.&amp;nbsp; I frequently find that my executors are being killed by YARN with exit code 137, like so:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="dbeavon3_0-1764785313589.png" style="width: 773px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1314447iCD07E3701F7DDE54/image-dimensions/773x134?v=v2" width="773" height="134" role="button" title="dbeavon3_0-1764785313589.png" alt="dbeavon3_0-1764785313589.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;... upgrading from small to medium nodes didn't quite do the trick either.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;One thing that works consistently is to set&amp;nbsp;"&lt;STRONG&gt;spark.task.maxFailures&lt;/STRONG&gt;" to a very elevated number like 10 or 100.&amp;nbsp; That basically allows an entire executor to die and a new one to take its place (once, if that is enough to succeed, or as many times as it takes to reach the maximum).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am grateful to have "spark.task.maxFailures" for troubleshooting.&amp;nbsp; However, I really don't like this type of trial-and-error programming.&amp;nbsp; It feels dirty to me (even for a Python-based or notebook-based solution).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've tried various things to free up memory, like calling "unpersist" on DataFrames, but they don't work consistently.&amp;nbsp; The only thing that consistently works is for the entire executor to get killed and replaced.&amp;nbsp; (Another possible solution would be to split the notebook into two separate notebooks, each with its own distinct Spark session.)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Since I already know the stage of the notebook where I need to recover all the executor memory, I'm evaluating the use of&amp;nbsp;&lt;STRONG&gt;sparkContext.killExecutors&lt;/STRONG&gt;.&amp;nbsp; My notebook uses dynamically allocated executors, and I'd just as soon recycle the bad ones myself rather than rely on "spark.task.maxFailures" to do it for me.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Has anyone else tried using the "killExecutors" method to avoid a persistent error in an executor that won't release its memory?&amp;nbsp; I found the idea here:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="dbeavon3_1-1764785851068.png" style="width: 738px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1314449i2E69DCF183FFB530/image-dimensions/738x318?v=v2" width="738" height="318" role="button" title="dbeavon3_1-1764785851068.png" alt="dbeavon3_1-1764785851068.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 03 Dec 2025 18:20:53 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Any-Thoughts-about-Manually-Killing-Executors-in-a-Fabric/m-p/4892083#M13835</guid>
      <dc:creator>dbeavon3</dc:creator>
      <dc:date>2025-12-03T18:20:53Z</dc:date>
    </item>
    <item>
      <title>Re: Any Thoughts about Manually Killing Executors in a Fabric Notebook.</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Any-Thoughts-about-Manually-Killing-Executors-in-a-Fabric/m-p/4892163#M13838</link>
      <description>&lt;P&gt;The code I posted earlier was something of a dead end, so I'll post the solution I found.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I discovered that you cannot kill executors that were created automatically through dynamic allocation (spark.dynamicAllocation.enabled).&amp;nbsp; So the first step is to configure the notebook to allocate executors manually.&lt;BR /&gt;Set&amp;nbsp;spark.dynamicAllocation.enabled to false and set spark.executor.instances explicitly:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="dbeavon3_0-1764794393711.png" style="width: 400px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1314466iA1105FE7850D61DD/image-size/medium?v=v2&amp;amp;px=400" role="button" title="dbeavon3_0-1764794393711.png" alt="dbeavon3_0-1764794393711.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When you reach the point in the code where you want a fresh executor, use the Py4J gateway to pass commands from Python to the underlying JVM SparkContext.&amp;nbsp; Then retrieve the executor IDs like so:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# Confirm the configured executor count
spark.conf.get("spark.executor.instances")

# Reach through the Py4J gateway to the JVM SparkContext
my_jsc = spark._jsc.sc()

# Returns a Scala Seq of executor ID strings
v003 = my_jsc.getExecutorIds()&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The IDs are simple strings ('1', '2', '3', or some combination of those).&amp;nbsp; Once you have the list of IDs, you can kill the executors like so:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# Convert the Scala Seq into a Python list
executorIds = [v003.apply(i) for i in range(v003.length())]

for executor_id in executorIds:
    my_jsc.killExecutor(executor_id)&lt;/LI-CODE&gt;&lt;P&gt;&lt;BR /&gt;Finally, you can request fresh executors, which will (hopefully) start with as much free memory as a freshly started notebook:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;my_jsc.requestExecutors(1)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hopefully this is useful.&amp;nbsp; It is unfortunate that it was necessary in the first place, but I found it genuinely hard to get an executor to release memory.&amp;nbsp; (It is hard enough just to inspect the memory used by executors once the 137 errors start.)&amp;nbsp; This approach may be preferable to trial-and-error alternatives like increasing spark.task.maxFailures or bumping node sizes from small to medium to large.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Dec 2025 20:51:49 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Any-Thoughts-about-Manually-Killing-Executors-in-a-Fabric/m-p/4892163#M13838</guid>
      <dc:creator>dbeavon3</dc:creator>
      <dc:date>2025-12-03T20:51:49Z</dc:date>
    </item>
  </channel>
</rss>

