What is the purpose of the two settings "Autoscale" and "Dynamically allocate executors" in Synapse? If I have my pool configured to scale from 1 to 30 nodes, I can set the number of dynamic executors to 29 max (number of nodes minus one driver node).
Why are there two settings? I cannot have more executors than nodes, can I? And why would I set a lower number of executors than I have nodes, effectively underutilizing my available nodes? Maybe an example can help.
I must have read all the docs about this, including the one you referenced. If you look at the explanations, you'll find that they don't _really_ say what the difference between the two options is or when to apply each.
1. The paragraph about "Dynamic allocation" mixes "nodes" and "executors".
2. It does not explain why I would want to set my number of executors _lower_ than the number of nodes under "Autoscale".
3. It does not mention that setting the number of executors (which really is nothing but a number of nodes) in the pool has absolutely zero effect unless you use the pool as a standard pool without a custom environment (I tested this).
Autoscale
Autoscale for Apache Spark pools allows automatic scale up and down of compute resources based on the amount of activity. When you enable the autoscale feature, you set the minimum and maximum number of nodes to scale. When you disable the autoscale feature, the number of nodes set remains fixed. [...]
Dynamic allocation
Dynamic allocation allows the Apache Spark application to request more executors if the tasks exceed the load that current executors can bear. It also releases the executors when the jobs are completed, and if the Spark application is moving to idle state. [...] When you enable the dynamic allocation option for every Spark application submitted, the system reserves executors during the job submission step based on the minimum nodes. You specify maximum nodes to support successful automatic scale scenarios.
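In case it helps to connect the doc wording to something concrete: the "Dynamic allocation" option corresponds to the standard Apache Spark dynamic-allocation properties, which you can read back from a running notebook session. A minimal PySpark sketch (the property names are standard Spark; whether each one is set, and to what value, depends on your pool/environment, so the output is only illustrative):

```python
# Minimal sketch: read the dynamic-allocation properties the session ended up with.
# The property names are standard Apache Spark; the actual values depend on the
# pool / environment configuration, so treat the output as illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

for key in (
    "spark.dynamicAllocation.enabled",       # True when the toggle is on
    "spark.dynamicAllocation.minExecutors",  # executors reserved at submission
    "spark.dynamicAllocation.maxExecutors",  # upper bound the app may request
):
    print(key, "=", spark.conf.get(key, "<not set>"))
```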
I agree. I've read the docs too and cannot figure out what the difference is between these 2 settings.
I ran some test scenarios. See screenshots, too.
According to my analysis, the "problem" is a combination of two factors:
1. Modifying a pool is only possible in the context of configuring the default pool for a workspace (under workspace settings). If you want to create a new pool, no matter whether you want to use it as the default pool or not, you have to go to the default pool drop-down and select "New". If you save your new pool, it will become the default pool. If you only created the pool for other use cases, you have to switch the default pool back to "Starter Pool".
2. By default, a workspace does not have an environment set. If you do configure an environment, you select the pool it uses and you can set the number of executors you want. Really, it should say number of nodes, because that is what it means.
So now...
In case you do *not* configure a default environment for the workspace, and have a (custom) default pool configured with a dynamic number of nodes and a dynamic number of executors, the following will happen (let's say the pool has dynamic 1-20 nodes and dynamic 1-5 "executors"):
You run a notebook and it will use up to 5 nodes from that default pool. See how the number of nodes matches the number of executors in the pool? It also means the pool still has 15 nodes left for someone else to use. If someone else runs another notebook, they will again get 1-5 nodes out of the remaining 15. And so on, until the pool's capacity is exceeded.
If you *do* set a default environment for your workspace, the number of executors you define there (dynamic or not) will be what is used. So if your default environment says to use 1-10 nodes (again, misleading that "node" and "executor" are used interchangeably here), the first notebook will get 1-10 nodes out of the 20 available, leaving 10 for the other users. The dynamic executor/node setting of the *pool* is no longer used.
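One way to sanity-check which setting "wins" is to run the same check in a notebook under both setups (with and without a default environment attached) and compare the configured cap with the executors the session actually registered. A rough sketch; note that the `_jsc` handle is an internal, unofficial API, so this is only for quick inspection:

```python
# Rough sketch for comparing the configured cap with what the session actually got.
# Run it once with no default environment (pool settings apply) and once with a
# default environment attached (its executor setting should take over).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

configured_max = spark.conf.get("spark.dynamicAllocation.maxExecutors", "<not set>")

# Count currently registered executors; getExecutorMemoryStatus includes the
# driver, hence the -1. Note: _jsc is an internal handle, not a public API.
registered = spark.sparkContext._jsc.sc().getExecutorMemoryStatus().size() - 1

print(f"configured max executors: {configured_max}")
print(f"executors currently registered: {registered}")
```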
Hi @Krumelur,
>> What is the purpose of the two settings "Autoscale" and "Dynamically allocate executors" in Synapse?
These two options are used to dynamically allocate resources so that they fit what the jobs and activities actually require, which means you do not need to frequently modify the pool configuration for each scenario.
For detailed information, you can refer to the following document:
Apache Spark compute for Data Engineering and Data Science - Microsoft Fabric | Microsoft Learn
Regards,
Xiaoxin Sheng