ati_puri
Helper I

Spark Environment Properties Setup Suggestion

Hi Team,

 

When we create a custom pool in Fabric, we explicitly set the number of nodes (min/max) and enable dynamic scaling for the nodes and executors. However, inside our Spark environment we also have the flexibility to add Spark properties for the bronze/silver/gold layers.

 

Should we also add min/max or dynamic Spark executor properties inside our environment?

Do these properties inside our custom Spark environment override the executor/node setup in the custom pool?

 

For example:

Spark properties like the ones below in the environment configuration vs. the Spark settings in the workspace Data Engineering section:

 

spark.dynamicAllocation.enabled = true
spark.dynamicAllocation.minExecutors = 8
spark.dynamicAllocation.initialExecutors = 12
spark.dynamicAllocation.maxExecutors = 100

 

Spark settings in the workspace Data Engineering section:

 

[Screenshots: Spark settings under the workspace Data Engineering section]

 

How does setting up dynamic allocation in both places impact the environment configuration?
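One quick way to see which values actually end up applied is to read them back from a notebook attached to the environment. A minimal check, assuming the default spark session object that Fabric notebooks provide:

# Print the dynamic-allocation values the running application was started with.
# Assumes the default SparkSession named spark that Fabric notebooks expose.
conf = spark.sparkContext.getConf()
for key in [
    "spark.dynamicAllocation.enabled",
    "spark.dynamicAllocation.minExecutors",
    "spark.dynamicAllocation.initialExecutors",
    "spark.dynamicAllocation.maxExecutors",
]:
    # SparkConf.get accepts a default value returned when the key is not set
    print(key, "=", conf.get(key, "<not set>"))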

 

Thanks.


9 REPLIES
BhaveshPatel
Community Champion

Hi @ati_puri 

 

First things first, you need to understand the concepts of Python, Data Lake, and Delta Lake.

 

You can use notebooks alone to achieve this. For example, I use a DeltaLakeConfigurations notebook and run:

# Allow VACUUM to use a retention period shorter than the default safety check
spark.conf.set("spark.microsoft.delta.retentionDurationCheck.enabled", "false")
# Make optimized writes and auto compaction the default for new Delta tables
spark.conf.set("spark.microsoft.delta.properties.defaults.autoOptimize.optimizeWrite", "true")
spark.conf.set("spark.microsoft.delta.properties.defaults.autoOptimize.autoCompact", "true")
 
Also, you can use Python or Spark programming rather than the UI, and it can save the company a lot of money.

 

 

Thanks & Regards,
Bhavesh

Love the Self Service BI.
Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to give Kudos.
BhaveshPatel
Community Champion

Hi @ati_puri 

 

So here are two kinds of properties you should think about.

 

One is Spark properties set in Python, e.g. spark.conf.set("spark.sql.ansi.enabled", "true"). These properties are set against notebooks (global properties in the environment).
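As a quick illustration of what that property changes, here is a small example: with ANSI mode on, an invalid cast fails at runtime instead of silently returning NULL.

# Turn ANSI SQL mode on for the current session
spark.conf.set("spark.sql.ansi.enabled", "true")
# With ANSI enabled this raises a cast error; with it disabled it returns NULL
spark.sql("SELECT CAST('abc' AS INT)").show()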

 

Yes, each pool (driver and executors, which are Linux virtual machines) has its own Spark properties too, for example the SparkContext in a Jupyter notebook.

 

To start with, you should use the minimum number of virtual machines and, as time progresses, add more as new clusters are needed (start with medium-sized clusters first).

 

Also, to save money, always use job clusters and not high-concurrency clusters; the latter are for ad hoc queries only. You always have to think about money.

 

You can use command-line tools, and you can save millions of dollars for the company.

Thanks & Regards,
Bhavesh

Love the Self Service BI.
Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to give Kudos.

Hi @BhaveshPatel, I edited my post above for clearer understanding. Please have a look and let me know how setting up the same properties in both places impacts the environment configuration.

Thanks.

Hi @ati_puri ,

Thank you for reaching out to the Microsoft Community Forum.

 

Could you please try the proposed solutions shared by @BhaveshPatel and @tayloramy? Let us know if you're still facing the same issue; we'll be happy to assist you further.

 

Regards,

Dinesh

Hi Fellow members,

 

As we progressed with setting up our Spark properties, we learned that Spark node counts and automatic dynamic scaling are set when we configure the pool.

 

While setting up the bronze environment, we had configured another set of executor settings, which should align with the node and dynamic scaling settings from the starter/custom pool.

These settings are necessary and should be kept in sync with the number of nodes available for the chosen Fabric SKU.

 

Thanks.

Hi @ati_puri ,

 

Custom pool settings (Data Engineering workspace --> Pools) control physical resources: the number of nodes and dynamic scaling of nodes. These settings define the cluster size and elasticity at the infrastructure level.

 

Spark environment properties control logical Spark configurations for jobs and notebooks. They include properties like spark.dynamicAllocation.enabled, spark.sql.*, etc. These apply inside the Spark application running on the pool.

 

Environment properties cannot override the node count or pool-level scaling, because those are infrastructure-level settings. Executor-level dynamic allocation (spark.dynamicAllocation.*) works within the limits of the pool.

 

For example:

 

Pool --> min nodes = 2, max nodes = 10
Environment --> spark.dynamicAllocation.maxExecutors = 100
The actual maximum number of executors will be limited by the pool capacity (nodes × cores).

 

Note: pool dynamic scaling adjusts nodes based on workload; Spark dynamic allocation adjusts executors within the allocated nodes. If both are enabled, Spark will request more executors as needed, and the pool will add nodes if executor demand exceeds current capacity. They work together rather than overriding each other.

 

Please try the following:

 

1. Enable dynamic allocation in Spark environment for flexibility.

 

2. Set reasonable min/max executors aligned with pool size and SKU.

 

For example:

 

Pool max nodes = 10, each node = 16 cores --> ~160 cores total.
Set spark.dynamicAllocation.maxExecutors ≈ 160 / cores per executor (see the sketch below).

 

3. Avoid setting unrealistic values like 1000 executors on a small pool.

 

Note: in your example, the values are fine as long as your pool can support them. If the pool's max nodes = 5, you will never reach 100 executors.
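As a rough sizing sketch (the numbers below are only illustrative; substitute your own pool and executor sizes):

# Back-of-the-envelope ceiling for how many executors a pool can actually host.
# All values are examples only - replace them with your pool configuration.
max_nodes = 10        # pool max nodes
cores_per_node = 16   # depends on the node size chosen for the pool
executor_cores = 4    # spark.executor.cores used by your jobs

total_cores = max_nodes * cores_per_node          # 160 cores
executor_ceiling = total_cores // executor_cores  # 40 executors
print("Set spark.dynamicAllocation.maxExecutors at or below", executor_ceiling)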

 

Keep pool settings for node scaling and use environment settings for executor scaling. Align both with the Fabric SKU capacity (nodes × cores × memory). Please document these settings per layer (bronze/silver/gold) for consistency.
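One simple way to keep them documented is a single reference mapping that each team reads from; the values below are placeholders, not recommendations:

# Hypothetical per-layer executor settings kept in one place for documentation.
# Copy the agreed values into each layer's environment Spark properties.
LAYER_SPARK_SETTINGS = {
    "bronze": {"spark.dynamicAllocation.minExecutors": "4",
               "spark.dynamicAllocation.maxExecutors": "40"},
    "silver": {"spark.dynamicAllocation.minExecutors": "2",
               "spark.dynamicAllocation.maxExecutors": "24"},
    "gold":   {"spark.dynamicAllocation.minExecutors": "2",
               "spark.dynamicAllocation.maxExecutors": "16"},
}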

 

I hope this information helps. Please do let us know if you have any further queries.

 

Regards,

Dinesh

 

@v-dineshya Thanks for the detailed explanation.

Hi @ati_puri ,

We haven't heard from you on the last response and were just checking back to see whether you have a resolution yet. If you have any further queries, do let us know.

 

Regards,

Dinesh

Hi @ati_puri

 

There is no "one setting fits all" solution here. The settings you need depend on what you're running.

 

Take a look at the jobs you run: if they only ever use one executor, you can reduce the resources you allocate.

If they're always capped out and using 100% of the pool, make the pool bigger.
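A rough way to check how many executors a running session actually has is sketched below; it goes through an internal Py4J handle (sc._jsc), so treat it as a diagnostic trick rather than a stable API:

# Count executors currently registered with the driver.
# sc._jsc is not public PySpark API and may change between Spark versions.
sc = spark.sparkContext
# getExecutorMemoryStatus() has one entry per executor plus one for the driver
executor_count = sc._jsc.sc().getExecutorMemoryStatus().size() - 1
print("Executors currently allocated:", executor_count)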

 

Optimization like this is as much art as it is science; fiddle with the dials until you have settings that make you happy and work for your environment.

 

If you found this helpful, consider giving some Kudos. If I answered your question or solved your problem, mark this post as the solution.

