mr_001
Frequent Visitor

High Concurrency mode and custom environment - notebook sessions not shared

I have some shared code I want to deploy in a custom environment and use from multiple notebooks in a pipeline.

I have enabled high concurrency for Spark in the workspace and set a shared session tag in all of the notebooks. When I use the default environment and starter pool, the notebooks all start quickly and share the same session.

When I create a custom environment with a WHL file published and set it as the default for the workspace, the notebooks no longer share a session, and each takes about 3 minutes to start up. The same happens when running the notebooks interactively.

According to the docs, the notebooks must all use the same Spark configuration and the same library packages, which is the case here. Is this a known issue or an undocumented limitation?

1 ACCEPTED SOLUTION
v-pbandela-msft
Community Support

Hi @mr_001,

Thanks for reaching out on the Microsoft Community Forum.

Custom environments may cause delays and disrupt session sharing, even when high concurrency is enabled.

Please follow the steps below to resolve the issue:

1. Run a simple job, like a basic Spark command, before starting the pipeline to initialize the cluster and Spark session. This will help reduce the startup time.

2. Use high-concurrency clusters with autoscaling and preload libraries through init scripts or cluster configuration to ensure faster session initialization.

3. Set the same session tag for all notebooks and ensure spark.databricks.session.share is enabled in the Spark configuration for proper session sharing.

4. Check cluster and job logs for delays from library installation or executor setup, and preinstall dependencies or optimize custom libraries to reduce runtime delays.
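The warm-up and configuration steps above can be sketched as notebook cells. This is a minimal illustration under assumptions, not a verified fix: the `spark.databricks.session.share` setting named in step 3 is an Azure Databricks configuration name and may have no effect on a Fabric Spark pool, and the session tag itself is normally set in the notebook activity settings or the `%%configure` magic rather than in plain code.

```python
# Cell in a lightweight "warm-up" notebook, run as the first pipeline
# activity so the Spark session already exists when the other notebooks
# attach to it via the shared session tag (step 1).
spark.sql("SELECT 1").collect()  # trivial job to force session startup

# Step 3 as quoted in the answer; note this is a Databricks-named
# setting and may not apply on a Fabric runtime (assumption).
spark.conf.set("spark.databricks.session.share", "true")

# Sanity check for step 4: print the configuration actually in effect,
# so it can be compared against the cluster and job logs.
print(spark.conf.get("spark.databricks.session.share"))
```

This relies on the ambient `spark` session object that Fabric notebooks provide, so it is meant to run inside a notebook cell, not as a standalone script.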

If you found this post helpful, please mark it as "Accept as Solution" and select "Yes" so other members can find it more easily.

Thank you,
Pavan.


4 REPLIES
v-pbandela-msft
Community Support

Hi @mr_001,

I wanted to follow up since we haven't heard back from you regarding our last response. We hope your issue has been resolved.
If the community member's answer resolved your query, please mark it as "Accept as Solution" and select "Yes" if it was helpful.
If you need any further assistance, feel free to reach out.

Thank you,
Pavan.


Hi Pavan, I have a similar issue. All of the notebooks are attached to a custom environment. I have a simple notebook that starts a concurrent session, followed by a few other notebooks configured to use the same session tag, but session sharing doesn't always work. Sometimes it fails with "Failed to create session for executing notebook".

Can you please elaborate on your step #3?

3. Set the same session tag for all notebooks and ensure spark.databricks.session.share is enabled in the Spark configuration for proper session sharing.

Also, my understanding is that if the concurrent session doesn't exist, a new one should be created instead of failing. Please share your thoughts.

mr_001
Frequent Visitor

Further testing shows that this doesn't seem to be related to whether a custom environment is in use. Rather, any time a pipeline activity invokes a notebook for the first time, it seems to create a new session (with a slow startup of 2-3 minutes); a second activity calling a notebook that was already invoked shares the Spark session and executes in about 20 seconds.
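Given that behavior, one possible workaround (a sketch, not a confirmed fix) is to invoke each notebook once from a cheap warm-up activity at the start of the pipeline, so that later activities attach to a session that already exists. The `notebookutils.notebook.run` utility shown here is the Fabric notebook API as I understand it, and the notebook names are hypothetical placeholders.

```python
# Warm-up notebook: run as the first pipeline activity so the shared
# session for each tagged notebook already exists when the real
# activities execute. Notebook names below are placeholders.
for nb in ["WarmUpTarget1", "WarmUpTarget2"]:
    # Second argument is the timeout in seconds; the invoked notebooks
    # should only do trivial work (e.g. SELECT 1) during warm-up.
    notebookutils.notebook.run(nb, 90)
```

Whether the warm-up run and the later pipeline activity actually land in the same session would still depend on both using the same session tag, environment, and Spark configuration.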
