P_work
Frequent Visitor

Notebook has 98% idle time in pipeline

Notebook in pipeline with custom environment (pre-loaded libraries, 8 driver cores, 8 executor cores, dynamic allocation)

Monitor for notebook stats:

Spark resource usage
Total duration: roughly 5 min
Total idle time: roughly 4 min 50 sec
Efficiency: roughly 2%
The graph shows the entire 4+ minutes as idle time before the notebook starts.

What method could be used to reduce the startup time of the notebook in the pipeline? (This idleness does not occur when running the notebook outside of a pipeline.)

This notebook is used in a child pipeline that is called within a parent pipeline for-loop, with each child pipeline call showing roughly 6 min.
What method could be used to reduce pipeline times in the parent's for-loop after the initial 5 min wait for the first child pipeline's notebook?

 

11 REPLIES
v-echaithra
Community Support

Hi @P_work ,

We wanted to follow up to see if the issue you reported has been fully resolved. If you still have any concerns or need additional support, please don’t hesitate to let us know; we’re here to help.

We truly appreciate your patience and look forward to assisting you further if needed.

Warm regards,
Chaithra E.

v-echaithra
Community Support

Hi @P_work ,

When the same notebook behaves differently inside and outside a pipeline, that’s a classic case of orchestration overhead and environment provisioning latency.
Even though the custom environment seems lightweight, the pipeline engine treats it as a separate provisioning task. That’s why you see the 4-minute delay: it's not just the notebook, it's the orchestration and environment spin-up.
Configure your pipeline to use pre-warmed clusters or instance pools. This avoids cold starts and can shave minutes off startup time.
Instead of calling child pipelines in a loop, use APIs like dbutils.notebook.run() (Databricks) or mssparkutils.notebook.run() (Microsoft Fabric), as sketched below. These maintain context and reduce orchestration delays.
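To illustrate the last suggestion, here is a minimal sketch (not a drop-in replacement for your pipeline): a single parent notebook loops over the work items and calls the child notebook with mssparkutils.notebook.run(), so every iteration reuses the parent's already-started Spark session instead of provisioning a fresh environment. The notebook name, timeout value, and parameter name below are placeholders.

# Parent notebook (Microsoft Fabric / Synapse), Python.
# Assumes a child notebook named "ChildNotebook" in the same workspace that
# reads a parameter called "item_id"; both names are hypothetical.
from notebookutils import mssparkutils

items = ["a", "b", "c"]  # placeholder for whatever the pipeline for-loop iterates over

for item in items:
    # Runs the child notebook inside the current Spark session, so each
    # iteration skips the separate environment/session spin-up.
    # 600 is the timeout in seconds for the child run.
    result = mssparkutils.notebook.run("ChildNotebook", 600, {"item_id": item})
    print(f"Child run for {item} returned: {result}")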

v-echaithra
Community Support

Hi @P_work ,

We just wanted to follow up to ensure that your issue has been fully resolved. If there are any outstanding questions or if you need further assistance with anything, please don’t hesitate to reach out. We’re always here and happy to help in any way we can.

Your satisfaction is important to us, and we want to make sure you have everything you need.

Warm regards,
Chaithra E.



I believe my issue is not idle time, but the startup time of custom environments. The default environment starts in roughly 20 seconds; the custom environment (which appears identical to the default, with one public library loaded/published) takes about 2 minutes when the notebook runs standalone and about 4 minutes when the notebook runs in a pipeline.

v-echaithra
Community Support

Hi @P_work ,

Just checking in to confirm that your issue has been resolved. If you have any remaining questions or need additional assistance, feel free to reach out, we're always here to help.

Best Regards,
Chaithra E.

v-echaithra
Community Support

Hi @P_work ,

Once your notebook has completed its run, explicitly stopping the Spark session (using spark.stop()) ensures resources are freed promptly; a short sketch of this appears after these suggestions. You can also set a shorter timeout for idle Spark sessions via the Spark config (spark.databricks.cluster.profile.serverless.idleTimeout or equivalent) instead of relying on the default, which is often 20 minutes.
When setting up your pipeline, enable cluster reuse across pipeline tasks to avoid launching a fresh cluster for each notebook or job. Alternatively, use pre-warmed instance pools to reduce cluster start times, especially in job cluster mode.
Consider executing child notebooks directly within a single pipeline, using a notebook task group or an API like mssparkutils.notebook.run() (Azure Synapse/Microsoft Fabric) or dbutils.notebook.run() (Databricks). This maintains context, avoids orchestration overhead, and allows better Spark session sharing.
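As a minimal sketch of the first point, the last cell of the notebook can stop the Spark session explicitly so capacity is released as soon as the work is done, rather than when the idle timeout (often 20 minutes) expires. This assumes nothing after that cell still needs Spark:

# Last cell of the notebook: release the Spark session explicitly instead of
# waiting for the idle timeout. No Spark calls are possible after this line.
spark.stop()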
I hope this helps.

Best Regards,
Chaithra E.

v-echaithra
Community Support

Hi @P_work ,


We would like to confirm if you've successfully resolved this issue or if you need further help. If you still have any questions or need more support, please feel free to let us know. We are more than happy to continue to help you.

Thank you for your patience; we look forward to hearing from you.
Best Regards,
Chaithra E.

Srisakthi
Super User

Hi @P_work ,

Is your question about reducing the start time of the Spark session for each notebook (parent and child), or about reducing idle time?

 

1. You could leverage a high concurrency session, so that your child notebooks can utilise the same Spark session as the parent, avoiding a separate start time for each child notebook.

2. You can stop the session in code once your notebook job has completed its run; this reduces consumption of CUs. You can also set a session timeout in the Spark settings, as the default is 20 minutes: whether your notebook runs for the entire 20 minutes or not, it will consume CUs. So it is good to set the timeout in the Spark settings or add code to stop the session manually once the notebook execution is over (a short sketch follows below).
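A minimal sketch of point 2, assuming a Fabric/Synapse runtime where mssparkutils is available; this is the Fabric-native counterpart of the spark.stop() call mentioned earlier in the thread:

# Last cell of the notebook: end the interactive session explicitly so CUs are
# not consumed while the session sits idle waiting for the default timeout.
from notebookutils import mssparkutils

mssparkutils.session.stop()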

 

Regards,

Srisakthi

BhaveshPatel
Community Champion

There is no workaround for the Spark environment startup; currently it is taking the expected time of ~4 minutes. Over time, though, you should see it take less time to provision the compute resources. (Databricks implemented Delta Lake along the lines of Google's approach.) To reduce the time, you should use Scala (Spark) or Python as the programming language.

Thanks & Regards,
Bhavesh

Love the Self Service BI.
Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to give Kudos.

Default environment does NOT take 4 mins. 

lbendlin
Super User

You could keep the Spark session alive but that will cost you dearly in CUs.  

 

5 minutes seems to be the sweet/sour spot for the decision to keep it running or not.
