carlossoria
Frequent Visitor

Scheduled pipeline notebook execution fails intermittently

Hi everyone,

 

I'm encountering an intermittent issue with a notebook executed via a scheduled pipeline in Microsoft Fabric. Sometimes the notebook fails to start at all, and other times it crashes during execution with a Livy session error. Because this notebook (and others in our pipelines) performs write operations, I can't add retry logic, as it could lead to duplicates.

 

In this specific case, the notebook failed to start, and the pipeline run shows the following error:

Notebook execution failed at Notebook service with http status code - '200', please check the Run logs on Notebook, additional details - 'Error name - Exception, Error value - Failed to create session for executing notebook.

This error seems to indicate that the notebook service returned a success status code (200), but internally failed to create a session for execution. Since the notebook never started, no logs are available inside the notebook itself.

Has anyone experienced similar behavior or found a reliable way to handle this kind of failure? 

 

Any insights or suggestions would be greatly appreciated!

Thanks in advance,
Carlos

1 ACCEPTED SOLUTION

Hi Prasanna,
Thanks for following up. The issue hasn’t occurred again. I made a couple of changes to optimize the code in the notebook that was failing most often (although the errors happened in several notebooks, not just that one), and it seems stable now. It’s been running for over a week without any failures.
We can consider the problem resolved for now. If it happens again for any reason, I’ll reach out again to the community.
Best regards,
Carlos


13 REPLIES 13
v-pgoloju
Community Support

Hi @carlossoria,

 

Glad to hear the issue is resolved and everything is running stable now. If you feel one of the replies helped you reach the solution, please consider marking that reply as the Accepted Answer so it can help others with the same issue in the future. And of course, feel free to reach out again if the problem comes back.

 

Thanks & Regards,

Prasanna Kumar

v-pgoloju
Community Support

Hi @carlossoria,

 

Just following up to see whether the responses provided by community members were helpful in addressing the issue. If the issue still persists, feel free to reach out for any further clarification or assistance.

 

Best regards,
Prasanna Kumar

 

Hi Prasanna,
Thanks for following up. The issue hasn’t occurred again. I made a couple of changes to optimize the code in the notebook that was failing most often (although the errors happened in several notebooks, not just that one), and it seems stable now. It’s been running for over a week without any failures.
We can consider the problem resolved for now. If it happens again for any reason, I’ll reach out again to the community.
Best regards,
Carlos

v-pgoloju
Community Support

Hi @carlossoria,

 

The pipeline notebook failed because the Livy Spark session wasn’t active at the time of submission. The error message was: requirement failed: Session isn’t active (HTTP 400). This usually happens if the Spark session hasn’t finished initializing, has timed out, or was closed before the job was submitted.

Next steps:
Please restart the notebook/pipeline to reinitialize the Spark session.
Ensure the cluster is healthy and the session is in idle or running state before submitting jobs.
If this keeps happening, we may need to add retry/wait logic in the pipeline to check session status before execution.
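
The retry/wait idea in the last step could look roughly like the sketch below. It is not a Fabric API: `wait_for_session` and the `get_state` callable are hypothetical names, and how the state is actually fetched (for example a REST call in a pre-check activity) is left to the caller. The state names `idle` and `busy` follow Apache Livy's session states, and `running` is included only because it is mentioned above; whether Fabric reports them in exactly this form is an assumption.

```python
import time
from typing import Callable

# Assumed state names: "idle"/"busy" are Apache Livy session states,
# "running" is taken from the wording in this reply.
READY_STATES = {"idle", "busy", "running"}
FAILED_STATES = {"error", "dead", "killed", "shutting_down"}


def wait_for_session(
    get_state: Callable[[], str],
    timeout_s: float = 300.0,
    initial_delay_s: float = 2.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until the session is usable, has failed, or time runs out.

    get_state is a hypothetical callable supplied by the caller; it should
    return the current session state as a lowercase string.
    """
    waited = 0.0  # sum of requested sleep durations, not wall-clock time
    delay = initial_delay_s
    while True:
        state = get_state()
        if state in READY_STATES:
            return state
        if state in FAILED_STATES:
            # No point waiting: the session will not recover on its own.
            raise RuntimeError(f"session entered terminal state {state!r}")
        if waited >= timeout_s:
            raise TimeoutError(f"session still {state!r} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay_s)  # exponential backoff with a cap


if __name__ == "__main__":
    # Simulated session that needs two polls before it becomes idle.
    fake_states = iter(["starting", "starting", "idle"])
    print(wait_for_session(lambda: next(fake_states), sleep=lambda s: None))
```

Injecting `sleep` keeps the function easy to test without real delays. Note that a check like this only narrows the window: the session can still drop between the check and the job submission.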

 

Thanks & Regards,

Prasanna Kumar

KevinChant
Super User

Yes, I have. You need to monitor your capacity to see whether your users are reaching its limits. In addition, I recommend checking that the notebooks in workspaces that share the same capacity are running optimally.

Hello,

 

I've checked the capacity metrics, and there are no signs of high usage or saturation during the periods when the notebooks failed. Capacity usage remains well below the maximum limit, and there are no rejection events due to lack of resources. I don't think these notebook errors are capacity related.

 

Thanks.

v-pgoloju
Community Support

Hi @carlossoria,

 

Thank you for reaching out to the Microsoft Fabric Forum Community, and special thanks to @Gpop13 and @Ugk161610  for prompt and helpful responses.

Just following up to see whether the responses provided by community members were helpful in addressing the issue. If the issue still persists, feel free to reach out for any further clarification or assistance.

 

Best regards,
Prasanna Kumar

 

Hi Prasanna,

Thank you for following up.

The issue still persists, although it has been intermittent. On Wednesday and Thursday, all executions ran correctly without any problems. However, on Tuesday we encountered the error described in the opening post, and today (Friday) we experienced a different error during execution:

InvalidHttpRequestToLivy: Submission failed due to error content = ["requirement failed: Session isn't active."]
HTTP status code: 400

This time, the error occurred during the execution phase, unlike the previous case where the session failed to start and the job could not begin.
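
The difference between the two failures matters for retries, and a pipeline could in principle branch on it. The sketch below is only an illustration under stated assumptions: `classify_failure` and `RetryDecision` are hypothetical names, the matching relies on the two message fragments quoted in this thread, and it assumes that a "Failed to create session" error always means no notebook code ran.

```python
from enum import Enum


class RetryDecision(Enum):
    SAFE_TO_RETRY = "safe_to_retry"  # notebook never started, nothing was written
    NEEDS_REVIEW = "needs_review"    # notebook may have written data before failing


# Message fragments copied from the two errors quoted in this thread.
START_FAILURE_MARKERS = ("Failed to create session for executing notebook",)
MID_RUN_MARKERS = ("Session isn't active",)


def classify_failure(message: str) -> RetryDecision:
    """Decide whether a blind retry is safe, based only on the error text."""
    # Normalize typographic apostrophes so both spellings of "isn't" match.
    normalized = message.replace("\u2019", "'")
    # Check the mid-run markers first so an ambiguous message stays conservative.
    if any(marker in normalized for marker in MID_RUN_MARKERS):
        return RetryDecision.NEEDS_REVIEW
    if any(marker in normalized for marker in START_FAILURE_MARKERS):
        return RetryDecision.SAFE_TO_RETRY
    # Unknown errors are never retried automatically.
    return RetryDecision.NEEDS_REVIEW


if __name__ == "__main__":
    print(classify_failure("Error value - Failed to create session for executing notebook."))
    print(classify_failure('["requirement failed: Session isn\'t active."]'))
```

Matching on message text is brittle, since the wording can change between service versions, so anything unrecognized falls through to manual review.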

 

Regards

Ugk161610
Continued Contributor

Hi @carlossoria,

Yes, we’ve also come across this issue a few times with scheduled pipeline notebook executions in Fabric. The error usually shows “Notebook execution failed at Notebook service with http status code 200 – Failed to create session for executing notebook.”

In our case, it happened randomly and seemed to be related to temporary backend or session initialization issues in the Fabric notebook service. The pipeline shows as failed, but no logs appear since the session never actually started.

What worked for us was:

  1. Re-running the same pipeline manually — it usually succeeds without any changes.

  2. Making sure the workspace and capacity are active and not in sleep mode before the scheduled run.

  3. Adding a small delay or pre-check activity (for example, a lightweight notebook or REST call) before the main notebook execution, to ensure the session starts properly.

It seems to be an intermittent platform-side issue rather than something in the notebook code itself. Hopefully Microsoft will address this in a future update.

 

Best regards,
Gopi Krishna

Hello and thank you for sharing those insights, Gopi.

 

Regarding option 3, I have a question: since the session initialization is specific to the notebook being executed at that moment, how does running a lightweight notebook first help? Wouldn’t the main notebook still need to start its own session afterwards, which could potentially fail again?

 

I’m asking because, in this case, the failed notebook was not the first one running in the environment — other notebooks had executed successfully before — so it doesn’t seem to be a cold start issue then.

 

Thanks again!

Hi @carlossoria,

The idea behind adding a lightweight or pre-check notebook isn’t to fix the session for the main notebook directly, but to make sure the Fabric capacity and Spark environment are fully warmed up and responsive before the main job runs.

In our case, the failures mostly happened when the environment had been idle for a while, and the first heavy notebook in the schedule couldn’t create a session properly — it failed at the initialization step. By triggering a small “warm-up” notebook (something that just reads a small table or executes a simple cell), the compute session starts up the Spark engine and resources, so the next notebook’s session initializes faster and more reliably.

If other notebooks are already running successfully in your environment, then yes — it might not be a cold-start issue in your case. It could instead be related to transient session creation limits or capacity-level resource contention at the exact trigger time.

We saw that re-running manually almost always succeeded, which points to this being more of a temporary backend or session allocation glitch rather than a notebook-specific problem.

Hopefully Microsoft will address this soon — since it seems to affect scheduled runs more than manual ones.

 

Best regards,
Gopi Krishna

Gpop13
Advocate IV

Hi @carlossoria, is high concurrency turned on for the notebooks? If other notebooks are running in a high concurrency session, this notebook may not have had sufficient resources to start execution. Maybe altering the Spark cluster settings could help?

Also, the retry option usually helps, but you mentioned it's not viable as it could cause duplicates. I am not aware of your architecture, but if the notebook has not run, why would a retry cause duplicates?

 

 

Hello! I'm going to answer directly on the reply you sent:

 

"Hi @carlossoria, is high concurrency turned on for the notebooks?"

No, none of the notebooks have high concurrency on; all of them use standard session mode.

"If other notebooks are running in a high concurrency session, this notebook may not have had sufficient resources to start execution. Maybe altering the Spark cluster settings could help?"

"Also, the retry option usually helps, but you mentioned it's not viable as it could cause duplicates. I am not aware of your architecture, but if the notebook has not run, why would a retry cause duplicates?"

It's not viable because in some cases this error happens during execution (not in this particular case, but the logged error is more or less the same). If I set up retries on a notebook that writes to tables and the notebook has already started executing, I cannot guarantee that no write was completed before the session error.
