Hi everyone,
I have a notebook with several cells that runs very fast interactively, in two minutes or less. When it is scheduled through a pipeline, the duration can reach up to 5 minutes. Do you know what could be causing this and how to solve it?
Hi @rgsalido,
Thank you for the update.
The behavior you're experiencing is normal when running a notebook through a pipeline. Pipelines typically start a new Spark session for each run, which adds extra time compared to running the notebook manually. Because your pipeline runs every 5 minutes, session startup is likely causing most of the delay.
Even with a session tag applied, Spark may still start a new session if the previous one isn't active or if the compute resources are busy.
To improve performance, you can try these steps:
Use a consistent session tag in the Notebook activity so Fabric can reuse the Spark session when possible.
Enable high-concurrency or session sharing for pipeline notebooks, if your workspace supports it. This helps the pipeline connect to an existing Spark application instead of starting a new one.
Check your Spark pool capacity. If other jobs are using the pool, session startup may be slower because executors aren't available right away.
Review the Spark UI timeline for idle periods, which often show Spark waiting for resources, shuffle, or I/O, rather than issues in your code.
If your pipeline needs to run frequently, consider keeping a warm session active with the same tag so notebook runs can attach to it faster.
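A quick way to tell session startup apart from genuinely slow cells is to timestamp the work inside the notebook itself. Below is a minimal plain-Python sketch (the helper names `timed_step` and `CELL_TIMINGS` are illustrative, not a Fabric API):

```python
import time

# Record the wall-clock duration of each logical step so pipeline-run
# timings can be compared against interactive-run timings.
CELL_TIMINGS = {}

def timed_step(name, fn, *args, **kwargs):
    """Run fn, record its elapsed seconds under `name`, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    CELL_TIMINGS[name] = time.perf_counter() - start
    return result

# Example: a cheap transform vs. a step that simulates waiting on compute.
timed_step("transform", lambda: sum(range(1000)))
timed_step("simulated_wait", time.sleep, 0.2)
```

If every timed step inside the notebook takes the same few seconds in both run modes, but the pipeline run is still minutes longer end to end, the extra time is spent before your first cell executes, which points at session startup rather than your code.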
Thank you.
Hi @rgsalido,
We haven't heard back from you on the last response and wanted to check whether your query has been answered.
If not, please reply with more details and we will try to help.
Thank you.
Hi @rgsalido,
I wanted to follow up on our previous suggestions regarding the issue. We would love to hear back from you to ensure we can assist you further.
Thank you.
I don't have any more notebooks running in that pipeline. The pipeline runs every 5 minutes. Maybe the pipeline+notebook pattern is not the most efficient.
Ideally, the Spark session should be reused for each execution. I have used the tag in the notebook activity, but it doesn't always use the same session.
Hi @rgsalido,
Yes, you are right: running a notebook interactively is faster than running it through a pipeline. The pipeline adds orchestration overhead and typically starts a new Spark session for each run, whereas an interactive notebook can keep reusing an already-running Spark session for its Python/Scala code against Spark and Delta Lake.
( https://spark.apache.org/docs/latest/api/python/getting_started/index.html )
Hi @rgsalido,
Thank you @Ugk161610 @Srisakthi and @Gpop13 for your replies.
As we have not received a response from you yet, I would like to confirm whether you have successfully resolved the issue or if you require further assistance.
Thank you for your cooperation. Have a great day.
Hi @Ugk161610 ,
I agree the Spark session start time accounts for the idle period at the beginning of the job in the image, but I'm a little confused about these timings:
at 12:13:00 the job started running, yet from 12:14:00 to 12:14:40 it shows idle again. Any idea why it was idle?
Regards,
Srisakthi
Hi @Srisakthi ,
That idle period usually means Spark was waiting (for executors, shuffle, I/O, or a short blocking operation) — not that your code was wrong. Check the Spark UI stage timeline and executor allocation around 12:14 to see which of the above matches your run, then apply the small fixes above.
If you want, paste the exact stage timeline or a screenshot of the Spark UI for 12:13–12:15 and I’ll point to the most likely cause.
– Gopi Krishna
Hi @rgsalido - Do you have more notebooks running as part of that pipeline? Is high concurrency ON for notebooks?
Also, when you say the cells are taking long, did you check the notebook snapshot after the run to compare whether each cell is taking longer, or whether it is just the startup of the Spark compute that is taking long?
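One way to make that comparison concrete: subtract the sum of per-cell durations (read off the notebook snapshot) from the total pipeline run duration; whatever the cells do not account for is startup/orchestration overhead. A small sketch, with an illustrative helper name:

```python
def startup_overhead(total_run_seconds, cell_seconds):
    """Estimate session-startup/orchestration time as the portion of the
    total run that the notebook cells themselves don't account for."""
    return total_run_seconds - sum(cell_seconds)

# Example: a 5-minute pipeline run whose cells only total 2 minutes
overhead = startup_overhead(300, [40, 50, 30])  # -> 180 seconds of overhead
```

A large positive overhead on pipeline runs, with the same cell durations as interactive runs, supports the session-startup explanation discussed above.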