topic Re: pipeline slower than notebook in Data Engineering

pipeline slower than notebook

rgsalido — Thu, 04 Dec 2025 11:58:26 GMT

Hi everyone,

I have a notebook with several cells that runs very fast, in two minutes or less. When scheduled through a pipeline, the run can take up to 5 minutes. Do you know what could cause this and how to fix it?

Re: pipeline slower than notebook

Gpop13 — Thu, 04 Dec 2025 12:03:27 GMT

Hi @rgsalido - Do you have more notebooks running as part of that pipeline? Is high concurrency ON for notebooks?

Also, when you say the cells are taking long, did you check the notebook snapshot after the run to see whether each cell takes longer or only the Spark compute startup does?
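If the snapshot does not make the split obvious, one option is to time each step inside the notebook itself and compare the numbers between an interactive run and a pipeline run. This is only a minimal sketch in plain Python: the `timed` helper, the step names, and the stand-in workload are hypothetical and would be replaced by the real cell contents.

```python
import time
from contextlib import contextmanager

timings = {}  # step name -> elapsed wall-clock seconds


@contextmanager
def timed(step):
    """Record the wall-clock time spent inside the with-block under `step`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step] = time.perf_counter() - start


# Hypothetical usage: wrap the body of each notebook cell.
with timed("load"):
    rows = [i * i for i in range(100_000)]  # stand-in for the real work

with timed("aggregate"):
    total = sum(rows)

# Print one line per step so two runs can be compared side by side.
for step, seconds in timings.items():
    print(f"{step}: {seconds:.3f}s")
```

If the per-step times are about the same in both runs, the extra minutes are being spent outside the notebook code, for example in session startup or queuing.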

Re: pipeline slower than notebook

Srisakthi — Thu, 04 Dec 2025 15:56:37 GMT

Hi @Ugk161610 ,

I agree that the idle period at the beginning of the job in the image is the Spark session start time, but I'm a little confused about the following:

The job started running at 12:13:00, and then from 12:14:00 to 12:14:40 it shows idle again. Any idea why it was idle?

Regards,

Srisakthi

Re: pipeline slower than notebook

Ugk161610 — Fri, 05 Dec 2025 13:38:07 GMT

Hi @Srisakthi ,

That idle period usually means Spark was waiting (for executors, shuffle, I/O, or a short blocking operation) — not that your code was wrong. Check the Spark UI stage timeline and executor allocation around 12:14 to see which of the above matches your run, then apply the small fixes above.
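To make that check concrete, here is a rough sketch that finds the idle windows in a list of stage start and end times copied by hand from the Spark UI. The `idle_gaps` function and the sample timestamps are hypothetical (the timestamps only mirror the times mentioned in this thread); nothing here reads from the Spark UI itself.

```python
from datetime import datetime, timedelta


def idle_gaps(intervals, min_gap=timedelta(seconds=5)):
    """Return (gap_start, gap_end) pairs during which no interval is active."""
    gaps = []
    ordered = sorted(intervals)
    if not ordered:
        return gaps
    busy_until = ordered[0][1]
    for start, end in ordered[1:]:
        # A gap exists when the next stage starts well after all earlier ones ended.
        if start - busy_until >= min_gap:
            gaps.append((busy_until, start))
        busy_until = max(busy_until, end)
    return gaps


def t(hms):
    """Parse an HH:MM:SS string (date part is irrelevant here)."""
    return datetime.strptime(hms, "%H:%M:%S")


# Hypothetical stage timeline, typed in from the Spark UI.
stages = [
    (t("12:13:00"), t("12:13:40")),
    (t("12:13:30"), t("12:14:00")),
    (t("12:14:40"), t("12:15:10")),
]

for gap_start, gap_end in idle_gaps(stages):
    idle_seconds = (gap_end - gap_start).total_seconds()
    print(gap_start.time(), "->", gap_end.time(), f"{idle_seconds:.0f}s idle")
```

Once a gap is located this way, the executor allocation and the stages on either side of it are the places to look for the cause.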

If you want, paste the exact stage timeline or a screenshot of the Spark UI for 12:13–12:15 and I’ll point to the most likely cause.

– Gopi Krishna

Re: pipeline slower than notebook

v-sgandrathi — Mon, 08 Dec 2025 07:21:11 GMT

Hi @rgsalido,

Thank you @Ugk161610 @Srisakthi and @Gpop13 for your replies.

As we have not received a response from you yet, I would like to confirm whether you have successfully resolved the issue or if you require further assistance.

Thank you for your cooperation. Have a great day.

Re: pipeline slower than notebook

BhaveshPatel — Mon, 08 Dec 2025 07:54:32 GMT

Hi @rgsalido

Yes, you are right. Running notebooks is faster than running pipelines. Pipelines are slower because they run through the UI/UX layer, whereas notebooks are faster because they are programmed in Python/Scala and use Apache Spark and Delta Lake together.

( https://spark.apache.org/docs/latest/api/python/getting_started/index.html )

( https://docs.delta.io/)

Re: pipeline slower than notebook

rgsalido — Tue, 09 Dec 2025 19:18:53 GMT

I don't have any more notebooks running in that pipeline. The pipeline runs every 5 minutes. Maybe the pipeline+notebook pattern is not the most efficient.

Ideally, the Spark session should be reused for each execution. I have set the session tag in the notebook activity, but it doesn't always reuse the same session.

Re: pipeline slower than notebook

v-sgandrathi — Wed, 10 Dec 2025 09:40:02 GMT

Hi @rgsalido,

Thank you for the update.

The behavior you're experiencing is normal when running a notebook through a pipeline. Pipelines typically start a new Spark session for each run, which adds extra time compared to running the notebook manually. Because your pipeline runs every 5 minutes, session startup is likely causing most of the delay.
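As a rough back-of-the-envelope check, the figures reported earlier in this thread (about 2 minutes in the notebook, up to 5 minutes in the pipeline, scheduled every 5 minutes) can be plugged into a short calculation. These are worst-case numbers from the original post, and the sketch treats everything beyond the notebook time as overhead without attributing it to one specific cause.

```python
# Figures taken from the thread; treat them as approximate, worst-case values.
notebook_minutes = 2.0           # interactive notebook run
pipeline_minutes = 5.0           # slowest observed pipeline run
schedule_interval_minutes = 5.0  # pipeline trigger interval

# Time spent outside the notebook code (session startup, queuing, and so on).
overhead_minutes = pipeline_minutes - notebook_minutes
overhead_share = overhead_minutes / pipeline_minutes

# Fraction of each schedule window occupied by one run.
interval_utilization = pipeline_minutes / schedule_interval_minutes

print(f"overhead: {overhead_minutes:.1f} min ({overhead_share:.0%} of the run)")
print(f"schedule window used: {interval_utilization:.0%}")
```

In the worst case, then, a run fills the whole 5-minute window, so the next trigger can fire while the previous session is still busy.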

Even with a session tag applied, Spark may still start a new session if the previous one isn't active or if the compute resources are busy.

To improve performance, you can try these steps:

Use a consistent session tag in the Notebook activity so Fabric can reuse the Spark session when possible.

Enable high-concurrency or session sharing for pipeline notebooks, if your workspace supports it. This helps the pipeline connect to an existing Spark application instead of starting a new one.

Check your Spark pool capacity. If other jobs are using the pool, session startup may be slower because executors aren't available right away.

Review the Spark UI timeline for idle periods, which often show Spark waiting for resources, shuffle, or I/O, rather than issues in your code.

If your pipeline needs to run frequently, consider keeping a warm session active with the same tag so notebook runs can attach to it faster.

Thank you.

Re: pipeline slower than notebook

v-sgandrathi — Sat, 13 Dec 2025 09:17:38 GMT

Hi @rgsalido,

I wanted to follow up on our previous suggestions regarding the issue. We would love to hear back from you to ensure we can assist you further.

Thank you.

Re: pipeline slower than notebook

v-sgandrathi — Wed, 17 Dec 2025 13:37:03 GMT

Hi @rgsalido,

We haven’t heard from you since our last response and are checking back to see whether your query was answered.
If not, please reply with more details and we will try to help.

Thank you.