<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: pipeline slower than notebook in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/pipeline-slower-than-notebook/m-p/4893320#M13875</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/920086"&gt;@Ugk161610&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I agree that the idle period at the beginning of the job in the image is the Spark session start time, but I'm a little confused about the following:&lt;/P&gt;
&lt;P&gt;At 12:13:00 the job started running, and then from 12:14:00 to 12:14:40 it shows idle again. Any idea why it was idle?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards,&lt;/P&gt;
&lt;P&gt;Srisakthi&lt;/P&gt;</description>
    <pubDate>Thu, 04 Dec 2025 15:56:37 GMT</pubDate>
    <dc:creator>Srisakthi</dc:creator>
    <dc:date>2025-12-04T15:56:37Z</dc:date>
    <item>
      <title>pipeline slower than notebook</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/pipeline-slower-than-notebook/m-p/4893053#M13863</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I have a notebook with several cells that runs very fast, in two minutes or less. When it is scheduled through a pipeline, the run can take up to 5 minutes. Do you know what could cause this and how to fix it?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="rgsalido_0-1764849301934.png" style="width: 400px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1314681i71802E4DE6646D22/image-size/medium?v=v2&amp;amp;px=400" role="button" title="rgsalido_0-1764849301934.png" alt="rgsalido_0-1764849301934.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 04 Dec 2025 11:58:26 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/pipeline-slower-than-notebook/m-p/4893053#M13863</guid>
      <dc:creator>rgsalido</dc:creator>
      <dc:date>2025-12-04T11:58:26Z</dc:date>
    </item>
    <item>
      <title>Re: pipeline slower than notebook</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/pipeline-slower-than-notebook/m-p/4893062#M13865</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/147604"&gt;@rgsalido&lt;/a&gt;&amp;nbsp; - Do you have other notebooks running as part of that pipeline? Is high concurrency turned on for notebooks?&lt;/P&gt;&lt;P&gt;Also, when you say the cells are taking longer, did you check the notebook snapshot after the run to compare whether each cell takes longer or only the Spark compute startup does?&lt;/P&gt;</description>
      <pubDate>Thu, 04 Dec 2025 12:03:27 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/pipeline-slower-than-notebook/m-p/4893062#M13865</guid>
      <dc:creator>Gpop13</dc:creator>
      <dc:date>2025-12-04T12:03:27Z</dc:date>
    </item>
    <item>
      <title>Re: pipeline slower than notebook</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/pipeline-slower-than-notebook/m-p/4893320#M13875</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/920086"&gt;@Ugk161610&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I agree that the idle period at the beginning of the job in the image is the Spark session start time, but I'm a little confused about the following:&lt;/P&gt;
&lt;P&gt;At 12:13:00 the job started running, and then from 12:14:00 to 12:14:40 it shows idle again. Any idea why it was idle?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards,&lt;/P&gt;
&lt;P&gt;Srisakthi&lt;/P&gt;</description>
      <pubDate>Thu, 04 Dec 2025 15:56:37 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/pipeline-slower-than-notebook/m-p/4893320#M13875</guid>
      <dc:creator>Srisakthi</dc:creator>
      <dc:date>2025-12-04T15:56:37Z</dc:date>
    </item>
    <item>
      <title>Re: pipeline slower than notebook</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/pipeline-slower-than-notebook/m-p/4894293#M13894</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/782160"&gt;@Srisakthi&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;That idle period usually means Spark was &lt;EM&gt;waiting&lt;/EM&gt; (for executors, shuffle, I/O, or a short blocking operation) — not that your code was wrong. Check the Spark UI stage timeline and executor allocation around 12:14 to see which of the above matches your run, then apply the small fixes above.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If you want, paste the exact stage timeline or a screenshot of the Spark UI for 12:13–12:15 and I’ll point to the most likely cause.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;– Gopi Krishna&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 05 Dec 2025 13:38:07 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/pipeline-slower-than-notebook/m-p/4894293#M13894</guid>
      <dc:creator>Ugk161610</dc:creator>
      <dc:date>2025-12-05T13:38:07Z</dc:date>
    </item>
    <item>
      <title>Re: pipeline slower than notebook</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/pipeline-slower-than-notebook/m-p/4895617#M13916</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/147604"&gt;@rgsalido&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Thank you&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/920086"&gt;@Ugk161610&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/782160"&gt;@Srisakthi&lt;/a&gt;&amp;nbsp;and&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/628988"&gt;@Gpop13&lt;/a&gt;&amp;nbsp;for your replies.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-teams="true"&gt;As we have not received a response from you yet, I would like to confirm whether you have successfully resolved the issue or if you require further assistance.&lt;BR /&gt;&lt;BR /&gt;Thank you for your cooperation.&amp;nbsp;Have a great day.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 08 Dec 2025 07:21:11 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/pipeline-slower-than-notebook/m-p/4895617#M13916</guid>
      <dc:creator>v-sgandrathi</dc:creator>
      <dc:date>2025-12-08T07:21:11Z</dc:date>
    </item>
    <item>
      <title>Re: pipeline slower than notebook</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/pipeline-slower-than-notebook/m-p/4895643#M13917</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/147604"&gt;@rgsalido&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Yes, you are right. Running a notebook directly is usually faster than running it through a pipeline. The pipeline tends to be slower because it is an orchestration layer that first has to start the notebook run (and often a Spark session), whereas the notebook code itself is Python/Scala running on Apache Spark and Delta Lake.&amp;nbsp;&lt;/P&gt;&lt;P&gt;(&amp;nbsp;&lt;A href="https://spark.apache.org/docs/latest/api/python/getting_started/index.html" target="_blank"&gt;https://spark.apache.org/docs/latest/api/python/getting_started/index.html&lt;/A&gt;&amp;nbsp;)&lt;/P&gt;&lt;P&gt;(&amp;nbsp;&lt;A href="https://docs.delta.io/" target="_blank"&gt;https://docs.delta.io/&lt;/A&gt;&amp;nbsp;)&lt;/P&gt;</description>
      <pubDate>Mon, 08 Dec 2025 07:54:32 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/pipeline-slower-than-notebook/m-p/4895643#M13917</guid>
      <dc:creator>BhaveshPatel</dc:creator>
      <dc:date>2025-12-08T07:54:32Z</dc:date>
    </item>
    <item>
      <title>Re: pipeline slower than notebook</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/pipeline-slower-than-notebook/m-p/4897755#M13965</link>
      <description>&lt;P&gt;&lt;SPAN&gt;I don't have any other notebooks running in that pipeline. The pipeline runs every 5 minutes. Maybe the pipeline + notebook pattern is not the most efficient one.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Ideally, the Spark session would be reused across executions. I have set the session tag in the Notebook activity, but it doesn't always reuse the same session.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 09 Dec 2025 19:18:53 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/pipeline-slower-than-notebook/m-p/4897755#M13965</guid>
      <dc:creator>rgsalido</dc:creator>
      <dc:date>2025-12-09T19:18:53Z</dc:date>
    </item>
    <item>
      <title>Re: pipeline slower than notebook</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/pipeline-slower-than-notebook/m-p/4898279#M13974</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/147604"&gt;@rgsalido&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you for the update.&lt;/P&gt;
&lt;P&gt;The behavior you're experiencing is normal when running a notebook through a pipeline. Pipelines typically start a new Spark session for each run, which adds extra time compared to running the notebook manually. Because your pipeline runs every 5 minutes, session startup is likely causing most of the delay.&lt;/P&gt;
&lt;P&gt;Even with a session tag applied, Spark may still start a new session if the previous one isn't active or if the compute resources are busy.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;To improve performance, you can try these steps:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Use a consistent session tag in the Notebook activity so Fabric can reuse the Spark session when possible.&lt;/LI&gt;
&lt;LI&gt;Enable high-concurrency or session sharing for pipeline notebooks, if your workspace supports it. This helps the pipeline connect to an existing Spark application instead of starting a new one.&lt;/LI&gt;
&lt;LI&gt;Check your Spark pool capacity. If other jobs are using the pool, session startup may be slower because executors aren't available right away.&lt;/LI&gt;
&lt;LI&gt;Review the Spark UI timeline for idle periods, which often show Spark waiting for resources, shuffle, or I/O, rather than issues in your code.&lt;/LI&gt;
&lt;LI&gt;If your pipeline needs to run frequently, consider keeping a warm session active with the same tag so notebook runs can attach to it faster.&lt;/LI&gt;
&lt;/UL&gt;
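&lt;P&gt;For the timeline review mentioned above, a minimal Python sketch (plain standard library, not a Fabric or Spark API) could list the idle gaps between stage intervals copied by hand from the Spark UI. The names find_idle_gaps and at, the 10-second threshold, the date, and the sample intervals are assumptions for illustration; only the 12:14:00 to 12:14:40 gap is taken from this thread.&lt;/P&gt;
&lt;PRE&gt;
```python
from datetime import datetime, timedelta


def find_idle_gaps(intervals, min_gap=timedelta(seconds=10)):
    """Return (gap_start, gap_end) pairs during which no interval was active."""
    gaps = []
    busy_until = None
    # Sorting by start time lets us track the latest end time seen so far.
    for start, end in sorted(intervals):
        if busy_until is not None and start - busy_until >= min_gap:
            gaps.append((busy_until, start))
        busy_until = end if busy_until is None else max(busy_until, end)
    return gaps


def at(clock):
    """Parse HH:MM:SS on a placeholder date (the date itself is an assumption)."""
    return datetime.strptime("2025-12-04 " + clock, "%Y-%m-%d %H:%M:%S")


# Hypothetical stage intervals, typed by hand from a Spark UI timeline.
stages = [
    (at("12:13:00"), at("12:13:35")),
    (at("12:13:30"), at("12:14:00")),  # overlaps the first stage
    (at("12:14:40"), at("12:15:00")),
]

for gap_start, gap_end in find_idle_gaps(stages):
    seconds = int((gap_end - gap_start).total_seconds())
    print(f"idle {gap_start:%H:%M:%S} to {gap_end:%H:%M:%S} ({seconds}s)")
```
&lt;/PRE&gt;
&lt;P&gt;With the sample data, this reports one gap, from 12:14:00 to 12:14:40. For a real run, the intervals would come from the stage start and end times shown in the Spark UI, and each reported gap can then be compared against executor allocation at that time.&lt;/P&gt;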
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Wed, 10 Dec 2025 09:40:02 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/pipeline-slower-than-notebook/m-p/4898279#M13974</guid>
      <dc:creator>v-sgandrathi</dc:creator>
      <dc:date>2025-12-10T09:40:02Z</dc:date>
    </item>
    <item>
      <title>Re: pipeline slower than notebook</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/pipeline-slower-than-notebook/m-p/4901061#M14054</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/147604"&gt;@rgsalido&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-teams="true"&gt;I wanted to follow up on our previous suggestions regarding the issue. We would love to hear back from you to ensure we can assist you further.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-teams="true"&gt;Thank you.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 13 Dec 2025 09:17:38 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/pipeline-slower-than-notebook/m-p/4901061#M14054</guid>
      <dc:creator>v-sgandrathi</dc:creator>
      <dc:date>2025-12-13T09:17:38Z</dc:date>
    </item>
    <item>
      <title>Re: pipeline slower than notebook</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/pipeline-slower-than-notebook/m-p/4903960#M14106</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/147604"&gt;@rgsalido&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;We haven’t heard from you since the last response and are checking back to see whether your query was answered.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;If not, please reply with more details and we will try to help.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Wed, 17 Dec 2025 13:37:03 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/pipeline-slower-than-notebook/m-p/4903960#M14106</guid>
      <dc:creator>v-sgandrathi</dc:creator>
      <dc:date>2025-12-17T13:37:03Z</dc:date>
    </item>
  </channel>
</rss>

