Hi Experts!
Very new to Fabric and still learning! I implemented my first Notebook using PySpark. It is quite simple: I have a string parameter that I pass to the notebook, and I then save that value to a JSON file in my Lakehouse. I can run the code within the editor successfully in less than 1 second. However, when I try to run the Notebook in a Pipeline, it gets stuck in the Starting state for 10+ minutes before it actually runs and finishes.
Here is a snapshot of my Notebook (it is based on a solution provided in one of my other threads):
So, as you can see, I have set up FromPipeline as a Parameter in my Notebook. And I am able to run the code and write out the FileList.json file to the Files root directory in my default lakehouse in less than a second. So, from what I can see all is good in my Notebook.
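For readers following along, the notebook described above can be sketched roughly like this. This is a hedged illustration, not the poster's actual code: the parameter name FromPipeline comes from the thread, but the JSON structure, the helper function, and the output path are assumptions (in Fabric, the default lakehouse Files folder is mounted at /lakehouse/default/Files/; a local temp path is used here so the sketch runs anywhere):

```python
import json
import os
import tempfile

# Parameter cell in the real notebook; a pipeline run overrides this value.
FromPipeline = "file_a.csv,file_b.csv"

def write_file_list(param: str, out_path: str) -> dict:
    """Split the comma-separated parameter and persist it as a JSON file."""
    payload = {"files": param.split(",")}
    with open(out_path, "w") as f:
        json.dump(payload, f)
    return payload

# In a Fabric notebook this path would be something like
# "/lakehouse/default/Files/FileList.json" (assumption for illustration).
out_path = os.path.join(tempfile.gettempdir(), "FileList.json")
result = write_file_list(FromPipeline, out_path)
print(result)
```

The point is that the cell itself does trivial work; as discussed below, the wait the poster sees is session startup, not the write.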
Now, when I bring the Notebook into a Pipeline, I pass a string variable in FromPipeline:
However, it takes more than 10 minutes for the Notebook to execute. If I click on the run while it is in progress, it shows the Notebook is still in a Starting state.
What is going on here? Is there a setting somewhere in Fabric that I need to configure for Notebooks to run properly in pipelines? This can't be standard behavior.
Thanks for the help!
Hi,
Something worth checking: if you're using the basic starter pools on any SKU up to an F16 (I believe), you will only be able to have one, maybe two, Spark sessions open.
As such, if you have a notebook session open while trying to run notebooks through pipelines, the pipelines just wait for a session to become available. I've seen this wait up to 20 minutes before just because I forgot to close a notebook I was tinkering with.
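If you suspect a forgotten interactive session is what the pipeline is queuing behind, you can release it explicitly from the notebook before triggering the pipeline. A minimal sketch, assuming Fabric's mssparkutils utility is available (the call is guarded so the snippet also runs outside a Fabric notebook):

```python
# Sketch: explicitly stop this notebook's interactive Spark session so a
# pipeline-triggered notebook doesn't queue behind it. mssparkutils only
# exists inside Fabric/Synapse notebooks, so the call is guarded here.
stopped = False
try:
    from notebookutils import mssparkutils  # Fabric notebook utility (assumed environment)
    mssparkutils.session.stop()  # releases the interactive Spark session
    stopped = True
except Exception:
    # Outside a Fabric notebook there is no session to release.
    pass

print("session released:", stopped)
```

Closing the notebook tab (or letting the session time out) has the same effect; the explicit stop just frees the slot immediately.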
Hope this helps; let me know if you need any more information.
This blog explains it really well: https://www.advancinganalytics.co.uk/blog/2023/12/13/fabric-notebook-concurrency-explained
The command will always run fast inside the Notebook editor because the Spark cluster is already up and running when that cell executes. If you look at the time it takes the very first time you run a Notebook (without a session already started), you will see the Spark cluster spin-up time.
When a Notebook is run from a Pipeline, it has to provision and start a Spark session and cluster based on the Workspace and Environment configuration. How long this takes depends on whether you've changed any settings. If you leave everything at the defaults, Microsoft has clusters ready to go, so startup is quick. If you've made changes or set up your own Environment configuration, it will take longer because a cluster specific to your configuration has to spin up. Our clusters take 3-5 minutes to start, sometimes longer depending on capacity limitations. There have been cases where our startup took 10 minutes as well, but that has been the exception.
Good point. I have been thinking about the same myself.
If @WishAskedSooner hasn't done any changes to pools or environments, the starter pools should spin up in a matter of few seconds, also when the notebook is run inside a data pipeline, right?
When I don't make any changes to pools or environments, just use the starter pools, then a notebook can start up in a few seconds in a data pipeline (at least less than 30 seconds, perhaps just a couple of seconds). I'm on a trial capacity (~F64).
However, applying an environment, or using another pool than the starter pool, leads to increased start-up time.
Could this be the reason, @WishAskedSooner? Have you made any configuration regarding pools/environment? If you want minimal start-up time, these settings should be left untouched.
@frithjof_v - Completely agree. Another thing that comes into play is the libraries being imported. The reason we created our own environments is that we were loading the Semantic Link library, and it took forever because it had to load for each Notebook. The environment pre-loads it, so we only take the hit on initial start-up.
@WishAskedSooner - Have you taken a look at the snapshot logs from the data pipeline? You can get there from the run history; just click on the Notebook step.
Then check the Run Detail stats:
Also take a look at the Resources Preview Dashboard; can show you how long things are waiting vs running.
I'm not going to say I understand everything the logs show, but maybe they will help point to something as well. I also think it is worth opening a ticket with Microsoft for direct support.
In my case it only takes 20 seconds to run the Notebook. The entire pipeline runs in less than 1.5 minutes. I am on the trial capacity, which is similar to F64. Not sure if the capacity size matters.
I am on a Premium account. I made some changes to my Notebook this morning and the same thing happened: it is taking forever to start up, going on 15 minutes as I type this reply. I really need someone from Microsoft to respond, because I have no idea how to even troubleshoot this problem, and 10+ minutes for a Notebook to start in a Pipeline is a deal killer for development. What should I do, Microsoft? Open a ticket? Please, I need some help.