Spark startup time with Python packages install is too slow

Creating an environment gives you a nice UI to set a list of Python packages to install by default.


The user thinks: "Oh cool, this will be faster than doing %pip install every time I start a notebook."


The reality is that Spark now takes 3 minutes to start, instead of the 10 seconds it takes to run %pip install.


Either fix the UI to warn users, or fix the startup time.


Status: Planned
Comments
vasu_n
New Member

We had the same problem. The moment we customize the node with PyPi packages, we lose the Starter Pools instant-start feature and go back to the old 3-4 minute start time.


For some time, we used pip to install the packages from inside the notebook.


And one fine day, all ETLs failed, stating that %pip magic commands are disabled inside pipelines.



Finally I managed to install the pip packages into the Lakehouse Files space like below:


!pip install googleads -t "/lakehouse/default/Files/PyPi Packages/"


and then in the notebook I add the lines below to include the Lakehouse folder in sys.path, so the package can be imported like a locally installed package:


import sys

sys.path.append('/lakehouse/default/Files/PyPi Packages/')



This is a workaround to avoid the high start time and still work with custom packages.
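Putting the two pieces together, a notebook cell for this workaround might look like the sketch below. It assumes the packages were already installed once into the Lakehouse folder with the !pip command above; the folder path and the googleads package are taken from this comment, and the guard/insert logic is just one defensive way to write it:

```python
import sys

# Lakehouse folder that the packages were installed into earlier with:
#   !pip install googleads -t "/lakehouse/default/Files/PyPi Packages/"
PKG_DIR = "/lakehouse/default/Files/PyPi Packages/"

# Prepend rather than append, so the Lakehouse copy takes priority
# over any older version baked into the Fabric runtime image.
if PKG_DIR not in sys.path:
    sys.path.insert(0, PKG_DIR)

# The package now imports like a locally installed one, e.g.:
# from googleads import ad_manager
```

Prepending (sys.path.insert(0, ...)) instead of appending matters if the runtime already ships an older version of the same package; with append, the pre-installed copy would win.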


Hope there will be some native fix for this in future Fabric runtimes.




Jonathan_Boarma
New Member

Databricks's newish serverless compute feature seems to compete directly with Fabric's temporary differentiator, which was its fast compute-session launches. Now Fabric is again behind Databricks in offering clean options for maintaining a pool of properly configured compute. At least there, when you reserve compute, you get pools that are configured correctly. This is embarrassing, Microsoft!


If you are launching pipelines and need python dependencies, get ready for seriously slow compute. 😕

nishalit
New Member
Thank you for sharing this idea. This feature is planned. Stay tuned.
fbcideas_migusr
New Member
Status changed to: Planned