
Spark startup time with Python package install is too slow

Creating an environment gives you a nice UI to set a list of Python packages to install by default.


The user thinks: “Oh cool, this will be faster than doing %pip install every time I start a notebook.”


The reality is that it now takes 3 minutes to start Spark, instead of the 10 seconds it takes to run "%pip install".
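
For reference, the in-notebook alternative being compared against looks like this (a minimal sketch; the package name is purely illustrative):

# Notebook cell: the session starts from the starter pool in seconds,
# then the install runs in-session and only costs a normal pip install.
%pip install great-expectations   # illustrative package; any PyPI package works the same way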


Either fix the UI to warn users, or fix the startup time.


Status: Planned
Comments
vasu_n
New Member

We had the same problem. The moment we customize the node with PyPI packages, we lose the starter pool's instant-start feature and go back to the old 3-4 minute start time.


For some time, we used %pip to install the packages from inside the notebook.


And one fine day, all ETLs failed, stating that %pip magic commands are disabled inside pipelines.



Finally I managed to install the pip packages into the Lakehouse Files space like below:


!pip install googleads -t "/lakehouse/default/Files/PyPi Packages/"


and then in the notebook I add the lines below to put the Lakehouse folder on sys.path, so the package can be imported like a locally installed package:


import sys

sys.path.append('/lakehouse/default/Files/PyPi Packages/')



This is a workaround to avoid the high start time and still work with custom packages.
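
Put together, the pattern looks roughly like this; a minimal sketch assuming a default Lakehouse is attached to the notebook and the folder name matches the one used above:

# One-time cell (rerun only when dependencies change): install into the Lakehouse Files area.
# Quote the target path because it contains a space.
!pip install googleads -t "/lakehouse/default/Files/PyPi Packages/"

# At the start of every notebook run: make the Lakehouse folder importable.
import sys

pkg_dir = '/lakehouse/default/Files/PyPi Packages/'
if pkg_dir not in sys.path:
    sys.path.append(pkg_dir)

import googleads  # now imports as if it were installed locally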


Hope there will be a native fix for this in future Fabric runtimes.




Jonathan_Boarma
New Member

Databricks' newish serverless compute feature seems to directly compete with Fabric's temporary differentiator, which was fast compute session launches. Now Fabric is again behind Databricks in terms of offering clean options for maintaining a pool of properly configured compute; at least there, when you reserve compute, you get pools that are configured correctly. This is embarrassing, Microsoft!


If you are launching pipelines and need Python dependencies, get ready for seriously slow compute. 😕

nishalit
New Member
Thank you for sharing this idea. This feature is planned. Stay tuned.
fbcideas_migusr
New Member
Status changed to: Planned
 
Andrew_Hill_TRS
New Member
Forget 3 minutes - if you have a custom network (because for some reason your production is not open to the entire internet) and you add a library, you can expect a 10 to 15 minute start-up, which support says is "expected". It is currently documented at 'https://learn.microsoft.com/en-us/fabric/data-engineering/spark-compute' that you can expect 2 to 5 minutes extra for adding a library, but I find that I pay the full 5 minutes even for a little helper library that I can pip install in less than 30 seconds.
nielsvdc
Skilled Sharer

Currently, in our setup, startup time for a custom environment on a starter pool can take up to 18 minutes. This has been the case since the September release after FabCon Europe. 18 minutes is a lot of coffee drinking and not much productivity. I'm getting questions about when we are moving to Synapse or Databricks, which is not what we want from an architecture point of view.


Update: there has been a known issue since August: https://support.fabric.microsoft.com/known-issues/?active=true&fixed=true&sort=published&product=Dat...

Andrew_Hill_TRS
New Member
Anything more than 15 minutes is out of spec, and that would be for a custom-size, VNet-linked environment with a custom library added, each documented as *max* 5 minutes extra, with a baseline of 30 seconds to 2 minutes.