gmangiante
Frequent Visitor

Maintaining notebook Spark session within pipeline

Hello - I'm working on building a Fabric Pipeline that uses multiple PySpark notebooks within its flow. I'm noticing that, although the notebooks run pretty quickly on their own, they take at least a minute longer when run within the pipeline. My assumption is that this is due to having to start up a new Spark session for each notebook invocation. Could someone confirm this for me? If that is the case, is there any way to maintain the session for the entire pipeline to avoid these extended start-up times? (Note: I don't believe the %run magic will work here, because I need to parameterize each notebook dynamically along the way, but please correct me if I'm wrong.) Thanks!

2 ACCEPTED SOLUTIONS
v-nikhilan-msft
Community Support

Hi @gmangiante ,

You are right. Each notebook step would start a new Spark session.

We plan to enable session sharing across pipeline steps through high concurrency for pipelines, which would let you reuse sessions and avoid the additional start-up delays.

Deployment is planned for this semester, and the feature is currently in the design phase. Stay tuned for more updates.

Appreciate your patience.

Hope this helps. Please let us know if you have any further questions. Glad to help.


For anyone who finds this thread: the feature is scheduled for Q2 2024 (https://learn.microsoft.com/en-us/fabric/release-plan/data-engineering#concurrency). From the release plan:

High concurrency in pipelines

Estimated release timeline: Q2 2024

In addition to high concurrency in notebooks, we will also enable high concurrency in pipelines. This capability will allow you to run multiple notebooks in a pipeline with a single session.
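
Until that ships, one possible stopgap for the dynamic-parameter requirement in the original question is to drive the child notebooks from a single orchestrator notebook, which the pipeline then calls as one Notebook activity. The sketch below is not from the thread: `run_chain`, `previous_result`, and the notebook names are hypothetical, and it assumes the Fabric notebook utility `mssparkutils.notebook.run(path, timeout_seconds, arguments)`, which as far as I know runs the referenced notebook on the caller's Spark compute rather than starting a new session. Please verify that behavior in your own workspace.

```python
from typing import Any, Callable, Dict, List, Tuple

# Assumed shape of mssparkutils.notebook.run: (path, timeout_seconds, arguments) -> exit value
NotebookRunner = Callable[[str, int, Dict[str, Any]], Any]


def run_chain(
    steps: List[Tuple[str, Dict[str, Any]]],
    runner: NotebookRunner,
    timeout_seconds: int = 600,
) -> Dict[str, Any]:
    """Run notebooks in order and collect each notebook's exit value.

    Each step after the first also receives the previous exit value
    under the (hypothetical) parameter name 'previous_result'.
    """
    results: Dict[str, Any] = {}
    previous = None
    for name, params in steps:
        args = dict(params)  # copy so the caller's dict is not mutated
        if previous is not None:
            args["previous_result"] = previous
        previous = runner(name, timeout_seconds, args)
        results[name] = previous
    return results


if __name__ == "__main__":
    # Local stand-in for the real runner so the sketch can be tried anywhere.
    def stub_runner(name: str, timeout: int, args: Dict[str, Any]) -> str:
        return f"{name} finished with {sorted(args)}"

    # Hypothetical notebook names and parameters.
    print(run_chain([("nb_ingest", {"run_date": "2024-01-01"}),
                     ("nb_transform", {"mode": "full"})], stub_runner))
```

Inside a Fabric notebook you would pass `mssparkutils.notebook.run` as `runner` instead of the stub. Injecting the runner keeps the chaining logic testable outside Fabric; the trade-off is that the pipeline sees one activity rather than one per notebook.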


8 REPLIES

Hi,

This feature would be very helpful. Is there an update on the ETA? Or is it still expected in December?

 

Thanks - this totally makes sense. Looking at the current high-concurrency capability for interactive notebooks, I was guessing this was on the roadmap, since it would naturally extend to pipelines, and I'm sure I'm not the only person who has run into this issue. I look forward to future developments, and I appreciate the quick response!

We're also experiencing quite a few performance issues with pipelines and hoping that high concurrency will help in our case as well.

Not sure if this is helpful for you, but for now, we've decided to go with pure Spark job definitions rather than leveraging Pipelines. It's not as modular and transparent as a Pipeline would be, but it gets the job done efficiently and operates in batch mode instead of interactive (https://learn.microsoft.com/en-us/fabric/data-engineering/spark-job-concurrency-and-queueing), which lets us schedule refreshes more easily, since they're queued. Would still like to get back to a notebook-powered Pipeline at some point, but having managed Spark available alongside our lakehouse is extremely useful.

To help prioritize this work item, please submit a new idea on Fabric Pipelines if one doesn't already exist.

v-nikhilan-msft
Community Support

Hi @gmangiante ,

Thanks for using Fabric Community and reporting this.

Apologies for the issue you have been facing. Are you still experiencing it?

It's difficult to tell what could be the reason for this performance.

 

I have reached out to the internal team for help on this and will update you once I hear back from them.
Appreciate your patience.
