smpa01
Super User

Stopping Spark Session inside/outside ForEach

Usually, spark.stop() is recommended as a best practice for releasing resources like memory, CPU, and network connections back to the cluster, among other reasons. I usually do that in all my notebooks.

 

I am calling a sequential execution of a notebook inside a ForEach activity. If I don't use spark.stop(), does each iteration keep taking advantage of the same session? And if I do use spark.stop(), is it overkill, since it shuts down and restarts the session for every execution?

// pseudocode
[{id: 1}, {id: 2}, {id: 3}].forEach((_, i) => execute(NB1))

If I don't use spark.stop(), which is what I am currently doing, is there any way to shut down the session at the completion of the loop?

@frithjof_v 

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News: Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs
1 ACCEPTED SOLUTION

Yes, it sounds like the best option is to not use ForEach in this case; instead, have a master notebook and execute all the other notebook runs from the master notebook.

 

Look for mssparkutils.notebook.run(), mssparkutils.notebook.runMultiple() or Threadpooling in the Reddit discussion: 

https://www.reddit.com/r/MicrosoftFabric/comments/1eolfda/sparkstop_is_it_needed/

 

I noticed mssparkutils.notebook.runMultiple() is a preview feature. I haven't checked the status of the other mentioned features.
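For illustration, the master-notebook pattern could be sketched like this. Note this is a runnable stand-in, not Fabric code: run_notebook is a hypothetical placeholder for Fabric's mssparkutils.notebook.run() (the real call, commented inside, takes the notebook name, a timeout, and a parameter dict), so the loop structure can be seen in isolation:

```python
# Sketch of a "master notebook" driving a child notebook once per item,
# all within the master notebook's single Spark session.

def run_notebook(name: str, params: dict) -> str:
    # In a Fabric notebook this line would instead be something like:
    #   mssparkutils.notebook.run(name, 600, params)
    # Here we just return a marker string so the pattern is runnable anywhere.
    return f"{name} done for id={params['id']}"

# Same items as in the original pseudocode, run sequentially.
items = [{"id": 1}, {"id": 2}, {"id": 3}]
results = [run_notebook("NB1", item) for item in items]
print(results)
```

The same loop can be parallelized with a thread pool (the "Threadpooling" option mentioned above) by submitting each run_notebook call to a concurrent.futures.ThreadPoolExecutor instead of iterating sequentially.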

 

Also, I think mssparkutils will be replaced by notebookutils going forward:

NotebookUtils (former MSSparkUtils) for Fabric - Microsoft Fabric | Microsoft Learn

Microsoft Spark Utilities (MSSparkUtils) for Fabric - Microsoft Fabric | Microsoft Learn


5 REPLIES
frithjof_v
Super User

I don't have enough knowledge about how sessions work in Fabric to answer this properly. Interesting question, though! I will try to learn more about this.

 

Just to be clear, I understand your current setup like this:

  • You are using a Data Factory Data Pipeline.
  • Inside the Data Pipeline, you have a ForEach activity with the Sequential option selected.
  • Inside the ForEach Activity, you are executing a Notebook.

 

I think I need to learn more about topics like concurrency, and whether it is necessary to use spark.stop() in Fabric, or whether Fabric manages stopping the session when a Notebook run is finished.

 

Perhaps this blog post is relevant: https://www.fourmoo.com/2024/01/10/microsoft-fabric-notebook-session-usage-explained-and-how-to-save...

I am guessing you don't need to use spark.stop() in Fabric.

 

Are you starting the Spark session also by using code? Something like this:

 

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("Simple DataFrame Example") \
    .getOrCreate()

 

I don't think that is necessary in Fabric either. I guess sessions are managed by Fabric. When running a notebook interactively (using the notebook editor interface), I guess it's a good idea to click 'Stop session' when finished. However, when running a Notebook in a Data pipeline, I think Fabric manages the session and stops it when it's no longer needed. Ref. the blog post link in my previous comment.

 

I am also guessing that using spark.stop() inside the Notebook can make you unable to take advantage of high concurrency spark sessions.

 

However I'm not sure about any of this, as I don't have enough knowledge or experience with this.

 

Hoping to get others' insights and thoughts on this 😃

 

I started a discussion on Reddit to try to learn more about the topic: 

 

spark.stop() - is it needed? : r/MicrosoftFabric (reddit.com)

 

I also noticed there is an alternative to spark.stop(), which is mssparkutils.session.stop().

 

Anyway, I'm not sure if it's necessary.

 

I am still not entirely sure what to believe regarding session start/stop in Fabric.

 

There is also the option to use a master notebook and call the other notebooks from it. Then I think you can share the same session among the notebooks; I think this approach utilizes the high concurrency feature.

 

EDIT: I think the Reddit discussion has made me understand more about it. I recommend checking out the Reddit discussion (link above).

@frithjof_v thanks for this.

High concurrency is not shipped yet (off-topic).

 

Therefore, if measures are not taken for a large array, the pipeline will error out when you call a notebook inside ForEach over that array to perform operations on the same table. E.g.

 

// pseudocode
const large_array = [1, 2, ..., 20]

// updates to be utilized in the upsert
const updates = updates

// target
const target = delta_fact

// ForEach activity in the pipeline: sequential execution on a subset of target
forEach element of large_array {
    perform Delta Table MERGE sequentially
        where large_array[element] = target[element]
}
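To make concrete what each iteration is doing, here is the per-element upsert sketched in plain Python dicts. This is only an illustration of the MERGE semantics (matched rows updated, unmatched rows inserted); the real notebook would issue a Delta Table MERGE against delta_fact, and the keys and values here are made up:

```python
# Plain-Python sketch of the upsert each ForEach iteration performs.
target = {1: "old-1", 2: "old-2"}    # stand-in for the delta_fact table
updates = {2: "new-2", 3: "new-3"}   # stand-in for the updates table

for key in [1, 2, 3]:                # stand-in for large_array
    if key in updates:
        # MERGE: WHEN MATCHED UPDATE / WHEN NOT MATCHED INSERT
        target[key] = updates[key]

print(target)
```

The failure mode described above comes from several of these merges targeting the same table: sequential execution avoids write conflicts, but a long array means many separate notebook runs against one table.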

 



 


