
smpa01
Super User

Stopping Spark Session inside/outside ForEach

Usually, spark.stop() is recommended as a best practice for releasing resources like memory, CPU, and network connections back to the cluster, among other reasons. I usually do that in all my notebooks.

 

I am calling a notebook in sequential execution inside a ForEach activity. If I don't use spark.stop(), does each iteration keep taking advantage of the same session? And if I do use spark.stop(), is it overkill, since it shuts down and restarts the session for every ForEach iteration?
//pseudocode: ForEach activity running notebook NB1 once per item, sequentially
[{id: 1}, {id: 2}, {id: 3}].forEach((item) => execute(NB1))

If I don't use spark.stop() (which is my current approach), is there any way to shut down the session at the completion of the loop?

@frithjof_v 

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs
1 ACCEPTED SOLUTION

Yes, it sounds like the best option is to not use ForEach in this case; instead, have a master notebook and execute all the other notebook runs from that master notebook.

 

Look for mssparkutils.notebook.run(), mssparkutils.notebook.runMultiple() or Threadpooling in the Reddit discussion: 

https://www.reddit.com/r/MicrosoftFabric/comments/1eolfda/sparkstop_is_it_needed/.

 

I noticed mssparkutils.notebook.runMultiple() is a preview feature. I haven't checked the status of the other mentioned features.

 

Also, I think mssparkutils will be replaced by notebookutils going forward:

NotebookUtils (former MSSparkUtils) for Fabric - Microsoft Fabric | Microsoft Learn

Microsoft Spark Utilities (MSSparkUtils) for Fabric - Microsoft Fabric | Microsoft Learn
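As a rough sketch of the threadpooling approach mentioned above: a master notebook can fan out child-notebook runs over a standard ThreadPoolExecutor, so every child run reuses the master's Spark session instead of starting its own. The run_child helper below is a hypothetical stand-in for testability; in a Fabric master notebook its body would be the real mssparkutils.notebook.run(name, timeout_seconds, params) call (or the newer notebookutils equivalent). This is a sketch under those assumptions, not a definitive implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a child-notebook run. In a Fabric master
# notebook this body would be something like:
#     return mssparkutils.notebook.run(name, 90, {"id": params["id"]})
def run_child(name, params):
    # The real call blocks until the child notebook finishes and
    # returns its exit value; here we just echo what we would run.
    return f"{name} finished with id={params['id']}"

items = [{"id": 1}, {"id": 2}, {"id": 3}]

# Fan the runs out across a small pool of threads; each child run
# shares the master notebook's session rather than spinning up a new one.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda p: run_child("NB1", p), items))

for r in results:
    print(r)
```

pool.map preserves input order, so the results line up with the items even though the runs overlap in time.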


5 REPLIES
frithjof_v
Super User

I don't have enough knowledge about how sessions work in Fabric to answer this properly. Interesting question, though! I will try to learn more about this.

 

Just to be clear, I understand your current setup like this:

  • You are using a Data Factory Data Pipeline.
  • Inside the Data Pipeline, you have a ForEach activity with the Sequential option selected.
  • Inside the ForEach Activity, you are executing a Notebook.

 

I think I need to learn more about topics like concurrency, and whether it is necessary to use spark.stop() in Fabric, or whether Fabric manages stopping the session when a Notebook run is finished.

 

Perhaps this blog post is relevant: https://www.fourmoo.com/2024/01/10/microsoft-fabric-notebook-session-usage-explained-and-how-to-save...

I am guessing you don't need to use spark.stop() in Fabric.

 

Are you also starting the Spark session using code? Something like this:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("Simple DataFrame Example") \
    .getOrCreate()

 

I don't think that is necessary in Fabric either. I guess sessions are managed by Fabric. When running a notebook interactively (using the notebook editor interface), I guess it's a good idea to click 'Stop session' when finished. However, when running a Notebook in a Data pipeline, I think Fabric manages the session and stops it when it's no longer needed. See the blog post link in my previous comment.

 

I am also guessing that using spark.stop() inside the Notebook can prevent you from taking advantage of high concurrency Spark sessions.

 

However I'm not sure about any of this, as I don't have enough knowledge or experience with this.

 

Hoping to get others' insights and thoughts on this 😃

 

I started a discussion on Reddit to try to learn more about the topic: 

 

spark.stop() - is it needed? : r/MicrosoftFabric (reddit.com)

 

I also noticed there is an alternative to spark.stop(): mssparkutils.session.stop().

 

Anyway, I'm not sure if it's necessary.

 

I am still not entirely sure what to believe regarding session start/stop in Fabric.

 

There is also the option to use a Master notebook and use that notebook to call other notebooks. Then I think you can share the same session among notebooks. I think this approach utilizes the high concurrency feature.

 

EDIT: I think the Reddit discussion has made me understand more about it. I recommend checking out the Reddit discussion (link above).

@frithjof_v  thanks for this.

High concurrency is not shipped yet (off-topic).

 

Therefore, if measures are not taken for a large array, the pipeline will error out when you call a notebook inside ForEach over a large array to perform operations on the same table. E.g.

 

//pseudo code
const large_array = [1, 2, ..., 20]

//updates to be utilized in the upsert
const updates = updates

//target
const target = delta_fact

//ForEach activity in pipeline: sequential execution on a subset of target
forEach element of large_array {
    perform Delta Table Merge sequentially
        where large_array[element] = target[element]
}
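The sequential per-element merge pattern above can be sketched as a toy upsert, using plain Python dicts in place of the Delta update and target tables. The names delta_fact, updates, and merge_element here are illustrative stand-ins only, not the real Delta Lake MERGE API:

```python
# Toy stand-ins for the tables in the pseudocode, keyed by id.
delta_fact = {1: "old-1", 2: "old-2"}              # target table
updates = {1: "new-1", 2: "new-2", 3: "new-3"}     # updates to upsert

large_array = [1, 2, 3]  # one key per ForEach iteration

# Each iteration touches only its own key, mimicking one notebook run
# per sequential ForEach element against a subset of the target.
def merge_element(target, source, key):
    if key in source:
        target[key] = source[key]  # matched -> update, not matched -> insert

for element in large_array:
    merge_element(delta_fact, updates, element)

print(delta_fact)
```

Because each iteration writes a disjoint slice of the target, running them sequentially avoids the concurrent-write conflicts that merging the whole array against the same table at once could trigger.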

 



 


