Usually, spark.stop() is recommended as a best practice for releasing resources like memory, CPU, and network connections back to the cluster. I usually do that in all my notebooks.
I am calling a notebook sequentially inside a ForEach activity. If I don't use spark.stop(), does each iteration keep taking advantage of the same session? And if I do use spark.stop(), is that overkill, since it shuts down and restarts the session for each execution?
// pseudocode
[{id: 1}, {id: 2}, {id: 3}].forEach((_, i) => execute(NB1))
If I don't use spark.stop() (which is what I am currently doing), is there any way to shut down the session at the completion of the loop?
I don't have enough knowledge about how sessions work in Fabric to answer this properly. Interesting question, though! I will try to learn more about this.
Just to be clear, I understand your current setup like this: a pipeline ForEach activity that calls the same notebook (NB1) sequentially, once per element.
I think I need to learn more about topics like concurrency, and whether it is necessary to use spark.stop() in Fabric, or whether Fabric manages stopping a session when a notebook run is finished.
Perhaps this blog post is relevant: https://www.fourmoo.com/2024/01/10/microsoft-fabric-notebook-session-usage-explained-and-how-to-save...
I am guessing you don't need to use spark.stop() in Fabric.
Are you starting the Spark session also by using code? Something like this:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("Simple DataFrame Example") \
    .getOrCreate()
I don't think that is necessary in Fabric either. I guess sessions are managed by Fabric. When running a notebook interactively (using the notebook editor interface), I guess it's a good idea to click 'Stop session' when finished. However, when running a notebook in a data pipeline, I think Fabric manages the session and stops it when it's no longer needed. Ref. the blog post link in my previous comment.
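For example, I believe something like this already works in a Fabric notebook without building a session yourself (just a sketch):

# `spark` is already defined in a Fabric notebook session,
# so SparkSession.builder / getOrCreate() shouldn't be needed:
df = spark.sql("SELECT 1 AS id")
df.show()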
I am also guessing that using spark.stop() inside the notebook can prevent you from taking advantage of high concurrency Spark sessions.
However I'm not sure about any of this, as I don't have enough knowledge or experience with this.
Hoping to get others' insights and thoughts on this 😃
I started a discussion on Reddit to try to learn more about the topic:
spark.stop() - is it needed? : r/MicrosoftFabric (reddit.com)
I also noticed there is an alternative to spark.stop(), which is mssparkutils.session.stop().
Anyway, I'm not sure if it's necessary.
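For instance, a notebook could end with something like this (just a sketch; mssparkutils is pre-imported in Fabric notebooks, and Fabric may also stop the session on its own when the run finishes):

# ... notebook work ...

# Stop the current Fabric session explicitly instead of spark.stop()
mssparkutils.session.stop()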
I am still not entirely sure what to believe regarding session start/stop in Fabric.
There is also the option to use a Master notebook and use that notebook to call other notebooks. Then I think you can share the same session among notebooks. I think this approach utilizes the high concurrency feature.
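A rough sketch of that master-notebook pattern (the child notebook name "NB1", the 90-second timeout, and the id parameter are just placeholders):

# Master notebook: call the same child notebook once per element,
# reusing this notebook's Spark session for every run
ids = [1, 2, 3]

for i in ids:
    result = mssparkutils.notebook.run("NB1", 90, {"id": i})
    print(f"NB1 finished for id={i}: {result}")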
EDIT: I think the Reddit discussion has made me understand more about it. I recommend checking out the Reddit discussion (link above).
@frithjof_v thanks for this.
High concurrency is not shipped yet (off-topic).
Therefore, if measures are not taken for a large_array, the pipeline will error out when you call a notebook inside ForEach over a large array to perform operations on the same table. E.g.
// pseudocode
const large_array = [1, 2, ..., 20]
// updates to be utilized in the upsert
const updates = updates
// target
const target = delta_fact
// ForEach activity in pipeline: sequential execution on a subset of target
forEach element of large_array {
    perform Delta Table Merge sequentially
    where large_array[element] = target[element]
}
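In actual PySpark, the per-element merge would look roughly like this (the table names updates and delta_fact and the id column are placeholders taken from the pseudocode above):

from delta.tables import DeltaTable

large_array = list(range(1, 21))                  # [1, 2, ..., 20]
updates = spark.read.table("updates")             # rows to upsert
target = DeltaTable.forName(spark, "delta_fact")  # target Delta table

for element in large_array:
    # merge only the slice of updates belonging to this element
    (target.alias("t")
        .merge(updates.filter(f"id = {element}").alias("u"), "t.id = u.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())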
Yes, it sounds like the best option is to not use ForEach in this case; instead, have a master notebook and execute all the other notebook runs from it.
Look for mssparkutils.notebook.run(), mssparkutils.notebook.runMultiple(), or thread pooling in the Reddit discussion:
https://www.reddit.com/r/MicrosoftFabric/comments/1eolfda/sparkstop_is_it_needed/.
I noticed mssparkutils.notebook.runMultiple() is a preview feature. I haven't checked the status of the other mentioned features.
Also, I think mssparkutils will be replaced by notebookutils going forward:
NotebookUtils (former MSSparkUtils) for Fabric - Microsoft Fabric | Microsoft Learn
Microsoft Spark Utilities (MSSparkUtils) for Fabric - Microsoft Fabric | Microsoft Learn
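For example, a rough sketch of what those options could look like in a master notebook (notebook names and parameters are placeholders; runMultiple is still in preview):

# Option 1: runMultiple - run several child notebooks in the same session
results = mssparkutils.notebook.runMultiple(["NB1", "NB2", "NB3"])
print(results)

# Option 2: thread pooling - run the same child notebook per element in parallel
from concurrent.futures import ThreadPoolExecutor

ids = [1, 2, 3]
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(mssparkutils.notebook.run, "NB1", 90, {"id": i}) for i in ids]
    for f in futures:
        print(f.result())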