Hi everyone,
I am building a metadata driven orchestration system in Fabric where jobs are triggered across Bronze, Silver, and Gold layers. One area I am exploring is how to start downstream notebooks programmatically in a way that uses Spark resources efficiently.
From what I can tell there are two main options:
Use the Fabric REST API (called via the Semantic Link library inside a notebook) to start the downstream notebook
Call notebookutils.run from within another notebook
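Very roughly, I picture the two options like this (the IDs, notebook name, and job payload below are placeholders based on my reading of the docs, not tested code):

# Option 1: submit the notebook as an on-demand job via the Fabric REST API (Semantic Link)
import sempy.fabric as fabric

client = fabric.FabricRestClient()
workspace_id = "<workspace-guid>"   # placeholder
notebook_id = "<notebook-guid>"     # placeholder
client.post(
    f"v1/workspaces/{workspace_id}/items/{notebook_id}/jobs/instances?jobType=RunNotebook",
    json={"executionData": {"parameters": {"layer": {"value": "silver", "type": "string"}}}},
)

# Option 2: run the notebook inline from the current notebook (notebookutils is built in)
result = notebookutils.notebook.run("silver_load", 1800, {"layer": "silver"})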
My concern is Spark concurrency and resource usage. Ideally I would like jobs to start in a high concurrency session so they do not spin up a brand new Spark context every time, since that adds overhead.
Has anyone found a best practice for programmatically starting notebooks in Fabric so that Spark resources are reused efficiently? For example, is there a recommended way to target a high concurrency session, or clear tradeoffs between using the REST API and notebookutils?
Looking forward to hearing how others are tackling this.
Thanks,
Taylor
Solved! Go to Solution.
It seems like the best way to do this is to simply have my main orchestration notebook run a pipeline, and then in the pipeline configure it to run the desired notebook in a high concurrency session.
Not ideal, but functional I suppose.
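For anyone who finds this later: the trigger from the orchestration notebook can just be a Job Scheduler call against the pipeline item. A minimal sketch, assuming the on-demand job API, with placeholder IDs and a parameter payload I haven't verified:

import sempy.fabric as fabric

client = fabric.FabricRestClient()
workspace_id = "<workspace-guid>"   # placeholder
pipeline_id = "<pipeline-guid>"     # placeholder: pipeline whose notebook activity targets a high concurrency session

# Run the pipeline on demand; the pipeline decides which notebook to execute
client.post(
    f"v1/workspaces/{workspace_id}/items/{pipeline_id}/jobs/instances?jobType=Pipeline",
    json={"executionData": {"parameters": {"notebook_name": "silver_load"}}},
)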
@tayloramy There are two main routes:
1. Fabric REST API via Semantic Link
2. notebookutils.run() from another notebook
On the Spark efficiency side: for orchestration across Bronze, Silver, and Gold layers, the REST API approach is generally preferred. Let me know if you want help setting up a sample flow.
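If it helps, here's a rough sketch of submitting and polling a notebook job with Semantic Link's FabricRestClient. The IDs are placeholders and the status values are from my reading of the Job Scheduler docs, so please verify before relying on it:

import time
import sempy.fabric as fabric

client = fabric.FabricRestClient()
workspace_id = "<workspace-guid>"   # placeholder
notebook_id = "<notebook-guid>"     # placeholder

# Submit an on-demand notebook job; the API responds with 202 Accepted and a Location header
response = client.post(
    f"v1/workspaces/{workspace_id}/items/{notebook_id}/jobs/instances?jobType=RunNotebook"
)
job_instance_id = response.headers["Location"].rstrip("/").split("/")[-1]

# Poll the job instance until it reaches a terminal state
while True:
    job = client.get(
        f"v1/workspaces/{workspace_id}/items/{notebook_id}/jobs/instances/{job_instance_id}"
    ).json()
    if job["status"] in ("Completed", "Failed", "Cancelled"):
        break
    time.sleep(30)

print(job["status"])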
Hi @tayloramy
The most effective way to run this is as an ETL flow: Python --> Apache Spark (data lake) --> Delta Lake layers. There are two approaches: the full Bronze, Silver, and Gold approach (ETL), or a Bronze layer only, with views used for the subsequent layers.
In the notebook, use Connect and start a new standard session.
There are some limitations with high concurrency sessions. For example, you cannot use Scala; only Python and Spark SQL are supported.
Hi @tayloramy ,
First of all, and independent of the REST API, enabling high concurrency mode in the workspace settings would be a first step towards using resources more efficiently.
Here is the Microsoft documentation if not already known.
Best regards
Hi @spaceman127,
Thanks for the response.
Yes, I'm familiar with high concurrency Spark sessions, but as far as I know a notebook can join a high concurrency session either manually when running it in the interface, or when started from a pipeline, which is outlined here: https://learn.microsoft.com/en-us/fabric/data-engineering/configure-high-concurrency-session-noteboo...
I'm wondering if there's any way to run a notebook in a high concurrency session from inside another notebook, either using Semantic Link or notebookutils, or through another method.
I'm also curious about what high concurrency looks like across different workspaces. The docs say the notebooks should be in the same workspace, but the word "should" makes me think it might be possible to do this cross-workspace, though it would likely be very unsupported.
Thanks,
Taylor
Hi @tayloramy ,
All right.
To answer your question:
Yes, you can do that with notebookutils.run.
I just tested it again.
Here's an example:
# Example: run another notebook from the current one
result = notebookutils.notebook.run(
    "nb2",                                    # notebook name
    1800,                                     # timeout in seconds
    {"param1": "test"},                       # parameters, if the notebook uses them
    "12345678-1234-1234-1234-c55a97b0f92a"    # workspace ID
)
print(result)
And here is the documentation for it.
https://learn.microsoft.com/en-us/fabric/data-engineering/notebook-utilities
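And if the called notebook needs to hand something back to the caller, it can end with notebookutils.notebook.exit; the value comes back as the result of notebookutils.notebook.run. For example:

# Inside "nb2": the string passed to exit() is returned to the calling notebook
notebookutils.notebook.exit("rows_loaded=1042")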
As far as the REST API is concerned, I could imagine that it would work.
However, I haven't checked that yet.
I hope it helps you.
Best regards
Hi @spaceman127,
I understand how to run a notebook with both notebookutils and sempy, my question is: Is there a way to do this while also running the notebook in a high concurrency spark pool instead of the default pool?
Hi @tayloramy ,
OK, I understand.
I don't think that will work. You can, of course, have multiple Spark pools in a workspace, but you can't do that with notebookutils.notebook.run().
Fabric works with sessions.
The default pool is always used. I'm not aware of any other way of doing this at the moment.
Best regards
This is my understanding as well.
My goal is to optimize Spark resources, and if I'm programmatically starting a bunch of notebooks, each in its own starter pool, that is not very optimal.
I was hoping someone here would have some magic sauce I could explore.
I think my approach going forward will be to run the notebooks from a pipeline, and then run the pipelines from my orchestration layer; that way, in the pipeline, I can set up the high concurrency settings for Spark.
I always try to optimize my notebooks as much as possible.
In this case, however, there is little chance of success.
I will continue testing in this area as soon as I have time. If I find anything, I will get back to you.