
tayloramy
Solution Sage

Best way to start Fabric notebooks via API or Semantic Link for efficient Spark usage

Hi everyone,

I am building a metadata driven orchestration system in Fabric where jobs are triggered across Bronze, Silver, and Gold layers. One area I am exploring is how to start downstream notebooks programmatically in a way that uses Spark resources efficiently.

From what I can tell there are two main options:

  1. Use the Fabric REST API, called through the Semantic Link library inside a notebook, to start the downstream notebook

  2. Call notebookutils.notebook.run from within another notebook

My concern is Spark concurrency and resource usage. Ideally I would like jobs to start in a high concurrency session so they do not spin up a brand new Spark context every time, since that adds overhead.

Has anyone found a best practice for programmatically starting notebooks in Fabric so that Spark resources are reused efficiently? For example, is there a recommended way to target a high concurrency session, or clear tradeoffs between using the REST API and notebookutils?

Looking forward to hearing how others are tackling this.

 

Thanks,
Taylor


10 REPLIES
tayloramy
Solution Sage

It seems like the best way to do this is to simply have my main orchestration notebook run a pipeline, and then in the pipeline configure it to run the desired notebook in a high concurrency session.
Not ideal, but functional I suppose. 
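
In case it helps anyone else, here is a rough, untested sketch of how the orchestration notebook can trigger the pipeline, using Semantic Link's FabricRestClient against the on-demand job API (the workspace and pipeline IDs are placeholders; the high concurrency part itself is configured on the notebook activity inside the pipeline, not in this call):

# Rough sketch (untested): trigger a pipeline run from the orchestration notebook.
# The pipeline's notebook activity is what actually runs in a high concurrency session.
import sempy.fabric as fabric

client = fabric.FabricRestClient()

workspace_id = "<workspace-id>"     # placeholder
pipeline_id = "<pipeline-item-id>"  # placeholder

# Fabric Job Scheduler: run an item job on demand (jobType=Pipeline for data pipelines)
response = client.post(
    f"v1/workspaces/{workspace_id}/items/{pipeline_id}/jobs/instances?jobType=Pipeline"
)

# 202 Accepted; the Location header points to the job instance you can poll for status
print(response.status_code, response.headers.get("Location"))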

anilgavhane
Resolver II

@tayloramy

1. Fabric REST API via Semantic Link

  • More scalable
  • Better control over session reuse
  • Ideal for metadata-driven orchestration

2. notebookutils.notebook.run() from another notebook

  • Simple chaining
  • May spin up new Spark context (less efficient)

 

Spark Efficiency Tips

  • Use high concurrency sessions
  • Keep notebooks modular
  • Avoid long-running transformations downstream

For orchestration across Bronze, Silver, and Gold layers, the REST API approach is generally preferred. Let me know if you want help setting up a sample flow.
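
As a starting point, here is a minimal, untested sketch of the REST approach from inside a notebook, using Semantic Link's FabricRestClient and the Job Scheduler "run on demand" endpoint (the workspace ID, notebook ID, and parameter name are placeholders). Note that this starts the target notebook as its own job rather than joining the caller's high concurrency session:

# Minimal sketch (untested): start a notebook on demand via the Fabric REST API
# through Semantic Link. IDs and parameter names below are placeholders.
import sempy.fabric as fabric

client = fabric.FabricRestClient()

workspace_id = "<workspace-id>"     # placeholder
notebook_id = "<notebook-item-id>"  # placeholder

response = client.post(
    f"v1/workspaces/{workspace_id}/items/{notebook_id}/jobs/instances?jobType=RunNotebook",
    json={
        "executionData": {
            "parameters": {
                "param1": {"value": "test", "type": "string"}
            }
        }
    },
)

# 202 Accepted; poll the Location header for the job instance status
print(response.status_code, response.headers.get("Location"))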

BhaveshPatel
Community Champion

Hi @tayloramy 

 

It should be run in the most effective way (ETL):
Python --> Apache Spark (Data Lake) --> Delta Lake layer. There are two approaches: the first is the Bronze, Silver and Gold approach (ETL); the other is a Bronze layer only, with views used in the subsequent layers.

You should use Connect in Notebooks and attach to a new standard session.

There are some limitations with high concurrency sessions. For example, you cannot use Scala; you have to use Python or Spark SQL only.

Thanks & Regards,
Bhavesh

Love the Self Service BI.
Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to give Kudos.
spaceman127
Resolver I

Hi @tayloramy ,

 

First of all, and independent of the REST API, configuring high concurrency in the workspace settings would be a first step towards using resources more efficiently.


Here is the Microsoft documentation, in case you haven't seen it already.

 

https://learn.microsoft.com/en-us/fabric/data-engineering/configure-high-concurrency-session-noteboo...

Best regards 

Hi @spaceman127

Thanks for the response. 

Yes, I'm familiar with high concurrency Spark sessions, but as far as I know a notebook can join a high concurrency session either manually when running it in the interface, or when started from a pipeline, which is outlined here: https://learn.microsoft.com/en-us/fabric/data-engineering/configure-high-concurrency-session-noteboo...

 

I'm wondering if there's any way to run a notebook in a high concurrency session from inside another notebook, either using Semantic Link or notebookutils, or through another method.

 

I'm also curious about what high concurrency looks like across different workspaces. The docs say that the notebooks should be in the same workspace, but the word "should" makes me think it might be possible to do this cross-workspace, though it would likely be very unsupported.

 

Thanks, 

Taylor

Hi @tayloramy ,

 

All right.

To answer your question:
Yes, you can do that with notebookutils.notebook.run.

I just tested it again.

Here's an example:

 

# Example: run another notebook by name from the current notebook

result = notebookutils.notebook.run(
    "nb2",                       # Name of the notebook to run
    1800,                        # Timeout in seconds
    {"param1": "test"},          # Parameters passed to the target notebook
    "12345678-1234-1234-1234-c55a97b0f92a"  # Workspace ID (optional; defaults to the current workspace)
)

print(result)

 

And here is the documentation for it.

https://learn.microsoft.com/en-us/fabric/data-engineering/notebook-utilities

 

As far as the REST API is concerned, I could imagine that it would work.
However, I haven't checked that yet.

 

I hope it helps you.

 

Best regards

Hi @spaceman127

I understand how to run a notebook with both notebookutils and sempy; my question is: is there a way to do this while also running the notebook in a high concurrency Spark pool instead of the default pool?

 

Hi @tayloramy ,

OK, I understand.

I don't think that will work. You can, of course, have multiple Spark pools in a workspace, but you can't do that with notebookutils.notebook.run().

Fabric works with sessions.
The default pool is always used. I'm not aware of any other way of doing this at the moment.

 

Best regards

This is my understanding as well. 

My goal is to optimize Spark resources, and if I'm programmatically starting a bunch of notebooks each in their own starter pool, that is not very optimal.

 

I was hoping someone here would have some magic sauce I could explore. 
I think my approach going forward will be to run the notebooks from a pipeline, and then run the pipelines from my orchestration layer; that way, in the pipeline I can set up the high concurrency settings for Spark.

 

I always try to optimize my notebooks as much as possible.
In this case, however, there is little chance of success.

I will continue testing in this area as soon as I have time. If I find anything, I will get back to you.
