Srisakthi
Resolver II

Notebook activity in Data Pipeline

Hello All,

 

In my workspace I have 5 notebooks in a folder that need to be triggered sequentially from a Fabric Data Pipeline. However, I can select only one notebook at a time in a pipeline. Is there any way to point to the folder and execute those notebooks one by one, instead of adding multiple Notebook activities?

 

Regards,

Srisakthi

1 ACCEPTED SOLUTION

As you stated, the notebooks need to be executed in a particular sequence, so you need to maintain the sequence configuration somewhere.

If the requirement is only to execute the notebooks sequentially, in any order, you can try the route below (which I have not tested myself):

Get the list of all notebooks within the workspace via the REST API, using a Web activity.
Pass that array to a ForEach activity.




----------------------------------------------------------------------------------------------
Nandan Hegde (MSFT Data MVP)
LinkedIn Profile : www.linkedin.com/in/nandan-hegde-4a195a66
GitHUB Profile : https://github.com/NandanHegde15
Twitter Profile : @nandan_hegde15
MSFT MVP Profile : https://mvp.microsoft.com/en-US/MVP/profile/8977819f-95fb-ed11-8f6d-000d3a560942
Topmate : https://topmate.io/nandan_hegde
Blog :https://datasharkx.wordpress.com


6 REPLIES
Srisakthi
Resolver II

Hi @spencer_sa  , @NandanHegde ,

 

Thanks for your reply. In both cases I have to hard-code my notebook names, which I don't want to do. The pipeline should pick them up dynamically.

 

Regards,

Srisakthi


There is another path that I think is closer to what the original poster is seeking.

 

You can use a file naming scheme to sequence the notebooks, which eliminates the need to maintain configuration details. This approach also scales seamlessly across a Dev/Prod deployment landscape.

 

The general design is to build an orchestration notebook:

 

1. Use notebookutils to get all objects in the workspace

2. Filter to notebook objects

3. Order by sequence

4. Build out your DAG

5. Call runMultiple

 

This is the approach we took, and it works well. We bundle our notebooks into phases (i.e., _p01, _p02). We also put each phase in a folder, but unfortunately that attribute isn't currently returned by notebookutils, so the folders just help my team stay organized.

 

 

Here is stubbed-out Python code for each step. Obviously, with this approach you need to repeat steps 3-5 for however many phases you want to define. If you want even more flexibility, you could put the notebook dependencies in the description of each notebook, which is an attribute returned by notebookutils, and then update your DAG code to include the dependencies.

 

1.
# Get all objects in the workspace (notebookutils is built into Fabric notebooks)
artifacts_list = notebookutils.notebook.list()
 
2.
# Filter items to include only notebooks
notebooks = [
    item for item in artifacts_list
    if item['type'] == 'Notebook'
]
 
3.
# Keep only phase 01 notebooks, ordered by display name
p01_notebooks = sorted(
    (item for item in notebooks if '_p01' in item['displayName']),
    key=lambda item: item['displayName'],
)
 
4.
# Build the DAG definition for runMultiple
DAG_Array = {
    "activities": [
        {
            "name": item['displayName'],
            "path": item['displayName'],
            "timeoutPerCellInSeconds": 1800  # default is 90
        }
        for item in p01_notebooks
    ],
    "timeoutInSeconds": 43200,  # max timeout for the entire DAG; default is 43200 (12 hrs)
    "concurrency": 5  # max number of notebooks to run concurrently; default is 50
}
 
5.
# Run the DAG
notebookutils.notebook.runMultiple(DAG_Array)
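
Since the original question asks for strictly one-by-one execution, here is a minimal sketch of how the DAG could be chained so that each notebook waits for the previous one. It assumes the "dependencies" key of the runMultiple DAG format works as described in the notebookutils documentation (I have not verified this against a live workspace), and the function name and notebook names are hypothetical. It is plain Python, so it can be tried outside Fabric.

```python
def build_sequential_dag(notebook_names, timeout_per_cell=1800):
    """Build a runMultiple-style DAG where each notebook depends on the previous one.

    Assumption: the DAG accepts a "dependencies" list of activity names.
    """
    ordered = sorted(notebook_names)  # the naming scheme defines the run order
    activities = []
    for index, name in enumerate(ordered):
        activity = {
            "name": name,
            "path": name,
            "timeoutPerCellInSeconds": timeout_per_cell,
        }
        if index > 0:
            # Wait for the notebook that sorts immediately before this one
            activity["dependencies"] = [ordered[index - 1]]
        activities.append(activity)
    return {
        "activities": activities,
        "timeoutInSeconds": 43200,
        "concurrency": 1,  # one notebook at a time
    }


# Hypothetical notebook names, deliberately out of order
dag = build_sequential_dag(["load_p01_b", "load_p01_a", "load_p01_c"])
```

Inside a Fabric notebook, the resulting dictionary would be passed to notebookutils.notebook.runMultiple in place of DAG_Array.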

Hi @NandanHegde ,

 

Let me give the Web activity a try.

 

Thanks,

Srisakthi

NandanHegde
Super User

You can design a metadata-driven framework rather than using multiple Notebook activities, since you have the option of calling the notebook dynamically:

[Image: NandanHegde_0-1732161001644.png]

 

So you can follow the framework below:

1) Create a parameter called NotebookSeq whose value is the notebook names, semicolon-separated, as below:

 

abc;def;ghi

 

2) Use a ForEach activity and make it sequential. Its Items expression would be @split(pipeline().parameters.NotebookSeq,';')

 

3) Within the ForEach, use the Notebook activity, and for the notebook add the dynamic expression @item()

 

So whenever notebooks are added or deleted, or the sequence changes, you just need to update the parameter value.
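
As a rough Python analogue of what the ForEach receives, the sketch below splits a parameter value on semicolons. It assumes the pipeline split function behaves like an ordinary string split; the function name is hypothetical and the notebook names are the placeholders from the post.

```python
def parse_notebook_seq(notebook_seq):
    """Split a semicolon-separated parameter value into an ordered list of names."""
    return notebook_seq.split(";")


# Every separator must be a semicolon; any other character stays inside the name
run_order = parse_notebook_seq("abc;def;ghi")
mixed_separators = parse_notebook_seq("abc;def,ghi")
```

With mixed separators the second call yields only two items, so the Notebook activity would be handed a name that does not exist.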




spencer_sa
Super User

Folders in workspaces are currently pretty much cosmetic; they aren't persisted in source control (a separate concern).
If you don't want to run the notebooks individually, you have a number of options, but most rely on having a list/array/table of notebook names:

1) Maintain a table of notebooks, use a Lookup activity to load them into an array, and use a ForEach activity to cycle over the notebook names, executing each one.

2) Have a notebook that has this info to hand (from a table, hardcoded, or via an API call/sempy_labs) and that either uses .run/.runMultiple to execute them or passes them as a JSON string to the calling pipeline, which then runs them using a ForEach as per option 1) above.
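
For option 2), a minimal sketch of producing the JSON string might look like the following. The item shape (dictionaries with 'type' and 'displayName') mirrors the stubbed code earlier in the thread and is an assumption, as are the function name and the sample names.

```python
import json


def notebook_names_as_json(items):
    """Return a JSON array string of notebook display names, sorted by name."""
    names = sorted(
        item["displayName"] for item in items if item.get("type") == "Notebook"
    )
    return json.dumps(names)


# Hypothetical workspace listing used only for illustration
sample_items = [
    {"type": "Notebook", "displayName": "nb_p02_a"},
    {"type": "Lakehouse", "displayName": "lh_main"},
    {"type": "Notebook", "displayName": "nb_p01_a"},
]
payload = notebook_names_as_json(sample_items)

# In a Fabric notebook, the string could be handed back to the calling pipeline
# as the exit value, e.g. notebookutils.notebook.exit(payload)
```

The pipeline would then need to convert the exit value back into an array before handing it to the ForEach.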
