Solved: Notebook activity in Data Pipeline

Srisakthi · 11-20-2024

Hello All,

In my workspace I have 5 notebooks in a folder, which need to be triggered sequentially from a Fabric Data Pipeline. But a Notebook activity lets me select only one notebook at a time. Is there any way to point to the folder and execute those notebooks one by one instead of adding multiple Notebook activities?

Regards,

Srisakthi

NandanHegde · 11-20-2024

Since you stated that the notebooks need to be executed in a particular sequence, you need to maintain the sequence config somewhere.

If the requirement is only to execute the notebooks one after another, in any order, you can try the route below (which I have not tested myself):

1) Get the list of all notebooks within the workspace via the REST API, using a Web activity.
2) Pass that array to a ForEach activity.
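A minimal Python sketch of the two steps above, useful for checking what the Web activity would receive. It assumes the Fabric REST API's List Notebooks endpoint (`GET https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/notebooks`), an already-acquired Microsoft Entra bearer token, and that further pages are signalled by a `continuationUri` field; `list_notebooks` and `foreach_items` are hypothetical helper names. In the pipeline, the ForEach would iterate over the response's `value` array.

```python
import json
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"


def list_notebooks(workspace_id: str, token: str) -> list[dict]:
    """GET every notebook in the workspace, following pagination if present."""
    url = f"{FABRIC_API}/workspaces/{workspace_id}/notebooks"
    items: list[dict] = []
    while url:
        request = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {token}"}
        )
        with urllib.request.urlopen(request) as response:
            page = json.load(response)
        items.extend(page.get("value", []))
        # Assumption: the next page URL is in "continuationUri"; absent on the last page
        url = page.get("continuationUri")
    return items


def foreach_items(notebooks: list[dict]) -> list[dict]:
    """Keep only what each ForEach iteration needs, ordered by display name."""
    return sorted(
        ({"id": nb["id"], "displayName": nb["displayName"]} for nb in notebooks),
        key=lambda nb: nb["displayName"],
    )
```

Sorting by display name is an extra choice made here so the run order is at least deterministic; drop it if any order is acceptable.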

----------------------------------------------------------------------------------------------
Nandan Hegde (MSFT Data MVP)
LinkedIn Profile : www.linkedin.com/in/nandan-hegde-4a195a66
GitHUB Profile : https://github.com/NandanHegde15
Twitter Profile : @nandan_hegde15
MSFT MVP Profile : https://mvp.microsoft.com/en-US/MVP/profile/8977819f-95fb-ed11-8f6d-000d3a560942
Topmate : https://topmate.io/nandan_hegde
Blog :https://datasharkx.wordpress.com

Srisakthi · 11-20-2024

Hi @spencer_sa, @NandanHegde,

Thanks for your replies. In both cases I have to hard-code my notebook names, which I don't want to do. The notebooks should be picked up dynamically.

Regards,

Srisakthi

zarb · 12-11-2024

There is another path that I think is closer to what the original poster is seeking.

You can use a file naming scheme to sequence notebooks, which eliminates the need to maintain configuration details. This approach also scales seamlessly across a Dev/Prod deployment landscape.

The general design is to build an orchestration notebook:

1. Use notebookutils to get all objects in the workspace

2. Filter to Notebook objects

3. Order by sequence

4. Build out your DAG

5. Call runMultiple

This is the approach we took, and it works well. We bundle our notebooks into phases (e.g. _p01, _p02). We also put each phase in a folder, but unfortunately that attribute isn't currently returned by notebookutils, so the folders just help my team stay organized.

Here is stubbed-out Python code for each step. Obviously, with this approach you need to repeat steps 3-5 for however many phases you want to define. If you want even more flexibility, you could put the notebook dependencies in the description of each notebook, which is an attribute returned by notebookutils, and then update your DAG code to include the dependencies.

```python
# notebookutils is built into Fabric notebooks

# 1. Get all objects in the workspace
artifacts_list = notebookutils.notebook.list()

# 2. Filter items to include only notebooks
notebooks = [
    item for item in artifacts_list
    if item['type'] == 'Notebook'
]

# 3. Keep only phase-01 notebooks, ordered by name
p01_notebooks = sorted(
    (item for item in notebooks if '_p01' in item['displayName']),
    key=lambda item: item['displayName'],
)

# 4. Build the DAG
DAG_Array = {
    "activities": [
        {
            "name": item['displayName'],
            "path": item['displayName'],
            "timeoutPerCellInSeconds": 1800,  # 90 is the default
        }
        for item in p01_notebooks
    ],
    "timeoutInSeconds": 43200,  # max timeout for the entire DAG; 43200 (12 hrs) is the default
    "concurrency": 5,  # max number of notebooks to run concurrently; 50 is the default
}

# 5. Run the DAG
notebookutils.notebook.runMultiple(DAG_Array)
```
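One way to avoid repeating steps 3-5 for every phase is to build a single DAG and express the phase ordering through the `dependencies` key that `runMultiple` activities accept. A sketch, assuming the `_pNN` suffix convention described above and the same dict-style item access as the stub; `build_phased_dag` and `PHASE_PATTERN` are hypothetical names.

```python
import re

# Hypothetical naming convention: a phase suffix such as "_p01", "_p02" in the notebook name
PHASE_PATTERN = re.compile(r"_p(\d+)")


def build_phased_dag(notebooks, timeout_per_cell=1800, concurrency=5):
    """Build a runMultiple DAG where every phase-N notebook waits for all of phase N-1."""
    phases: dict[int, list[str]] = {}
    for item in notebooks:
        match = PHASE_PATTERN.search(item["displayName"])
        if match:  # notebooks without a phase suffix are ignored
            phases.setdefault(int(match.group(1)), []).append(item["displayName"])

    activities = []
    previous_phase: list[str] = []
    for phase in sorted(phases):
        names = sorted(phases[phase])
        for name in names:
            activity = {
                "name": name,
                "path": name,
                "timeoutPerCellInSeconds": timeout_per_cell,
            }
            if previous_phase:  # first phase has no upstream dependencies
                activity["dependencies"] = list(previous_phase)
            activities.append(activity)
        previous_phase = names

    return {
        "activities": activities,
        "timeoutInSeconds": 43200,  # 12 hours
        "concurrency": concurrency,
    }
```

Usage in a Fabric notebook would then be something like `notebookutils.notebook.runMultiple(build_phased_dag(notebookutils.notebook.list()))`. Notebooks within one phase still run in parallel up to `concurrency`; for strictly one-at-a-time execution, make each activity depend on the one before it instead.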

Srisakthi · 11-24-2024

Hi @NandanHegde,

Let me give the Web activity a try.

Thanks,

Srisakthi

NandanHegde · 11-20-2024

You can design a metadata-driven framework rather than using multiple Notebook activities, since you have the option of calling the notebook dynamically.

You can follow the framework below:

1) Create a parameter called NotebookSeq that holds the notebook names separated by semicolons, as below:

abc;def;ghi

2) Use a ForEach activity, make it sequential, and set its Items expression to @split(pipeline().parameters.NotebookSeq,';')

3) Within the ForEach, use a Notebook activity and set the notebook to the dynamic expression @item()

Whenever notebooks are added or deleted, or the sequence changes, you only need to update the parameter value.
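The ForEach logic can be checked in plain Python. This sketch mimics `@split(..., ';')` and then runs each name in order; unlike the pipeline's `split`, it also trims spaces and drops empty entries. It also shows why the separator must be consistent: `"abc;def,ghi".split(";")` yields `["abc", "def,ghi"]`, i.e. only two names. `parse_notebook_seq` and `run_in_sequence` are hypothetical helpers, and `run` is any callable that executes one notebook.

```python
from typing import Callable


def parse_notebook_seq(value: str) -> list[str]:
    """Mimic split(NotebookSeq, ';'), also trimming spaces and dropping empty entries."""
    return [name.strip() for name in value.split(";") if name.strip()]


def run_in_sequence(value: str, run: Callable[[str], object]) -> list[str]:
    """Run each notebook in order, waiting for one to finish before starting the next."""
    executed = []
    for name in parse_notebook_seq(value):
        run(name)  # blocking call: the next notebook starts only after this returns
        executed.append(name)
    return executed
```

Inside a Fabric notebook, `run` could be `lambda name: notebookutils.notebook.run(name, 1800)` (the second argument being a timeout in seconds), which gives the same sequential behavior without a pipeline.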

spencer_sa · 11-20-2024

Folders in workspaces are currently pretty much cosmetic: they aren't persisted in source control (a separate concern).
If you don't want to run notebooks individually, you have a number of options, but most rely on having a list/array/table of notebook names:

1) Maintain a table of notebooks, use a Lookup activity to load them into an array, and have a ForEach activity cycle over the notebook names, executing each one.

2) Have a notebook that has this info to hand (from a table, hardcoded, or via an API call/sempy_labs) and either uses .run/.runMultiple to execute them or passes them as a JSON string to the calling pipeline, which then runs them using a ForEach as in option 1) above.
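For option 2, the notebook can hand the ordered names back to the pipeline as its exit value. A sketch, assuming a hypothetical control table with `seq` and `notebook_name` columns; `notebook_names_payload` is a hypothetical helper.

```python
import json


def notebook_names_payload(rows: list[dict]) -> str:
    """Order control-table rows by run sequence and return the names as a JSON array."""
    # Hypothetical control-table columns: "seq" (run order) and "notebook_name"
    ordered = sorted(rows, key=lambda row: row["seq"])
    return json.dumps([row["notebook_name"] for row in ordered])
```

In the notebook you might call `notebookutils.notebook.exit(notebook_names_payload([r.asDict() for r in spark.sql("SELECT seq, notebook_name FROM notebook_control").collect()]))`, where `notebook_control` is again hypothetical. On the pipeline side, the ForEach Items expression would then be along the lines of `@json(activity('GetNotebookList').output.result.exitValue)`; treat the activity name and exact output path as assumptions to verify against your pipeline's activity output.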
