Hi team
I want to run notebooks in parallel, but I see that the time it takes to run them through a pipeline is longer than the time taken to run them individually. Is there a setting that needs to be changed?
Also, is there a way for each notebook to not spin up its own cluster but use the same cluster? The notebooks seem to be queued as well. Why?
Hi @priyankabis ,
Running notebooks in parallel can sometimes lead to longer execution times due to factors such as resource contention, the overhead of managing parallel tasks, or inefficient configurations.
For your first question, first make sure that the cluster has enough resources (CPU, memory) to handle multiple notebooks running in parallel. If resources are limited, tasks may compete for them and cause delays. Second, check whether the cluster configuration can handle parallel execution efficiently. Running notebooks through a pipeline may also incur additional overhead compared to running them individually, for example setup and teardown time, data transfers between stages, or other pipeline management tasks.
For your second question, you can implement a clustering policy that forces all notebooks to use a shared cluster. This helps prevent the creation of multiple clusters and ensures efficient utilization of resources. If notebooks are queuing, it may be due to capacity limitations of the cluster. Increasing the cluster size or optimizing the notebooks to run more efficiently can help reduce queuing.
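One way to have several notebooks share a single Spark session, rather than each pipeline activity requesting (and queuing for) its own, is to orchestrate them from one driver notebook. Below is a minimal sketch, assuming Fabric's mssparkutils is available in your runtime; the notebook names and the "run_date" argument are placeholders.

```python
# Minimal sketch: run several child notebooks in parallel from one driver
# notebook so they all reuse the driver's Spark session, instead of each
# pipeline activity starting (and queuing for) its own session.
# Assumption: mssparkutils is available in the Fabric runtime; the notebook
# names and the "run_date" argument below are placeholders.
from concurrent.futures import ThreadPoolExecutor

from notebookutils import mssparkutils

child_notebooks = ["Notebook_A", "Notebook_B", "Notebook_C"]  # placeholder names

def run_child(name: str) -> str:
    # notebook.run(path, timeout_in_seconds, arguments) returns whatever the
    # child notebook passes to mssparkutils.notebook.exit(...)
    return mssparkutils.notebook.run(name, 1800, {"run_date": "2025-01-01"})

# Keep the pool small: all children share the same executors, so launching
# too many at once just recreates the resource contention described above.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_child, child_notebooks))

print(results)
```

Compared with separate pipeline notebook activities, this trades per-notebook isolation for a single shared session, so resource limits and failures affect all children together.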
Best Regards
Yilong Zhou
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.
Can you direct me to the page that mentions the clustering policy?
Hi @priyankabis ,
You can check out the document below, which describes how to attach a notebook to a cluster:
Notebook compute resources | Databricks on AWS
Best Regards
Yilong Zhou
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.
Hi
I don't think this documentation is right for Fabric. This documentation is for Databricks.
Hi @priyankabis ,
I'm sorry the official documentation above didn't help you, but I think you can enable High Concurrency Mode. This mode allows multiple notebooks to share a Spark session.
1. Navigate to the Data Engineering/Science section.
2. Select the Spark Compute menu.
3. If the High Concurrency Mode option is disabled, enable it.
You can also look at this link: Introducing High Concurrency Mode in Notebooks for Data Engineering and Data Science workloads in Mi...
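As a hedged example of what this looks like in practice (assuming your Fabric runtime exposes mssparkutils.notebook.runMultiple, as recent runtimes do), you can launch several notebooks in parallel inside the current high-concurrency session with a single call; the notebook names below are placeholders.

```python
# Sketch only: run a set of notebooks in parallel within the *current* Spark
# session, which is what high concurrency mode is meant to enable.
# Assumption: runMultiple is available in your runtime version, and
# "Notebook_A"/"Notebook_B" are placeholder notebook names in this workspace.
from notebookutils import mssparkutils

# Simplest form: run both notebooks concurrently with default settings and
# collect their exit values.
results = mssparkutils.notebook.runMultiple(["Notebook_A", "Notebook_B"])
print(results)

# runMultiple also accepts a DAG-style dictionary if you need per-notebook
# arguments, timeouts, or dependencies; see the official documentation for
# the exact structure supported by your runtime.
```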
Best Regards
Yilong Zhou
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.