
DennesTorres
Impactful Individual

Parallel Execution

Hi,

This is several questions in one...

I know we have two methods of parallel execution: one is mssparkutils.runMultiple and the other is the ForEach activity in a pipeline.

Am I correct in assuming that mssparkutils.runMultiple uses high concurrency mode, which needs to be enabled in the workspace, while pipelines are not yet capable of doing so?

Both of them can hit errors related to the capacity. I'm testing with a pipeline, and 5 parallel executions already generate errors in my environment. How can we calculate how many parallel executions are possible in each of these two scenarios?

Thank you in advance!


12 REPLIES
MKassem
Frequent Visitor

Hey @DennesTorres

Try adding the timeout argument to the activity:

 

{
    "name": "Notebook",
    "path": "Notebook",
    "retry": 3,
    "args": {"useRootDefaultLakehouse": True},
    "timeoutPerCellInSeconds": 1200
}
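For context, an activity entry like this normally sits inside the "activities" list of the DAG handed to runMultiple. A minimal sketch, assuming mssparkutils is available in the notebook session; the concurrency and timeout values below are placeholders, not recommendations:

DAG = {
    "activities": [
        {
            "name": "Notebook",
            "path": "Notebook",
            "retry": 3,
            "args": {"useRootDefaultLakehouse": True},
            "timeoutPerCellInSeconds": 1200
        }
    ],
    "timeoutInSeconds": 3600,  # placeholder: overall timeout for the whole DAG
    "concurrency": 5           # placeholder: cap on notebooks running at the same time
}

mssparkutils.notebook.runMultiple(DAG)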

 

govindarajan_d
Solution Supplier

Hi @DennesTorres,

 

High concurrency mode in Spark allows you to share the Spark compute and execute notebooks instantly instead of waiting for new compute to spin up for each notebook. The runMultiple call uses the compute engine's multi-threading to run the different notebooks. In a way, both of them help achieve high concurrency using the same compute power.

 

The high concurrency mode that you enable for Spark is more for an interactive approach at the moment. So if you are working in many notebooks in parallel and want to share the same compute so that you can start instantly as well as save costs, high concurrency mode is the way to go. To do the same in code, you use runMultiple, which takes the same approach.

 

It is true that pipelines don't do concurrent execution using the same session, but it is on the roadmap: https://learn.microsoft.com/en-us/fabric/release-plan/data-engineering#concurrency

 

So for now, as you mentioned, we can use a pipeline to run a notebook that uses runMultiple to run the different notebooks and achieve concurrency.
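As a hedged illustration of that pattern, the notebook called by the pipeline could act as a small "driver" along these lines (the notebook names, dependency, and concurrency value are placeholders, not taken from this thread):

dag = {
    "activities": [
        {"name": "load_a", "path": "LoadNotebookA"},
        {"name": "load_b", "path": "LoadNotebookB"},
        {
            "name": "merge",
            "path": "MergeNotebook",
            "dependencies": ["load_a", "load_b"]  # runs only after both loads finish
        }
    ],
    "concurrency": 2  # how many notebooks may run at the same time in the shared session
}

# runMultiple returns a per-activity status once the whole DAG has finished
results = mssparkutils.notebook.runMultiple(dag)
print(results)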

 

Hi,

Thank you. This matches my findings as well, but I was a bit confused because I found articles talking about adjusting the session DOP for runMultiple, which didn't make sense. For runMultiple, the bigger the pool, the better.

On the other hand, in my tests with runMultiple I had two problems. First, it only reaches the maximum resources made available close to the "end" of the script, and I don't understand why. I don't know what to look for in the logs to find the reason and fix it.

Second, it never ends. It continues to run continuously for hours until I cancel the session. Am I missing some end statement? I'm using mssparkutils.notebook.exit.

The image below illustrates the problems:

[screenshot attachment]



Thank you in advance!

Dennes

Hi @DennesTorres,

 

By going to the Spark history server, we can look at the different types of logs.

[screenshot attachment]

 

Then go into each stage and see how each partition gets executed.

 

[screenshot attachment]

 

Debugging a Spark application for performance is going to be a lot harder, but this might be a good place to start.

 

Do you use notebook.exit in all the notebooks?
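(For reference, a minimal sketch of how a child notebook typically ends and hands a value back to the caller; the table name below is just a placeholder:)

# last cell of a child notebook: return a value and end this notebook's run
row_count = spark.read.table("my_table").count()  # "my_table" is a placeholder
mssparkutils.notebook.exit(str(row_count))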

 

Hi,

I managed to get a new error message!

After I decided to join the code blocks again and run one more execution, finally, for the first time, runMultiple listed the execution status of the tasks, as many articles said it should.

A few tasks executed successfully, but most of the tasks got the following error: "Fetch notebook content for 'notebook name here' failed with exception: Timeout waiting for connection from pool."

I'm not sure what the cause is, or how to solve this.

Kind Regards,

Dennes

Hi,

Yes, all the notebooks use notebook.exit.

I tried something different: I created a new code block (there was only one before). I left the DAG creation in the first code block, and in the second code block I included a single statement: the runMultiple call.

The result was absolutely terrible. First, it allocated only 1 executor (8 instances). When everything was in a single code block, it went up to 40 instances.

Second, the execution "hangs" around job number 450. When everything was in a single code block, it hung around 950.

I have checked the Spark UI and logs, but I can't figure out the meaning/reason behind most of it.

Kind Regards,

Dennes

Hi @DennesTorres ,

 

I will need more information from the Spark UI to debug. Usually we look at the Spark DAG to understand what is happening under each job and try to figure out the best way to resolve it.

Hi,

 

When you say Spark DAG, is it the one I generated as a parameter, or something I should extract from the Spark UI?

Kind Regards,

 

Dennes

Hi @DennesTorres,

 

It is the one we get from the Spark UI. This is the one I am talking about. It provides a way to understand how exactly Spark executes the query and where it is taking too long.

[screenshot attachment]

 

Hi @DennesTorres 
We haven't heard from you since the last response and were just checking back to see if you have a resolution yet. If not, please respond back with more details and we will try to help.
Thanks

Hi,

No, I don't have a solution yet. I opened a support case (2402080050003234), but any help is welcome.

 

The problem with the DAG on the Spark UI is that I'm feeding runMultiple with more than 1000 activities. The execution gets stuck sometimes around 950, sometimes more. I can see this in the history and get into the Spark UI for each one of them, and each one of them has a DAG.

As a result, I get lost as to which execution/DAG I should be looking at.

Kind Regards,

Dennes

 

Hi,

I got an explanation from support about the never-ending session.

According to them, the runMultiple DAG should never be used with more than 50 activities. Strange things can happen if we try, such as the session never ending.
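A hedged sketch of one way to respect that limit: split a large activity list into DAGs of at most 50 activities and run them one after another (the worker notebook name and the generated activity list are illustrative only):

MAX_ACTIVITIES = 50

# illustrative only: 1000+ activity entries pointing at a placeholder notebook
all_activities = [
    {"name": f"task_{i}", "path": "WorkerNotebook"}
    for i in range(1000)
]

# submit the work as a series of DAGs, each with at most 50 activities
for start in range(0, len(all_activities), MAX_ACTIVITIES):
    batch = all_activities[start:start + MAX_ACTIVITIES]
    mssparkutils.notebook.runMultiple({"activities": batch})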

Kind Regards,

Dennes
