FatimaArshad
New Member

Fabric pipeline - Notebook crashes after 3 hours

Hi,

 

I have created a modelling notebook. It contains four models and usually takes 12 hours to run. When I put it in a Fabric Pipeline, it generates this error after 3 hours:

[Screenshot attachment: FatimaArshad_0-1740143073173.png - the error message]

 

I am unsure of how to solve this.

 

Edit: Added configurations 

[Screenshot attachment: FatimaArshad_0-1740157915054.png - notebook configuration]

 

1 ACCEPTED SOLUTION
nilendraFabric
Community Champion

 

Thanks for sharing the details.

Non-distributed Python training (SARIMAX/GAM) creates single-threaded workloads that don't leverage Spark's parallelism.

Each notebook call in a pipeline creates a new Spark session (even for simple I/O), causing Livy session throttling.

The nested loops create 1,600 variables × 4 models = 6,400 sequential tasks, overwhelming small compute nodes.

Some possible solutions:

Enable High Concurrency Mode
Reduces Spark session overhead by 70% through session sharing.

Use Spark Job Definitions
Queueable batch jobs avoid interactive capacity limits.

Restructure the training workflow using Spark's native distributed processing instead of Python loops (see the sketch below). Combine this with batch job definitions and proper resource allocation to stay within Fabric's capacity limits while maintaining throughput.
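A minimal sketch of what that restructuring could look like, assuming the training data sits in a long-format Delta table with `series_id`, `ds` and `y` columns; the table name, column names and the SARIMAX order are placeholders, not taken from this thread:

```python
# Hypothetical sketch: distribute per-series SARIMAX fits with a grouped-map
# pandas UDF instead of a sequential Python loop. Table and column names are
# assumptions, not the poster's actual schema.
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType
from statsmodels.tsa.statespace.sarimax import SARIMAX

spark = SparkSession.builder.getOrCreate()

# Long format: one row per (series_id, ds, y) observation.
df = spark.read.table("lakehouse.training_data")

result_schema = StructType([
    StructField("series_id", StringType()),
    StructField("aic", DoubleType()),
])

def fit_one_series(pdf: pd.DataFrame) -> pd.DataFrame:
    # Each group (one dependent variable) is fitted in its own executor task,
    # so the 1,600 fits run in parallel across the cluster instead of in a loop.
    pdf = pdf.sort_values("ds")
    model = SARIMAX(pdf["y"], order=(1, 1, 1)).fit(disp=False)
    return pd.DataFrame({"series_id": [pdf["series_id"].iloc[0]],
                         "aic": [model.aic]})

results = df.groupBy("series_id").applyInPandas(fit_one_series, schema=result_schema)
results.write.mode("overwrite").saveAsTable("lakehouse.model_metrics")
```

In practice you would return fitted parameters or forecasts rather than just the AIC, but the shape of the job is the same: each of the 1,600 series becomes an independent Spark task instead of one iteration of a driver-side loop.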

 

 

If this is helpful, please accept the answer.


9 REPLIES
v-achippa
Community Support

Hi @FatimaArshad,

 

Thank you for reaching out to Microsoft Fabric Community.

 

Thank you @nilendraFabric for addressing the issue.

 

As we haven’t heard back from you, we wanted to kindly follow up to check whether the solution provided by the super user resolved your issue, or let us know if you need any further assistance.
If our super user response resolved your issue, please mark it as "Accept as solution" and click "Yes" if you found it helpful.

 

Thanks and regards,

Anjan Kumar Chippa

Hi @FatimaArshad,

 

We wanted to kindly follow up to check if the solution provided by the super user resolved your issue.
If our super user response resolved your issue, please mark it as "Accept as solution" and click "Yes" if you found it helpful.

 

Thanks and regards,

Anjan Kumar Chippa

Hi @FatimaArshad,

 

As we haven’t heard back from you, we wanted to kindly follow up to check if the solution provided by the super user resolved your issue.
If our super user response resolved your issue, please mark it as "Accept as solution" and click "Yes" if you found it helpful.

 

Thanks and regards,

Anjan Kumar Chippa

FatimaArshad
New Member

@nilendraFabric Do you think this is a compute issue?

My initial thinking is resource and compute constraints. Which SKU are you on, and what are the sizes of these models? Please share further details.

Linear Regression, SARIMAX, GAM, XGBoost. Training is done on 1,600 dependent variables. One loop goes through a set of features, a second goes through the models, and a third applies the models to all of these variables.
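For readers following the thread, a self-contained toy reconstruction of that loop structure (all data and names here are made up, and it is scaled down from 1,600 targets to 20 and from four model families to two) shows why everything runs as one long sequential chain on the driver:

```python
# Hypothetical, scaled-down reconstruction of the nested loops described above.
# Every fit runs one after another on the driver, so nothing uses Spark's parallelism.
import numpy as np
from sklearn.linear_model import LinearRegression
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                                   # toy feature matrix
targets = {f"y_{i}": rng.normal(size=500) for i in range(20)}    # stand-in for 1,600 targets
feature_sets = {"all": list(range(10)), "first5": list(range(5))}
models = {"linear": LinearRegression, "xgboost": XGBRegressor}

fitted = {}
for fs_name, cols in feature_sets.items():          # loop 1: feature sets
    for model_name, Model in models.items():        # loop 2: model types
        for target_name, y in targets.items():      # loop 3: dependent variables
            fitted[(fs_name, model_name, target_name)] = Model().fit(X[:, cols], y)
```

With the real 1,600 targets and all four model families, the innermost call runs roughly 6,400 times back to back, which is the sequential workload the accepted answer describes.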

nilendraFabric
Community Champion

Hello @FatimaArshad 

 

which F Sku are you on ?

I am using Spark 3.4, compute small, 1-4 nodes. But I only use Spark to read and write to Delta tables; I use Python mainly.
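A minimal sketch of that pattern (table and column names are placeholders): Spark is touched only for the Delta read and write, and all of the fitting happens in pandas/statsmodels on the driver, which is exactly the single-threaded workload described in the accepted answer:

```python
# Hypothetical sketch of "Spark only for Delta I/O, Python for the modelling".
# Table names are placeholders; all training happens on the driver node.
import pandas as pd
from pyspark.sql import SparkSession
from statsmodels.tsa.statespace.sarimax import SARIMAX

spark = SparkSession.builder.getOrCreate()

# Spark is used once to read the Delta table into pandas on the driver...
pdf = spark.read.table("lakehouse.training_data").toPandas()

# ...then the modelling is pure Python on the driver: one core does every fit in turn.
rows = []
for series_id, grp in pdf.groupby("series_id"):
    fit = SARIMAX(grp.sort_values("ds")["y"], order=(1, 1, 1)).fit(disp=False)
    rows.append({"series_id": series_id, "aic": fit.aic})

# ...and Spark is used once more to write the results back as a Delta table.
spark.createDataFrame(pd.DataFrame(rows)).write.mode("overwrite").saveAsTable("lakehouse.model_metrics")
```

Compare this with the applyInPandas sketch under the accepted solution, where the same per-series fits are pushed out to the executors instead of queuing up on the driver.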
