cfccai
Advocate II

"Failed barrier ResultStage" error when training an XGBoost model

Hello,

I came across an issue when using a PySpark notebook to train an XGBoost model.

Code snippet:

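# Imports (the original snippet omitted them; xgboost.spark is assumed from the label_col argument)
from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType
from pyspark.ml.feature import VectorAssembler
from xgboost.spark import SparkXGBRegressor

# Load sample data (df is an existing DataFrame defined earlier in the notebook);
# derive integer Year and Month columns from the date so they can go into the feature vector
data = spark.createDataFrame(df)
data = data.withColumn("Year", F.year("DateID").cast(IntegerType()))
data = data.withColumn("Month", F.month("DateID").cast(IntegerType()))

# Assemble features into a single vector column
assembler = VectorAssembler(inputCols=["Year", "Month"], outputCol="features")
data = assembler.transform(data)

# Split into 70% training and 30% test data
train, test = data.randomSplit([0.7, 0.3], seed=123)
# Initialize XGBoost regressor
xgb_regressor = SparkXGBRegressor(label_col="SalesOrderAmount", num_round=10)
# Train the model
model_xgbregressor = xgb_regressor.fit(train)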
 
When execution reaches the fit() call, it reports an error:
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. : org.apache.spark.scheduler.BarrierJobRunWithDynamicAllocationException: [SPARK-24942]: Barrier execution mode does not support dynamic resource allocation for now. You can disable dynamic resource allocation by setting Spark conf "spark.dynamicAllocation.enabled" to "false".
 
So I went to the workspace settings and turned off dynamic allocation. In addition, I chose X-Large as the node size.
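Before rerunning, it may be worth confirming that the running session actually picked up the change, since a session started before the setting was saved could still carry the old value. A minimal sketch, assuming the notebook exposes the usual `spark` session variable; the helper name `barrier_mode_blockers` is hypothetical, and it only checks the one setting named in the error message:

```python
def barrier_mode_blockers(conf):
    """Return the settings in `conf` that conflict with barrier execution mode.

    `conf` is a plain dict mapping Spark configuration keys to string values.
    Only the setting named in the SPARK-24942 error message is checked.
    """
    blockers = []
    # Apache Spark treats a missing key as "false" for this setting.
    value = conf.get("spark.dynamicAllocation.enabled", "false")
    if value.strip().lower() == "true":
        blockers.append("spark.dynamicAllocation.enabled")
    return blockers


# In a notebook with a live session (assumed variable name `spark`):
#   conf = dict(spark.sparkContext.getConf().getAll())
#   print(barrier_mode_blockers(conf))
print(barrier_mode_blockers({"spark.dynamicAllocation.enabled": "true"}))
```

An empty list means dynamic allocation is off for the session, so the first error should not come back.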

 

After rerunning the code, I got a different error message:
job aborted due to stage failure: could not recover from a failed barrier resultstage. most recent failure reason: stage failed because barrier task resulttask(13, 0) finished unsuccessfully. 
 
I'm lost here. Can someone help? Thanks a lot.
2 REPLIES
MKinsight
Frequent Visitor

Has anyone figured this out?

I've got exactly these two errors, in the same order. It seems like XGBoost should run on Fabric, yet it also seems like Fabric doesn't support it. The Microsoft Fabric docs don't list XGBoost models in the training guides, at least for now. There's SparkML, which currently offers only simpler models, though it looks like SparkML will be the go-to library. We still need a working solution, please.

v-shex-msft
Community Support

Hi @cfccai,

It seems that you turned off the 'dynamic allocation' option, but the existing pool resources are not able to handle the current model. Have you tried reducing the amount of sample data, or manually modifying the environment compute settings to give these operations more resources?
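One resource check that follows from this: a barrier stage needs all of its tasks running at the same time, so with dynamic allocation off the pool's fixed size caps how many training tasks can be launched together. The sketch below estimates that cap. The function name and the example numbers are hypothetical, not Fabric node-size figures, and this is only one thing to rule out, not a confirmed cause of the error above.

```python
def max_concurrent_tasks(num_executors, cores_per_executor, task_cpus=1):
    """Upper bound on tasks that can run simultaneously on a fixed-size pool.

    task_cpus mirrors the Spark setting spark.task.cpus (cores reserved per task).
    """
    if min(num_executors, cores_per_executor, task_cpus) < 1:
        raise ValueError("all arguments must be >= 1")
    # Each executor can host as many tasks as whole multiples of task_cpus fit in its cores.
    return num_executors * (cores_per_executor // task_cpus)


# Hypothetical pool: 2 executors with 8 cores each.
print(max_concurrent_tasks(2, 8))               # 16
print(max_concurrent_tasks(2, 8, task_cpus=4))  # 4
```

If the estimator in use is the one from `xgboost.spark`, its `num_workers` argument sets how many training tasks are launched, so keeping that value at or below this bound is a reasonable first thing to try.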

Compute management in Fabric environments - Microsoft Fabric | Microsoft Learn

Spark pool node size:

Apache Spark compute for Data Engineering and Data Science - Microsoft Fabric | Microsoft Learn

Regards,

Xiaoxin Sheng

