ca_solution
New Member

Microsoft Fabric - Default Environment - ML Model - Connection Refused Error

I keep getting a "Connection refused" error and don't understand the reason.
Any ideas about what is wrong here?

My Code:
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from synapse.ml.lightgbm import LightGBMClassifier
from synapse.ml.train import ComputeModelStatistics
import mlflow
from pyspark.ml import Pipeline

# final_train and final_test are Spark DataFrames with "features" and "label"
# columns, prepared earlier in the notebook (not shown here).

# Start MLflow experiment
mlflow.set_experiment("HyperparameterTuning")

with mlflow.start_run() as run:
    # Set experiment tags to indicate hyperparameter tuning
    mlflow.set_tag("run_type", "hyperparameter_tuning")
    mlflow.set_tag("model_type", "LightGBM")
    mlflow.set_tag("experiment_purpose", "prediction_test")

    # Define the model
    lgbm = LightGBMClassifier(
        labelCol="label",
        featuresCol="features",
        featuresShapCol="shapValues",
        dataTransferMode="bulk",
        verbosity=1,
        boostingType="gbdt",
        maxBin=255,
        objective="binary"
    )

    # Define parameter grid (3 * 3 * 3 * 2 = 54 combinations)
    paramGrid = ParamGridBuilder() \
        .addGrid(lgbm.numIterations, [50, 100, 200]) \
        .addGrid(lgbm.learningRate, [0.01, 0.05, 0.1]) \
        .addGrid(lgbm.numLeaves, [31, 64, 128]) \
        .addGrid(lgbm.isUnbalance, [True, False]) \
        .build()

    # Log the parameter grid as JSON
    mlflow.log_dict({"param_grid": [
        {"numIterations": [50, 100, 200]},
        {"learningRate": [0.01, 0.05, 0.1]},
        {"numLeaves": [31, 64, 128]},
        {"isUnbalance": [True, False]}
    ]}, "param_grid.json")

    # Evaluator used by CrossValidator. The original snippet did not show
    # its definition; an area-under-ROC evaluator is assumed here.
    evaluator = BinaryClassificationEvaluator(
        labelCol="label",
        rawPredictionCol="rawPrediction",
        metricName="areaUnderROC"
    )

    # Setup CrossValidator
    cv = CrossValidator(
        estimator=lgbm,
        estimatorParamMaps=paramGrid,
        evaluator=evaluator,
        numFolds=3,
        parallelism=2
    )

    # Fit the model (this is the call that raises the error below)
    cvModel = cv.fit(final_train)

    # Log the best model
    mlflow.spark.log_model(cvModel.bestModel, "lightgbm_best_model")

    # Log best hyperparameters (values stringified so MLflow accepts them)
    best_params = cvModel.bestModel.extractParamMap()
    best_params_dict = {param.name: str(value) for param, value in best_params.items()}
    mlflow.log_params(best_params_dict)

    # Make predictions
    predictions = cvModel.bestModel.transform(final_test)

    # Calculate detailed statistics
    metrics_df = ComputeModelStatistics(
        evaluationMetric="classification",
        labelCol="label",
        scoredLabelsCol="prediction",
        scoresCol="probability"
    ).transform(predictions)

    metrics = metrics_df.first().asDict()

    # Log evaluation metrics
    mlflow.log_metrics({
        "Accuracy": metrics["accuracy"],
        "AUC": metrics["AUC"],
        "Precision": metrics["precision"],
        "Recall": metrics["recall"]
    })

    # Log confusion matrix
    confusion_matrix = metrics["confusion_matrix"].toArray().tolist()
    mlflow.log_dict({"confusion_matrix": confusion_matrix}, "confusion_matrix.json")

    print("Hyperparameter Tuning Completed. Best Params and Metrics Logged to MLflow.")




Py4JJavaError: An error occurred while calling o37078.fit.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 36.0 failed 4 times, most recent failure: Lost task 10.3 in stage 36.0 (TID 3493):
java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:412)
1 REPLY
v-sdhruv
Community Support

Hi @ca_solution ,

From the traceback, the failure happens during the .fit() call: a Spark task tried to open a socket to another node or backend service, and the connection was refused.

1. Make sure your Spark cluster or Synapse environment is up and fully initialized. If you're using Azure Synapse or a Spark pool, double-check that it has started and is accepting jobs.
2. LightGBM in distributed mode can sometimes fail if ports between nodes aren't open.
3. You're using parallelism=2, which makes CrossValidator fit two models at the same time. It's possible that the two concurrent training jobs are trying to bind to the same port or resource, causing a conflict. Setting parallelism=1 is a quick way to rule this out.
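
For point 2, a rough way to check whether a port is reachable is a plain TCP probe. This is only a sketch: the helper name `can_connect`, the loopback address, and port 12400 (which I believe is SynapseML's default for `defaultListenPort`) are assumptions for illustration, not values taken from your environment.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, False otherwise."""
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the context manager closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Hypothetical target: replace with a worker address and the port
    # your LightGBM job is configured to listen on.
    print(can_connect("127.0.0.1", 12400))
```

A `False` result only tells you that nothing accepted the connection at that moment; it does not say whether a firewall rule or a missing listener is the cause.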

Try setting dataTransferMode="streaming" instead of "bulk"; sometimes that resolves strange socket issues.

Also try running the fit on a very small subset of your training data to see if the issue still occurs.
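
To put numbers on that: the grid in your code has 3 x 3 x 3 x 2 = 54 combinations, so with numFolds=3 CrossValidator runs 162 LightGBM fits before the final refit. The sketch below counts the fits and shrinks a DataFrame for a quick test. The helper names `count_fits` and `small_subset` and their default values are hypothetical; `small_subset` only relies on the `sample` and `limit` methods that Spark DataFrames provide.

```python
from itertools import product


def count_fits(grid: dict, num_folds: int) -> int:
    """Number of model fits cross-validation performs (excluding the final refit)."""
    combinations = len(list(product(*grid.values())))
    return combinations * num_folds


def small_subset(df, fraction: float = 0.01, max_rows: int = 10_000, seed: int = 42):
    """Return a small random sample of a Spark DataFrame, capped at max_rows."""
    return df.sample(withReplacement=False, fraction=fraction, seed=seed).limit(max_rows)


# Same values as the parameter grid in the question.
grid = {
    "numIterations": [50, 100, 200],
    "learningRate": [0.01, 0.05, 0.1],
    "numLeaves": [31, 64, 128],
    "isUnbalance": [True, False],
}
print(count_fits(grid, num_folds=3))  # 162
```

In the notebook you would then call something like `cv.fit(small_subset(final_train))`, ideally with a reduced grid as well. If the small run succeeds, the problem is more likely related to scale or concurrency than to the environment itself.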

Hope this helps!
