Hi @ca_solution
You're encountering a Connection refused error while running your PySpark code, which uses SynapseML's LightGBM classifier for hyperparameter tuning in a Spark environment. This error is not a coding mistake but a network-level issue within the Spark cluster.

SynapseML's distributed LightGBM relies on inter-node communication over TCP ports to coordinate training across workers. If those ports are blocked, unavailable, or misconfigured, the Spark tasks fail with a java.net.ConnectException, indicating they cannot reach each other. This can happen if the Spark environment (e.g., Azure Synapse, Databricks, HDInsight) has firewall restrictions, lacks sufficient resources (such as CPU or memory), or is misconfigured for distributed LightGBM usage. It can also occur if too many parallel tasks are launched at once, exceeding what the cluster can handle, especially during cross-validation.

To resolve this, ensure that the environment allows node-to-node communication, possibly by opening the necessary ports (typically starting at 12400), reduce the level of parallelism, and review the Spark cluster logs for more detail. Ultimately, this is an infrastructure or environment issue that affects how the LightGBM workers connect across the distributed Spark cluster, and addressing it requires adjusting the networking or resource configuration of your cluster setup.
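As a quick diagnostic, the failure mode can be reproduced outside Spark: a java.net.ConnectException: Connection refused on the JVM side corresponds to a plain TCP connection being refused. The sketch below (plain Python, no Spark required) probes whether a given host and port are reachable. Port 12400 appears here only because it is the typical LightGBM default listen port; substitute the host and port range your cluster actually uses:

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError and timeouts both land here: the same
        # condition the LightGBM workers hit across nodes.
        return False

# Probe the typical LightGBM default listen port on this machine.
# With nothing listening there, this reports the port as unreachable.
print(port_reachable("127.0.0.1", 12400))
```

Running a probe like this from one worker node against another, across the port range your cluster uses, quickly tells you whether a firewall or network security rule is blocking inter-node traffic.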
Hi @ca_solution ,
Just wanted to check whether you were able to review the suggestions provided.
If any of the responses has addressed your query, please accept it as a solution so other members can easily find it.
Thank you.
Thank you @Poojara_D12 for your detailed explanation of the query. It's true that LightGBM in distributed mode can fail if the ports between nodes aren't configured correctly, so this is a setup issue.
Hi @ca_solution ,
Just wanted to check if you had the opportunity to review the solutions provided.
If the response has addressed your query, please accept it as a solution so other members can easily find it.
Thank you.
Hi @ca_solution ,
From the traceback, the error occurs during the .fit() call, meaning Spark is trying to reach a backend service or another cluster node, and the connection is being refused.
1. Make sure your Spark cluster or Synapse environment is up and fully initialized. If you're using Azure Synapse or a Spark pool, double-check that it's started and accepting jobs.
2. LightGBM in distributed mode can sometimes fail if the ports between nodes aren't open.
3. You're using parallelism=2, which suggests multiple processes. It's possible that two processes are trying to bind to the same port or resource, causing a conflict.
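Point 3 can be reproduced in isolation: two sockets attempting to bind the same address fail with an "Address already in use" OSError, which is the same class of conflict two parallel LightGBM tasks can hit if they race for one listen port. A minimal sketch in plain Python, with no Spark involved:

```python
import socket

# First process/task binds a port successfully.
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))   # port 0 = let the OS pick a free port
first.listen(1)
port = first.getsockname()[1]

# A second process/task trying the same port fails, just as a second
# LightGBM worker racing for the same listen port would.
second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(("127.0.0.1", port))
except OSError as exc:
    print(f"bind conflict on port {port}: {exc}")
finally:
    second.close()
    first.close()
```

This is why lowering parallelism (or giving each task a distinct port range) often makes the error disappear.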
Try setting dataTransferMode="tcp" instead of "bulk"; sometimes that resolves strange socket issues.
Run a very small subset of your training data to see if the issue still occurs.
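On reducing parallelism: it helps to count how many model fits a cross-validated grid search actually launches, since each concurrent fit is its own distributed LightGBM job opening its own worker-to-worker connections. A back-of-the-envelope sketch (the fold count and grid size below are made-up illustration values, not taken from your code):

```python
# Hypothetical tuning setup for illustration; substitute your own values.
num_folds = 3        # CrossValidator numFolds
grid_size = 4        # e.g. 2 values of numLeaves x 2 values of learningRate
parallelism = 2      # CrossValidator(parallelism=...)

# Every (fold, parameter combination) pair is one model fit.
total_fits = num_folds * grid_size
# At most `parallelism` of those fits run at the same time.
concurrent_fits = min(parallelism, total_fits)

print(f"total fits: {total_fits}, concurrent fits: {concurrent_fits}")
```

With parallelism=2, two distributed training jobs are competing for ports and executor resources at any moment; dropping to parallelism=1 is a cheap way to rule out that contention as the cause.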
Hope this helps!