MJWilliams
Regular Visitor

LightGBMRanker

Hello,

I've been wrangling with LightGBMRanker for a few days now and can't resolve a "Feature (Column_) appears more than one time" error from LightGBM. Has anyone come across this?
To narrow it down, I've been testing with a very simple script (below), but I still can't establish where the issue lies. If anyone has any insights, I'd be grateful. Thanks

pip install synapseml

SynapseML Version: 1.0.10
LightGBM Version: 4.3.0


from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.ml.feature import VectorAssembler
from synapse.ml.lightgbm import LightGBMRanker
import mlflow

# Create Spark session
spark = SparkSession.builder.appName("LightGBM Fix").getOrCreate()

# Sample Data
data = [
    (1, 101, 1.0, 1.0, 0.5),
    (1, 102, 0.0, 0.3, 0.2),
    (2, 103, 1.0, 0.8, 0.7),
    (2, 104, 0.0, 0.1, 0.4),
]
df = spark.createDataFrame(data, ["group", "id", "label", "feature1", "feature2"])

# Step 1: Assemble features into a Vector column
vector_assembler = VectorAssembler(inputCols=["feature1", "feature2"], outputCol="features")
df_transformed = vector_assembler.transform(df)

# Step 2: Ensure final schema
df_final = df_transformed.select(
    col("group").cast("int"),
    col("id").cast("int"),
    col("label").cast("double"),
    col("features")  # 🚀 Keep features as VectorUDT (Do NOT convert to ArrayType)
)

# Verify Schema Before Training
df_final.printSchema()
df_final.show(5, truncate=False)

# Step 3: Train LightGBM Ranker
ranker = LightGBMRanker(
    labelCol="label",
    featuresCol="features",  # 🚀 Use Vector column (not array)
    groupCol="group",
    objective="lambdarank",
    metric="ndcg"
)

# Fit Model
model = ranker.fit(df_final)

print(" LightGBMRanker successfully trained!")


------------------------------------------------------

Py4JJavaError: An error occurred while calling o11667.fit.
: java.lang.Exception: Dataset create from samples call failed in LightGBM with error: Feature (Column_) appears more than one time.
    at com.microsoft.azure.synapse.ml.lightgbm.LightGBMUtils$.validate(LightGBMUtils.scala:18)
    at com.microsoft.azure.synapse.ml.lightgbm.dataset.ReferenceDatasetUtils$.createReferenceDatasetFromSample(ReferenceDatasetUtils.scala:47)
    at com.microsoft.azure.synapse.ml.lightgbm.LightGBMBase.calculateRowStatistics(LightGBMBase.scala:545)
    at com.microsoft.azure.synapse.ml.lightgbm.LightGBMBase.trainOneDataBatch(LightGBMBase.scala:425)
    at com.microsoft.azure.synapse.ml.lightgbm.LightGBMBase.$anonfun$train$2(LightGBMBase.scala:62)
    at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logVerb(SynapseMLLogging.scala:162)
    at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logVerb$(SynapseMLLogging.scala:159)
    at com.microsoft.azure.synapse.ml.lightgbm.LightGBMRanker.logVerb(LightGBMRanker.scala:26)
    at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logFit(SynapseMLLogging.scala:152)
    at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logFit$(SynapseMLLogging.scala:151)
    at com.microsoft.azure.synapse.ml.lightgbm.LightGBMRanker.logFit(LightGBMRanker.scala:26)
    at com.microsoft.azure.synapse.ml.lightgbm.LightGBMBase.train(LightGBMBase.scala:64)
    at com.microsoft.azure.synapse.ml.lightgbm.LightGBMBase.train$(LightGBMBase.scala:36)
    at com.microsoft.azure.synapse.ml.lightgbm.LightGBMRanker.train(LightGBMRanker.scala:26)
    at com.microsoft.azure.synapse.ml.lightgbm.LightGBMRanker.train(LightGBMRanker.scala:26)
    at org.apache.spark.ml.Predictor.fit(Predictor.scala:114)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:829)
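
Since the error text complains about a repeated feature name, one sanity check is to look at the feature names that VectorAssembler attaches to the "features" column and see whether any of them repeat. The snippet below is only a diagnostic sketch, not a confirmed cause or fix. The helper names (feature_names_from_metadata, duplicated_names) are hypothetical, and the example metadata is hand-written in the shape Spark ML is expected to store under the "ml_attr" key, so the helpers run without a Spark session.

```python
from collections import Counter


def feature_names_from_metadata(metadata):
    """Return the feature names stored in a vector column's metadata, ordered by index."""
    attrs = metadata.get("ml_attr", {}).get("attrs", {})
    # Attributes are grouped by type (e.g. "numeric", "nominal"); flatten the groups.
    flat = [attr for group in attrs.values() for attr in group]
    flat.sort(key=lambda attr: attr["idx"])
    # An attribute without a name is reported as an empty string.
    return [attr.get("name", "") for attr in flat]


def duplicated_names(names):
    """Return the names that occur more than once, sorted alphabetically."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# Hand-written stand-in (an assumption) for the metadata of the "features"
# column produced by VectorAssembler in the script above.
example_metadata = {
    "ml_attr": {
        "attrs": {
            "numeric": [
                {"idx": 0, "name": "feature1"},
                {"idx": 1, "name": "feature2"},
            ]
        },
        "num_attrs": 2,
    }
}

names = feature_names_from_metadata(example_metadata)
print("feature names:", names)
print("duplicates:", duplicated_names(names))
```

On the real DataFrame, the input would be df_final.schema["features"].metadata. If that check reports no duplicates, the repeated "Column_" name is presumably being generated somewhere after the Spark side, which at least narrows down where to look.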

1 ACCEPTED SOLUTION
v-prasare
Community Support

Hi @MJWilliams, thanks for reaching out to MS Fabric community support.

If this needs immediate attention, I suggest raising a support ticket while you wait for other community members to answer, so that the support team can assist you with the issue you are facing. Please follow the link below for instructions on raising a support ticket:

How to create a Fabric and Power BI Support ticket - Power BI | Microsoft Learn

Thanks,

Prashanth Are

 


2 REPLIES
v-prasare
Community Support

Hi @MJWilliams,

 

We are following up once again regarding your query. Could you please confirm if the issue has been resolved through the support ticket with Microsoft?

If the issue has been resolved, we kindly request you to share the resolution or key insights here to help others in the community. If we don’t hear back, we’ll go ahead and close this thread.

Should you need further assistance in the future, we encourage you to reach out via the Microsoft Fabric Community Forum and create a new thread. We’ll be happy to help.

 

Thank you for your understanding and participation.

