Ulrich
New Member

ML model in Fabric: Get prediction probabilities

Hi! I have saved my ML model as a model in Fabric and I import it into my notebook using the code below. How can I get the probability for each prediction instead of 0 or 1?

 

import mlflow
from synapse.ml.predict import MLFlowTransformer

#df = spark.read.format("delta").load()
df = spark.sql("SELECT * from lakehouse.prep_data_score")

model = MLFlowTransformer(
    inputCols=["all input columns"], # Your input columns here
    outputCol="predictions", # Your new column name here
    modelName="churn_model", # Your model name here
    modelVersion=5 # Your model version here
)
df = model.transform(df)

df_selection = df.select("MED_KEY", "predictions")
df_selection.write.format("delta").mode("overwrite").saveAsTable("lakehouse.member_scores")

#also save as csv
df_selection_pd = df_selection.toPandas()
df_selection_pd.to_csv("", index=False)
1 ACCEPTED SOLUTION
nilendraFabric
Continued Contributor

Hello @Ulrich 

 

The key idea is to create a custom PyFunc model that calls `predict_proba` or an equivalent function on your base model.

 

You can make MLFlowTransformer return probabilities by packaging a model whose predict function itself outputs probabilities, rather than just class labels. In other words, if the underlying model supports something like `predict_proba`, you need to ensure that the MLflow model’s prediction method calls that instead of `predict` when it runs.
One way to do this is to define a custom PyFunc model that wraps your existing classifier and overrides its predict method to invoke `predict_proba`. For a scikit-learn model, for example, you could do something like:

import mlflow
import mlflow.pyfunc
import mlflow.sklearn


class ProbaWrapper(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Load the underlying model from the 'base_model' artifact.
        # mlflow.sklearn.load_model expects a model saved with the scikit-learn flavor.
        self.model = mlflow.sklearn.load_model(context.artifacts["base_model"])

    def predict(self, context, model_input):
        # Return probability outputs instead of classes
        return self.model.predict_proba(model_input)


# Train or load your existing model (e.g. a scikit-learn classifier).
# Then save it in MLflow with a 'base_model' artifact, wrapping it in ProbaWrapper:

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="proba_model",
        python_model=ProbaWrapper(),
        artifacts={"base_model": "<path_or_registered_model_reference>"},
    )

 

Register that model in Fabric, then use the MLFlowTransformer just as before (pointing `modelName` and `modelVersion` to this custom PyFunc model). The result of `model.transform(df)` will now be per-class probabilities instead of 0/1 predictions.
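To make "per-class probabilities" concrete, here is a minimal sketch of what `predict_proba` returns for a scikit-learn binary classifier, run outside Fabric on toy data. The helper `positive_class_probability`, the toy arrays, and the assumption that the positive class is labeled 1 are illustrative only and are not part of the original model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def positive_class_probability(model, X, positive_label=1):
    """Return only the probability of `positive_label` for each row (hypothetical helper)."""
    proba = model.predict_proba(X)  # shape: (n_samples, n_classes)
    # Column order of predict_proba follows model.classes_
    col = list(model.classes_).index(positive_label)
    return proba[:, col]


# Toy data standing in for the real churn features (assumption)
X = np.array([[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]])
y = np.array([0, 0, 0, 1, 1, 1])
toy_model = LogisticRegression().fit(X, y)

proba = toy_model.predict_proba(X)            # one column per class, rows sum to 1
p1 = positive_class_probability(toy_model, X)  # 1-D array, one score per row
```

If you only need a single score per row, the wrapper's `predict` could return just that one column instead of the full array; whether that is more convenient depends on how you want the output column to look.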

 

 

Hope this helps


4 REPLIES 4
v-ssriganesh
Community Support

Hello @Ulrich,
Thank you for posting your query in the Microsoft Fabric community forum.

 

Upon investigating your concern, we found that the code you are using appears correct. However, to obtain the probabilities for each prediction, please make the following modification:
Instead of using:
df_selection = df.select("MED_KEY", "predictions")
we recommend replacing it with one of the following:

  • df_selection = df.select("MED_KEY", "predictions.probability") Or else
  • df_selection = df.select("MED_KEY", "predictions.probabilities")

If this helps, please accept it as a solution and drop a "Kudos" so other members can find it more easily.


Thank you.

Thank you for your answer. But I don't think this will work, since `predictions` is a column holding a plain value, so it can't be accessed like that.


Thank you! This will work. 

Another way, if I don't want to use a wrapper, is this solution:

 

import pandas as pd
import mlflow.sklearn

df = spark.sql("SELECT * from lakehouse.prep_data_score")
#df = df.limit(100)

df_pd = df.toPandas()
med_key = df_pd.pop("MED_KEY")

# Specify the model's path in the MLflow registry
model_uri = "models:/churn_model/5"  # Model name and version

# Load the model
model = mlflow.sklearn.load_model(model_uri)
features = model.feature_names_in_

predictions = model.predict_proba(df_pd[features])
