Zoe_Guest
Frequent Visitor

MLflow - Not logging results of all epochs

I am trying to use MLflow to log the results of my model training for hyperparameter tuning; however, it only logs the last loss and val_loss values in the experiment.

Running in a Python 3.11 notebook with tensorflow 2.18.0, mlflow 3.8.1 and pandas 2.2.2.

with mlflow.start_run() as run:
    mlflow.tensorflow.autolog()
    history, model = run_model(X_train, X_val)
    for epoch, (tr, vl) in enumerate(
        zip(history.history["loss"], history.history["val_loss"])
    ):
        mlflow.log_metric("loss", tr, step=epoch)
        mlflow.log_metric("val_loss", vl, step=epoch)

This is the only output shown in the experiment:

[Screenshot: the experiment's metrics grid, showing a single loss and val_loss value]

Accepted solution

Hello @Zoe_Guest, the metrics grid in your screenshot shows only the latest value of each metric. You can use the following code to retrieve the per-epoch values and plot them on a graph:

import mlflow
from mlflow.tracking import MlflowClient
import pandas as pd
import matplotlib.pyplot as plt

# 1) Get the most recent run in the named experiment (or paste a run_id explicitly)
client = MlflowClient()
exp = mlflow.get_experiment_by_name("fabric-simple-epoch-logging")  # <-- use your experiment name
assert exp is not None, "Experiment not found. Check the name used in mlflow.set_experiment()."
runs = client.search_runs([exp.experiment_id], order_by=["attributes.start_time DESC"], max_results=1)
assert runs, "No runs found in this experiment."
run_id = runs[0].info.run_id
print("Using run:", run_id)

# 2) Fetch the full metric history (all steps/epochs)
loss_hist = client.get_metric_history(run_id, "loss_manual")
val_hist = client.get_metric_history(run_id, "val_loss_manual")

# 3) Build a tidy dataframe, joining the two metrics on their step number
#    (this does not rely on the two history lists being in the same order)
loss_df = pd.DataFrame({"epoch": [m.step for m in loss_hist], "loss_manual": [m.value for m in loss_hist]})
val_df = pd.DataFrame({"epoch": [m.step for m in val_hist], "val_loss_manual": [m.value for m in val_hist]})
df = loss_df.merge(val_df, on="epoch", how="outer").sort_values("epoch")

display(df)  # notebook helper; use print(df) in a plain Python script

# 4) Plot in-notebook
plt.figure(figsize=(7, 4))
plt.plot(df["epoch"], df["loss_manual"], marker="o", label="loss_manual")
plt.plot(df["epoch"], df["val_loss_manual"], marker="o", label="val_loss_manual")
plt.xlabel("Epoch")
plt.ylabel("MSE")
plt.title("Per-epoch metrics from MLflow")
plt.legend()
plt.grid(True)
plt.show()

# 5) Optional: log the table as an artifact on the same run so it's visible on that run's page.
#    Resuming the run matters: with no active run, mlflow.log_artifact() starts a new run.
csv_path = "per_epoch_metrics.csv"
df.to_csv(csv_path, index=False)
with mlflow.start_run(run_id=run_id):
    mlflow.log_artifact(csv_path, artifact_path="metrics")
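Step 3 pairs the two metric histories into one table. The following is a minimal stdlib sketch of pairing them by step number rather than by list position. It assumes each history entry exposes .step and .value, as MLflow's metric objects do. MetricEntry and align_by_step are hypothetical names used only for illustration:

```python
from collections import namedtuple

# Hypothetical stand-in for the entries returned by MlflowClient.get_metric_history;
# only the .step and .value attributes are used.
MetricEntry = namedtuple("MetricEntry", ["step", "value"])


def align_by_step(loss_hist, val_hist):
    """Join two metric histories on their step number (hypothetical helper).

    Returns (step, loss, val_loss) rows sorted by step. A step missing from one
    history gets None in that column. If a step appears more than once, the
    entry that comes last in the list is kept.
    """
    loss_by_step = {m.step: m.value for m in loss_hist}
    val_by_step = {m.step: m.value for m in val_hist}
    steps = sorted(loss_by_step.keys() | val_by_step.keys())
    return [(s, loss_by_step.get(s), val_by_step.get(s)) for s in steps]


# Histories deliberately given out of order: the join does not depend on list order.
loss_hist = [MetricEntry(2, 0.30), MetricEntry(1, 0.50), MetricEntry(3, 0.25)]
val_hist = [MetricEntry(1, 0.55), MetricEntry(3, 0.28), MetricEntry(2, 0.35)]

for row in align_by_step(loss_hist, val_hist):
    print(row)
```

Keying each metric by step keeps the join independent of the order in which the history comes back.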

 

I trust this will be helpful. If you found this guidance useful, you are welcome to acknowledge with a Kudos or by marking it as a Solution.

v-hjannapu
Community Support

Hi @Zoe_Guest,

I would also like to thank @deborshi_nag for actively participating in the community forum and for the solutions you’ve been sharing. Your contributions make a real difference.

I wanted to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions.

Regards,
Community Support Team.

Hi @Zoe_Guest,
I hope the above details help you fix the issue. If you still have any questions or need more help, feel free to reach out. We are always here to support you.


Regards,
Community Support Team.

deborshi_nag
Resident Rockstar

Hello @Zoe_Guest,

Use different metric names for your manual logging (e.g., loss_manual, val_loss_manual) and keep steps strictly increasing.

with mlflow.start_run():
    # Either comment this out to avoid overlap, or keep it and rely on the distinct manual metric names below:
    # mlflow.tensorflow.autolog()

    history, model = run_model(X_train, X_val)

    for epoch, (tr, vl) in enumerate(
        zip(history.history["loss"], history.history["val_loss"]), start=1
    ):
        mlflow.log_metric("loss_manual", tr, step=epoch)
        mlflow.log_metric("val_loss_manual", vl, step=epoch) 
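The following is a minimal pure-Python sketch of the two rules above (distinct metric names and strictly increasing steps). It assumes history.history is a dict of per-epoch lists, as Keras returns. manual_metric_records and steps_strictly_increasing are hypothetical helpers, not MLflow APIs:

```python
# Hypothetical helpers (not part of MLflow) illustrating the naming and step rules.
def manual_metric_records(history, keys=("loss", "val_loss"), suffix="_manual", start=1):
    """Turn a Keras-style History.history dict into (name, value, step) tuples.

    Each metric gets a distinct name (e.g. "loss" -> "loss_manual") so it does
    not share a key with autolog, and steps are numbered start, start + 1, ...
    """
    records = []
    for key in keys:
        for step, value in enumerate(history[key], start=start):
            records.append((key + suffix, float(value), step))
    return records


def steps_strictly_increasing(records):
    """Return True if, for every metric name, each step is larger than the previous one."""
    last_step = {}
    for name, _value, step in records:
        if name in last_step and step <= last_step[name]:
            return False
        last_step[name] = step
    return True


# Hypothetical three-epoch history with the same shape as history.history.
history = {"loss": [0.50, 0.30, 0.25], "val_loss": [0.55, 0.35, 0.28]}
records = manual_metric_records(history)
print(records[:3])
print(steps_strictly_increasing(records))

# Inside an active run, each tuple could then be logged with:
#     for name, value, step in records:
#         mlflow.log_metric(name, value, step=step)
```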

 

Please note that MLflow records every log_metric call as a (key, step, timestamp, value) entry and keeps the full history. However, the run's metrics grid (the Run details view) shows only the latest value for each key. Mixing autolog (which logs loss/val_loss per epoch) with a manual loop that logs the same keys at the same step numbers puts two sets of values under one name. Using separate names keeps the two series apart.
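A simplified in-memory model can illustrate the difference between the stored history and the single value a metrics grid shows. This is a toy sketch, not MLflow's implementation. ToyMetricStore is hypothetical, and the rule used for "latest" (highest step, with ties broken by the later timestamp) is an assumption about how the summary value is chosen:

```python
from collections import defaultdict


class ToyMetricStore:
    """Simplified model of one run's metrics (hypothetical; not MLflow code).

    Each log_metric call appends a (step, timestamp, value) entry to the history
    for that key. latest() returns one value per key: the entry with the highest
    step and, for equal steps, the later timestamp (an assumed rule).
    """

    def __init__(self):
        self.history = defaultdict(list)  # key -> list of (step, timestamp, value)
        self._clock = 0  # counter standing in for wall-clock timestamps

    def log_metric(self, key, value, step):
        self._clock += 1
        self.history[key].append((step, self._clock, value))

    def latest(self, key):
        _step, _ts, value = max(self.history[key], key=lambda e: (e[0], e[1]))
        return value


store = ToyMetricStore()
for epoch, loss in enumerate([0.50, 0.30, 0.25], start=1):
    store.log_metric("loss_manual", loss, step=epoch)

print(len(store.history["loss_manual"]))  # 3: one entry per epoch is kept
print(store.latest("loss_manual"))  # 0.25: the single value a "latest" view shows
```

Reading the full history, as MlflowClient.get_metric_history does for a real run, recovers every epoch. A summary built like latest() shows one number per metric.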

 
I trust this will be helpful. If you found this guidance useful, you are welcome to acknowledge with a Kudos or by marking it as a Solution.

Hi, thank you for the response; however, this did not work. I still only have one value saved in the experiment.

with mlflow.start_run() as run:
    # mlflow.tensorflow.autolog()
    history, model = run_model(X_train, X_val)
    for epoch, (tr, vl) in enumerate(
        zip(history.history["loss"], history.history["val_loss"])
    ):
        print(epoch, tr, vl)
        mlflow.log_metric("loss_manual", tr, step=epoch)
        mlflow.log_metric("val_loss_manual", vl, step=epoch)

This is the output of the print, which shows a different step value for each epoch:

[Screenshot: printed epoch, loss and val_loss values for each epoch]

[Reply containing the accepted solution, quoted in full above.]

Thank you for your reply; that works, apart from the last part.

mlflow.log_artifact(csv_path, artifact_path="metrics")

is giving the following error:

TypeError: tridentml_artifacts_builder() got an unexpected keyword argument 'tracking_uri'

 

 
