I am trying to use MLflow to log the results of my model training for hyperparameter tuning. However, it is only logging the last val_loss and loss values in the experiment.
I am running this in a Python 3.11 notebook with tensorflow 2.18.0, mlflow 3.8.1, and pandas 2.2.2.
with mlflow.start_run() as run:
    mlflow.tensorflow.autolog()
    history, model = run_model(X_train, X_val)
    for epoch, (tr, vl) in enumerate(
        zip(history.history["loss"], history.history["val_loss"])
    ):
        mlflow.log_metric("loss", tr, step=epoch)
        mlflow.log_metric("val_loss", vl, step=epoch)
Then this is the only output in the experiment:
Hi @Zoe_Guest,
I would also like to thank @deborshi_nag for actively participating in the community forum and for the solutions you have been sharing. Your contributions make a real difference.
I wanted to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions.
Regards,
Community Support Team.
Hi @Zoe_Guest,
I hope the above details help you fix the issue. If you still have any questions or need more help, feel free to reach out. We are always here to support you.
Regards,
Community Support Team.
Hello @Zoe_Guest
with mlflow.start_run():
    # Either comment this out to avoid overlap:
    # mlflow.tensorflow.autolog()
    history, model = run_model(X_train, X_val)
    for epoch, (tr, vl) in enumerate(
        zip(history.history["loss"], history.history["val_loss"]), start=1
    ):
        mlflow.log_metric("loss_manual", tr, step=epoch)
        mlflow.log_metric("val_loss_manual", vl, step=epoch)
Note that MLflow stores metrics as (key, step) -> value. If you log the same key at the same step more than once, the last value overwrites the earlier ones. That is why mixing autolog (which logs loss/val_loss per epoch) with your manual loop (which logs the same keys at the same step numbers) leaves only one record per step, and the Run details view shows just the latest overall value.
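To make the description above concrete, here is a toy in-memory model of a store that keeps one value per (key, step) pair. It is only a sketch of the behaviour as described in this reply, not MLflow's actual storage code, and real tracking backends may behave differently. The class `ToyMetricStore`, its methods, and the loss numbers are all hypothetical.

```python
from typing import Dict, List, Tuple


class ToyMetricStore:
    """Hypothetical in-memory stand-in for a metric store (not MLflow code)."""

    def __init__(self) -> None:
        # One value per (key, step) pair, as described in the reply above.
        self._points: Dict[Tuple[str, int], float] = {}

    def log_metric(self, key: str, value: float, step: int) -> None:
        # Logging the same key at the same step replaces the earlier value.
        self._points[(key, step)] = value

    def history(self, key: str) -> List[Tuple[int, float]]:
        # Every stored (step, value) pair for this key, ordered by step.
        return sorted((s, v) for (k, s), v in self._points.items() if k == key)

    def latest(self, key: str) -> float:
        # What a summary grid would show: only the value at the highest step.
        return self.history(key)[-1][1]


store = ToyMetricStore()
fake_losses = [0.9, 0.5, 0.3]  # made-up numbers standing in for history.history["loss"]
for epoch, loss in enumerate(fake_losses, start=1):
    store.log_metric("loss_manual", loss, step=epoch)

print(store.latest("loss_manual"))   # 0.3
print(store.history("loss_manual"))  # [(1, 0.9), (2, 0.5), (3, 0.3)]
```

In this model a summary view built on `latest()` shows a single number even though three steps are stored, which is why seeing one value in a grid does not by itself mean the per-epoch points are missing.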
Hi, thank you for the response. However, this did not work: I still have only one value saved in the experiment.
with mlflow.start_run() as run:
    # mlflow.tensorflow.autolog()
    history, model = run_model(X_train, X_val)
    for epoch, (tr, vl) in enumerate(
        zip(history.history["loss"], history.history["val_loss"])
    ):
        print(epoch, tr, vl)
        mlflow.log_metric("loss_manual", tr, step=epoch)
        mlflow.log_metric("val_loss_manual", vl, step=epoch)
This is the output of the print, which shows different step values:
Hello @Zoe_Guest, the metrics grid in your screenshot shows only the latest value of each metric. You can use the following code to retrieve the epoch-level details and view them on a graph.
import mlflow
from mlflow.tracking import MlflowClient
import pandas as pd
import matplotlib.pyplot as plt

# 1) Get the most recent run in your active experiment (or paste a run_id explicitly)
client = MlflowClient()
exp = mlflow.get_experiment_by_name("fabric-simple-epoch-logging")  # <-- use your experiment name
assert exp is not None, "Experiment not found. Check the name used in mlflow.set_experiment()."
runs = client.search_runs(exp.experiment_id, order_by=["attributes.start_time DESC"], max_results=1)
assert runs, "No runs found in this experiment."
run_id = runs[0].info.run_id
print("Using run:", run_id)

# 2) Fetch the full metric history (all steps/epochs)
loss_hist = client.get_metric_history(run_id, "loss_manual")
val_hist = client.get_metric_history(run_id, "val_loss_manual")

# 3) Build a tidy dataframe
df = pd.DataFrame({
    "epoch": [m.step for m in loss_hist],
    "loss_manual": [m.value for m in loss_hist],
    "val_loss_manual": [m.value for m in val_hist],
}).sort_values("epoch")
display(df)

# 4) Plot in-notebook
plt.figure(figsize=(7, 4))
plt.plot(df["epoch"], df["loss_manual"], marker="o", label="loss_manual")
plt.plot(df["epoch"], df["val_loss_manual"], marker="o", label="val_loss_manual")
plt.xlabel("Epoch")
plt.ylabel("MSE")
plt.title("Per-epoch metrics from MLflow")
plt.legend()
plt.grid(True)
plt.show()

# 5) Optional: log the table as an artifact so it’s visible on the run page
csv_path = "per_epoch_metrics.csv"
df.to_csv(csv_path, index=False)
mlflow.log_artifact(csv_path, artifact_path="metrics")
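Step 3 above builds the dataframe from two lists and so relies on both histories having the same length and the same steps; pandas raises a ValueError when the column lists differ in length. As a possible safeguard, the sketch below aligns the points by step first. The helper `align_by_step` and the `FakeMetric` tuple are hypothetical names introduced for illustration; `FakeMetric` only mimics the `.step` and `.value` attributes used in the reply, and the numbers are made up.

```python
from collections import namedtuple
from typing import Dict, Iterable, List, Optional

# Hypothetical stand-in for the objects returned by get_metric_history();
# only the .step and .value attributes used in the reply are modelled.
FakeMetric = namedtuple("FakeMetric", ["step", "value"])


def align_by_step(**histories: Iterable) -> List[Dict[str, Optional[float]]]:
    """Return one row per step; a metric missing at a step is None."""
    by_step: Dict[int, Dict[str, float]] = {}
    for name, points in histories.items():
        for m in points:
            by_step.setdefault(m.step, {})[name] = m.value
    names = list(histories)
    return [
        {"epoch": step, **{n: by_step[step].get(n) for n in names}}
        for step in sorted(by_step)
    ]


loss_hist = [FakeMetric(1, 0.9), FakeMetric(2, 0.5), FakeMetric(3, 0.3)]
val_hist = [FakeMetric(1, 1.0), FakeMetric(2, 0.7)]  # one epoch short on purpose
rows = align_by_step(loss_manual=loss_hist, val_loss_manual=val_hist)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` in place of the dict of lists in step 3; a missing point then appears as an empty cell instead of raising an error.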
Thank you for your reply, that works apart from the last part.
mlflow.log_artifact(csv_path, artifact_path="metrics")
is giving the following error:
TypeError: tridentml_artifacts_builder() got an unexpected keyword argument 'tracking_uri'