Dear All,
Is there a link or tutorial for doing machine learning and deploying models to production using Fabric?
1. My datasets are stored in a Lakehouse.
2. We trained the model and saved it in Fabric.
But when running the script to score the new data coming from the Lakehouse, it fails with:
RuntimeError: Unable to get model info: Registered Model with name=component_classification_v3 not found
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import ArrayType, IntegerType
import pandas as pd
from transformers import BertTokenizer
import tensorflow as tf
import mlflow
from synapse.ml.predict import MLFlowTransformer

# Initialize Spark session
spark = SparkSession.builder.getOrCreate()

# Load data from your Spark SQL environment or DataFrame
df = spark.sql("SELECT removal_reasons, reliability_tracked FROM lakehouse1.part_removal")

# Initialize the tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenization function: returns (input_ids, attention_mask) as plain lists
def tokenize_text(text):
    tokens = tokenizer(text, padding="max_length", truncation=True, max_length=128, return_tensors="np")
    return tokens['input_ids'][0].tolist(), tokens['attention_mask'][0].tolist()

# Register UDFs for tokenization
@udf(ArrayType(IntegerType()))
def udf_tokenize_input_ids(text):
    return tokenize_text(text)[0]

@udf(ArrayType(IntegerType()))
def udf_tokenize_attention_mask(text):
    return tokenize_text(text)[1]

# Apply the UDFs to add tokenized columns
df = df.withColumn("input_ids", udf_tokenize_input_ids(col("removal_reasons")))
df = df.withColumn("attention_mask", udf_tokenize_attention_mask(col("removal_reasons")))

# Ensure the DataFrame has the correct format
df = df.select("input_ids", "attention_mask", "reliability_tracked")

# Configure MLflow
mlflow.set_tracking_uri("UNKNOWN")   # Your MLflow tracking URI here
mlflow.set_experiment("Notebook-1")  # Your experiment name here

# Load the registered model
model = MLFlowTransformer(
    inputCols=["input_ids", "attention_mask"],  # Your input columns here
    outputCol="predictions",                    # Your new column name here
    modelName="component_classification_v3",    # Your model name here
    modelVersion=1                              # Your model version here
)

# Transform the data using the model
predicted_df = model.transform(df)

# Write the predictions to Delta Lake
predicted_df.write.format('delta').mode("overwrite").save("predicted_amos_part_removal")  # Your output table filepath here
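For reference, a quick way to see what is actually registered in the workspace's MLflow registry (a minimal sketch using the standard MlflowClient API; only the model name from the error above is specific to my setup):

from mlflow.tracking import MlflowClient

client = MlflowClient()

# List every registered model visible from this notebook's workspace.
for rm in client.search_registered_models():
    print(rm.name)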
Regards,
King
Hi @kirah2128 ,
Thanks for using Fabric Community.
Did you get a chance to look into this doc - Machine learning model - Microsoft Fabric | Microsoft Learn?
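One thing worth checking: MLFlowTransformer resolves the model by name in the MLflow model registry of the workspace the notebook is attached to, so the model must have been registered there under that exact name. A minimal sketch of registering a TensorFlow/Keras model at training time (the tiny model below is only a stand-in; the experiment and model names are taken from your script):

import mlflow
import tensorflow as tf

# Stand-in model; replace with your trained classifier.
model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
model.build(input_shape=(None, 128))

mlflow.set_experiment("Notebook-1")
with mlflow.start_run():
    # registered_model_name creates (or adds a version to) the registry
    # entry that MLFlowTransformer looks up by name.
    mlflow.tensorflow.log_model(
        model,
        artifact_path="model",
        registered_model_name="component_classification_v3",
    )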
Hope this is helpful. Do let me know in case of further queries.
Hi, the link is not helpful.
What I want to achieve is to deploy the model now. There is a wizard option, but that won't work when the input is converted to other data types; in my case it is a TensorFlow data type. I attached the picture below for your reference.
Hi @kirah2128 ,
Can you please check your input data and also the model version configuration from your end?
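For the version check, something like this should list which versions actually exist under the name the transformer is configured with (a sketch, assuming the standard MlflowClient API):

from mlflow.tracking import MlflowClient

client = MlflowClient()

# Show every version registered under the model name from the error.
for mv in client.search_model_versions("name = 'component_classification_v3'"):
    print(mv.name, mv.version)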
Do let me know in case of further queries.
This is the model.
Here is how I supply the model with the new inputs from the Lakehouse source:
# Initialize Spark session
spark = SparkSession.builder.getOrCreate()

# Load data from your Spark SQL environment or DataFrame
df = spark.sql("SELECT removal_reasons, reliability_tracked FROM lakehouse1.part_removal")

# Initialize the tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenization function
def tokenize_text(text):
    tokens = tokenizer(text, padding="max_length", truncation=True, max_length=128, return_tensors="np")
    return tokens['input_ids'][0].tolist(), tokens['attention_mask'][0].tolist()

# Register UDFs for tokenization
@udf(ArrayType(IntegerType()))
def udf_tokenize_input_ids(text):
    return tokenize_text(text)[0]

@udf(ArrayType(IntegerType()))
def udf_tokenize_attention_mask(text):
    return tokenize_text(text)[1]

# Apply the UDFs to add tokenized columns
df = df.withColumn("input_ids", udf_tokenize_input_ids(col("removal_reasons")))
df = df.withColumn("attention_mask", udf_tokenize_attention_mask(col("removal_reasons")))

# Ensure the DataFrame has the correct format
df = df.select("input_ids", "attention_mask", "reliability_tracked")
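From here the scoring call is the same as in my first post; repeating it for reference, with a schema check in front to confirm the tokenized columns are integer arrays:

# Confirm input_ids / attention_mask are array<int> columns.
df.printSchema()

model = MLFlowTransformer(
    inputCols=["input_ids", "attention_mask"],
    outputCol="predictions",
    modelName="component_classification_v3",
    modelVersion=1
)
predicted_df = model.transform(df)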
Hi @kirah2128 ,
I found this link on YouTube - click here.
It looks like a similar issue, and he made some changes to get the code working.
(Before and after code screenshots from the video were attached here.)
Can you please check the video and let me know if it is helpful.
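If his issue matches yours, the before/after change is usually about making the Spark column types line up with the data types in the model's MLflow signature. A sketch of the kind of cast that is often needed (the target element type here is an assumption; check what your logged model actually expects):

from pyspark.sql.functions import col
from pyspark.sql.types import ArrayType, LongType

# Example: if the logged signature expects int64 arrays while the UDFs
# produce int32 arrays, cast the columns before calling transform.
df = df.withColumn("input_ids", col("input_ids").cast(ArrayType(LongType())))
df = df.withColumn("attention_mask", col("attention_mask").cast(ArrayType(LongType())))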
Hello @kirah2128 ,
We haven't heard back from you since the last response and wanted to check whether you have found a resolution yet.
If you have a resolution, please do share it with the community, as it can be helpful to others.
Otherwise, respond back with more details and we will try to help.