Dear All,
Is there a link or tutorial for doing machine learning and deploying models to production using Fabric?
1. My datasets are stored in a Lakehouse.
2. We trained the model and saved it in Fabric.
But when running the script to score the new data coming from the Lakehouse, it fails with:
RuntimeError: Unable to get model info: Registered Model with name=component_classification_v3 not found
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import ArrayType, IntegerType
import pandas as pd
from transformers import BertTokenizer
import tensorflow as tf
import mlflow
from synapse.ml.predict import MLFlowTransformer

# Initialize Spark session
spark = SparkSession.builder.getOrCreate()

# Load data from your Spark SQL environment or DataFrame
df = spark.sql("SELECT removal_reasons, reliability_tracked FROM lakehouse1.part_removal")

# Initialize the tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenization function: returns (input_ids, attention_mask) as plain lists
def tokenize_text(text):
    tokens = tokenizer(text, padding="max_length", truncation=True, max_length=128, return_tensors="np")
    return tokens['input_ids'][0].tolist(), tokens['attention_mask'][0].tolist()

# Register UDFs for tokenization
@udf(ArrayType(IntegerType()))
def udf_tokenize_input_ids(text):
    return tokenize_text(text)[0]

@udf(ArrayType(IntegerType()))
def udf_tokenize_attention_mask(text):
    return tokenize_text(text)[1]

# Apply the UDFs to add tokenized columns
df = df.withColumn("input_ids", udf_tokenize_input_ids(col("removal_reasons")))
df = df.withColumn("attention_mask", udf_tokenize_attention_mask(col("removal_reasons")))

# Ensure the DataFrame has the correct format
df = df.select("input_ids", "attention_mask", "reliability_tracked")

# Configure MLflow
mlflow.set_tracking_uri("UNKNOWN")   # Your MLflow tracking URI here
mlflow.set_experiment("Notebook-1")  # Your experiment name here

# Load the registered model
model = MLFlowTransformer(
    inputCols=["input_ids", "attention_mask"],  # Your input columns here
    outputCol="predictions",                    # Your new column name here
    modelName="component_classification_v3",    # Your model name here
    modelVersion=1                              # Your model version here
)

# Transform the data using the model
predicted_df = model.transform(df)

# Write the predictions to Delta Lake
predicted_df.write.format('delta').mode("overwrite").save("predicted_amos_part_removal")  # Your output table filepath here
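For reference, a quick way to see what is actually registered in the workspace's MLflow registry (a minimal sketch using the standard MlflowClient API; only the model name from the error above is specific to my setup):

from mlflow.tracking import MlflowClient

client = MlflowClient()

# List every registered model visible from this notebook's workspace.
for rm in client.search_registered_models():
    print(rm.name)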
Regards,
King
Hi @kirah2128 ,
Thanks for using Fabric Community.
Did you get a chance to look into this doc - Machine learning model - Microsoft Fabric | Microsoft Learn?
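One thing worth checking: MLFlowTransformer resolves the model by name in the MLflow model registry of the workspace the notebook is attached to, so the model must have been registered there under that exact name. A minimal sketch of registering a TensorFlow/Keras model at training time (the tiny model below is only a stand-in; the experiment and model names are taken from your script):

import mlflow
import tensorflow as tf

# Stand-in model; replace with your trained classifier.
model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
model.build(input_shape=(None, 128))

mlflow.set_experiment("Notebook-1")
with mlflow.start_run():
    # registered_model_name creates (or adds a version to) the registry
    # entry that MLFlowTransformer looks up by name.
    mlflow.tensorflow.log_model(
        model,
        artifact_path="model",
        registered_model_name="component_classification_v3",
    )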
Hope this is helpful. Do let me know in case of further queries.
Hi, the link is not helpful.
What I want to achieve is to deploy the model now. There is a wizard option, but that won't work when the input is converted to other data types; in my case it is a TensorFlow data type. I attached the picture below for your reference.
Hi @kirah2128 ,
Can you please check your input data and also the model version configuration from your end?
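For the version check, something like this should list which versions actually exist under the name the transformer is configured with (a sketch, assuming the standard MlflowClient API):

from mlflow.tracking import MlflowClient

client = MlflowClient()

# Show every version registered under the model name from the error.
for mv in client.search_model_versions("name = 'component_classification_v3'"):
    print(mv.name, mv.version)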
Do let me know in case of further queries.
This is the model.
Here is how I supply the model with the new inputs from the Lakehouse source:
# Initialize Spark session
spark = SparkSession.builder.getOrCreate()

# Load data from your Spark SQL environment or DataFrame
df = spark.sql("SELECT removal_reasons, reliability_tracked FROM lakehouse1.part_removal")

# Initialize the tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenization function
def tokenize_text(text):
    tokens = tokenizer(text, padding="max_length", truncation=True, max_length=128, return_tensors="np")
    return tokens['input_ids'][0].tolist(), tokens['attention_mask'][0].tolist()

# Register UDFs for tokenization
@udf(ArrayType(IntegerType()))
def udf_tokenize_input_ids(text):
    return tokenize_text(text)[0]

@udf(ArrayType(IntegerType()))
def udf_tokenize_attention_mask(text):
    return tokenize_text(text)[1]

# Apply the UDFs to add tokenized columns
df = df.withColumn("input_ids", udf_tokenize_input_ids(col("removal_reasons")))
df = df.withColumn("attention_mask", udf_tokenize_attention_mask(col("removal_reasons")))

# Ensure the DataFrame has the correct format
df = df.select("input_ids", "attention_mask", "reliability_tracked")
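From here the scoring call is the same as in my first post; repeating it for reference, with a schema check in front to confirm the tokenized columns are integer arrays:

# Confirm input_ids / attention_mask are array<int> columns.
df.printSchema()

model = MLFlowTransformer(
    inputCols=["input_ids", "attention_mask"],
    outputCol="predictions",
    modelName="component_classification_v3",
    modelVersion=1
)
predicted_df = model.transform(df)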
Hi @kirah2128 ,
I found this link on YouTube - click here.
It looks like a similar issue, and he made some changes to get the code working.
(Before and after code screenshots from the video were attached here.)
Can you please check the video and let me know if it is helpful.
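If his issue matches yours, the before/after change is usually about making the Spark column types line up with the data types in the model's MLflow signature. A sketch of the kind of cast that is often needed (the target element type here is an assumption; check what your logged model actually expects):

from pyspark.sql.functions import col
from pyspark.sql.types import ArrayType, LongType

# Example: if the logged signature expects int64 arrays while the UDFs
# produce int32 arrays, cast the columns before calling transform.
df = df.withColumn("input_ids", col("input_ids").cast(ArrayType(LongType())))
df = df.withColumn("attention_mask", col("attention_mask").cast(ArrayType(LongType())))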
Hello @kirah2128 ,
We haven't heard back from you since the last response and wanted to check whether you have found a resolution yet.
If you have a resolution, please do share it with the community, as it can be helpful to others.
Otherwise, respond back with more details and we will try to help.