Solved: Using PyCaret for Random Forest Classification (Employee churn)

userpowerbiii12 · 07-16-2023

I'm trying to integrate a machine learning model (Random Forest) that I built in a Jupyter Notebook into Power BI. After testing multiple algorithms and tuning them with grid search, Random Forest turned out to perform best on my dataset.

So I saved my model as a PKL file (rf_ins.PKL) and then imported it into Power BI using the Python visual. I used the following code:

# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:
# dataset = pandas.DataFrame(average_montly_hours, bonus, left, number_project, promotion_last_5years, salary, satisfaction_level, time_spend_company, Work_accident)
# dataset = dataset.drop_duplicates()

# Paste or type your script code here:

# Import pycaret, pipeline and model
from pycaret.regression import *
import matplotlib.pyplot as plt
import pandas as pd

# Setup Pycaret
reg = setup(data=dataset, target='left')
model = load_model(r'C:\Users\hp\Desktop\Sprint PBI\rf_ins')

# Create prediction with dataset
df1 = predict_model(model, data = dataset)
df = df1[['left']]
df.rename(columns = {'left' : 'Could the employee potentially leave the company'}, inplace = True)

# Create table visual with matplotlib to display results
fig = plt.figure()
table = plt.table(cellText = df.values, loc='center', colLabels = df.columns, cellLoc = 'center')
table.auto_set_font_size(False)
table.set_fontsize(24)
table.scale(5,5)
plt.show()

This is what it displays now, even though I got this code from a LinkedIn article.

What I want is to display the outcome based on the selected slicers (my features): a value of 0 means the employee will stay in the company, and 1 means they may leave. I think the prediction itself is working correctly; I just need to display the text.

Note that I also ran the Python script below in Power Query on my dataset, which generated two extra columns, "Prediction_Label" and "Prediction_Score".

from pycaret.classification import *
rf = load_model('C:/Users/HP/Desktop/Sprint PBI/rf_ins')
dataset = predict_model(rf, data = dataset)

Prediction Score is the probability of the outcome happening.

PabloJMorenoAE · 07-20-2023

Hello,

I'm the author of the LinkedIn article that you are referring to in your question, and you are making several mistakes:

1. you don't need the setup, as you are importing the model from a pkl file (assuming that you saved it earlier)

2. make sure that all the columns in your dataframe are named exactly as they were in the dataframe the model was trained on. Also check that the values are within the same range as the training dataset used to generate the model.

3. you are using the wrong column name to mask / rename the values. When you call `predict_model()`, the classification experiment automatically adds 2 columns: 'prediction_label' (with the 0 and 1 values) and 'prediction_score' (with the probability of the predicted class). So your mistake should be solved with:

df = df1[['prediction_label']]

df.rename(columns = {'prediction_label' : 'Could the employee potentially leave the company'}, inplace = True)
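
Points 2 and 3 could be sketched with two small helpers, in plain Python so they can be tried outside Power BI. This is only a rough illustration, not part of the original article: `EXPECTED_FEATURES` assumes the model was trained on the columns listed in the visual's preamble (minus the target 'left'), and `LABEL_TEXT`, `missing_features` and `describe_prediction` are hypothetical names and wording chosen for the example.

```python
# Assumption: the model was trained on the feature columns listed in the
# Power BI preamble of the question (everything except the target 'left').
EXPECTED_FEATURES = [
    "average_montly_hours", "bonus", "number_project",
    "promotion_last_5years", "salary", "satisfaction_level",
    "time_spend_company", "Work_accident",
]

# Hypothetical display wording; 0 = stays, 1 = may leave, as in the question.
LABEL_TEXT = {0: "Likely to stay", 1: "At risk of leaving"}


def missing_features(columns, expected=EXPECTED_FEATURES):
    """Return the expected feature names that are absent from `columns`.

    The comparison is exact and case-sensitive, so 'work_accident'
    does not count as a match for 'Work_accident'.
    """
    present = set(columns)
    return [name for name in expected if name not in present]


def describe_prediction(label, score=None):
    """Turn a 0/1 prediction_label (and optional prediction_score) into text."""
    text = LABEL_TEXT.get(int(label), "Unknown")
    if score is None:
        return text
    # prediction_score is the probability of the predicted class.
    return f"{text} ({float(score):.0%})"
```

Inside the visual, `missing_features(dataset.columns)` should come back empty before `predict_model()` is called, and `df1['prediction_label'].map(describe_prediction)` would give a text column that can be passed to `plt.table` instead of the raw 0/1 values.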

Thank you

Pablo

gregoliveira · 07-16-2023

Interesting topic. I hope someone can help you; I'd like to follow the discussion. Have you tried asking ChatGPT?

userpowerbiii12 · 07-17-2023

Hi gregoliveira, yes! But it didn't give me the exact steps. I'll definitely try to rephrase my question and see if I can find something.
