userpowerbiii12
Frequent Visitor

Using PyCaret for Random Forest Classification (Employee churn)

I'm trying to integrate a machine learning model (Random Forest) that I built in a Jupyter Notebook into Power BI. After testing multiple algorithms with grid search, Random Forest turned out to be the best-performing one for my dataset.
So I saved my model as a PKL file (rf_ins.PKL) and then imported it into Power BI using a Python visual, with the following code:
 
# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:

# dataset = pandas.DataFrame(average_montly_hours, bonus, left, number_project, promotion_last_5years, salary, satisfaction_level, time_spend_company, Work_accident)
# dataset = dataset.drop_duplicates()

# Paste or type your script code here:
# Import pycaret, pipeline and model
from pycaret.regression import *
import matplotlib.pyplot as plt
import pandas as pd

# Setup Pycaret
reg = setup(data=dataset, target='left')

model = load_model(r'C:\Users\hp\Desktop\Sprint PBI\rf_ins')
# Create prediction with dataset
df1 = predict_model(model, data = dataset)
df = df1[['left']]
df.rename(columns = {'left' : 'Could the employee potentially leave the company'}, inplace = True)
# Create table visual with matplotlib to display results
fig = plt.figure()
table = plt.table(cellText = df.values, loc='center', colLabels = df.columns, cellLoc = 'center')
table.auto_set_font_size(False)
table.set_fontsize(24)
table.scale(5,5)
plt.show()
 
This is what it displays now
[Screenshot of the current output of the Python visual]

 

I got this code from a LinkedIn article. What I want is to display, based on the selected slicers (my features), whether the predicted value is 0, meaning the employee will stay in the company, or 1, meaning they might leave. I feel like the model is working correctly; I just need to display the result as text.
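One way to turn the 0/1 output into readable text is to map each label to a sentence with pandas. This is a minimal sketch, assuming the prediction column is named `prediction_label` (PyCaret's default for classification) and that 1 means "leaves"; the output column name `Prediction`, the helper `labels_to_text`, and the message wording are hypothetical.

```python
import pandas as pd

# Hypothetical wording for each class; assumes 1 = "left" in the training data.
LABEL_TEXT = {
    0: "The employee is likely to stay",
    1: "The employee could potentially leave the company",
}

def labels_to_text(df: pd.DataFrame, label_col: str = "prediction_label") -> pd.DataFrame:
    """Return a one-column DataFrame with each 0/1 label replaced by a sentence."""
    text = df[label_col].map(LABEL_TEXT)
    # Labels not in LABEL_TEXT become NaN after map(); flag them instead of hiding them.
    text = text.fillna("Unexpected prediction value")
    return text.to_frame(name="Prediction")

# Toy frame standing in for the output of predict_model().
preds = pd.DataFrame({"prediction_label": [0, 1, 1]})
print(labels_to_text(preds))
```

In the Python visual, the returned frame can be passed to `plt.table(cellText=..., colLabels=...)` in the same way as the existing code passes `df`.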
 
Note that I also ran the Python script below in Power Query on my dataset, which generated two extra columns, "Prediction_Label" and "Prediction_Score":
 
from pycaret.classification import *
rf = load_model('C:/Users/HP/Desktop/Sprint PBI/rf_ins')
dataset = predict_model(rf, data = dataset)
 
Prediction_Score is the probability of the predicted outcome.
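Because the score belongs to the *predicted* class, a score of 0.9 on a row labelled 0 means a 90% chance of staying, not of leaving. For a binary model, the probability of class 1 can be recovered as below. This is a minimal sketch, assuming the columns are named `prediction_label` and `prediction_score` and that label 1 means "leaves"; `probability_of_leaving` is a hypothetical helper. PyCaret's `predict_model()` also accepts `raw_score=True` to return a score per class, which avoids this step; check the documentation for your PyCaret version.

```python
import pandas as pd

def probability_of_leaving(df: pd.DataFrame,
                           label_col: str = "prediction_label",
                           score_col: str = "prediction_score") -> pd.Series:
    """P(class 1) for a binary classifier whose score is the predicted class's probability."""
    # Predicted label 1: the score already is P(leave).
    # Predicted label 0: the score is P(stay), so P(leave) = 1 - score.
    is_one = df[label_col] == 1
    return df[score_col].where(is_one, 1 - df[score_col])

preds = pd.DataFrame({"prediction_label": [1, 0], "prediction_score": [0.8, 0.9]})
print(probability_of_leaving(preds).round(2).tolist())  # [0.8, 0.1]
```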
 
[Screenshot]

 

1 ACCEPTED SOLUTION
PabloJMorenoAE
New Member

Hello,

I'm the author of the LinkedIn article you are referring to in your question, and you are making several mistakes:

1. You don't need `setup()`, because you are loading the model from a .pkl file (assuming that you saved it earlier).

2. Make sure that all the columns in your DataFrame are named exactly as they were in the original DataFrame used to train the model. Also check that the values are within the same range as the training dataset used to generate the model.

3. You are using the wrong column name to select and rename the values. When you apply `predict_model()`, the classification experiment automatically produces two columns: 'prediction_label' (with the 0 and 1 values) and 'prediction_score' (with the probability of the predicted class). So your mistake should be solved with:

# Select the predicted class (0/1) and give it a readable header
df = df1[['prediction_label']].rename(columns = {'prediction_label' : 'Could the employee potentially leave the company'})
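For point 2, a quick way to catch naming mismatches before calling `predict_model()` is to compare the DataFrame's columns with the feature names used in training. This is a minimal sketch: `EXPECTED_FEATURES` is a hypothetical list copied from the columns in the question's Power BI preamble (excluding the target `left`), and `check_features` is a hypothetical helper; replace the list with the exact names from the training notebook, including spellings such as `average_montly_hours`.

```python
import pandas as pd

# Hypothetical: feature names taken from the question's preamble (target 'left' excluded).
# Replace with the exact column names used when the model was trained.
EXPECTED_FEATURES = [
    "average_montly_hours", "bonus", "number_project", "promotion_last_5years",
    "salary", "satisfaction_level", "time_spend_company", "Work_accident",
]

def check_features(df: pd.DataFrame, expected=EXPECTED_FEATURES) -> None:
    """Raise ValueError listing any missing or unexpected feature columns."""
    present = set(df.columns) - {"left"}  # the target column may legitimately be present
    missing = sorted(set(expected) - present)
    extra = sorted(present - set(expected))
    if missing or extra:
        raise ValueError(f"missing columns: {missing}; unexpected columns: {extra}")
```

Calling `check_features(dataset)` at the top of the Python visual script makes a misnamed column fail with an explicit message instead of producing silently wrong predictions.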
 
Thank you
Pablo


3 REPLIES
gregoliveira
Helper II

Interesting topic. I hope someone can help you; I'd like to follow the discussion. Have you tried asking ChatGPT for help?

Hi gregoliveira, yes! But it didn't give me the exact steps. I'll definitely try rephrasing my question and see if I can find something.
