Today companies store huge amounts of data related to their various business processes. This data can help you discover, monitor and improve those processes as they actually run. Extracting process knowledge from such data is called process mining. Process mining can help you gain better visibility, improve KPIs and eliminate bottlenecks.
One of the popular open-source tools for process mining is bupaR, an integrated suite of R packages for handling and analyzing business process data. It was developed by the Business Informatics research group at Hasselt University, Belgium. It currently consists of several packages that help with calculating descriptives, process monitoring and process visualization.
The bupaR package itself is the core of the framework. It includes the basic functionality for creating event log objects in R, contains several functions for getting information about an event log, and provides event log versions of generic R functions. Together with the related packages, each of which has its own specific purpose, bupaR aims to support every step in the analysis of event data in R, from data import to online process monitoring.
The good news is that the Power BI service now supports bupaR visuals. Let’s explore what we can do! This post only aims to show a few of the possibilities with bupaR and Power BI. You can read more about bupaR in the references below. For more information on how to create R visuals in the Power BI service, please see "Creating R visuals in the Power BI service" and "Create Power BI visuals using R".
Let’s consider the scenario of patients arriving at the emergency department of a hospital. The event data in this example comes from the "patients" dataset in the eventdataR package. I saved the sample data as a .csv file and imported it into Power BI Desktop. Next I will show how to use bupaR to create event logs and plot visuals from Power BI. The picture below shows what the data looks like in Power BI Desktop.
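The export step itself is not shown in this post. One way to produce such a .csv file, assuming the eventdataR package is installed locally, is sketched below; the file name patients.csv is an arbitrary choice made for this sketch.

```r
library(eventdataR)  # provides the patients event log

# Convert the event log to a plain data frame before writing it out
patients_df <- as.data.frame(patients)

# "patients.csv" is an arbitrary file name chosen for this sketch
write.csv(patients_df, "patients.csv", row.names = FALSE)
```

The resulting file can then be loaded into Power BI Desktop like any other .csv source.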
To see the process map for the "completed" patients event log, that is, the cases that start with "Registration" and end with "Check-out", you can create an R visual in Power BI Desktop with an R script that builds the event log and draws the map.
Once the report is published to the Power BI service, the visual renders as shown in the following image.
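The script for this first visual is not included in the text. A minimal sketch of what it could look like follows the same pattern as the other scripts in this post. Two things here are assumptions rather than the original script: the use of filter_endpoints from the edeaR package to keep only the cases that start with "Registration" and end with "Check-out", and the plain process_map call with its default settings.

```r
library(bupaR)        # eventlog()
library(edeaR)        # filter_endpoints()
library(processmapR)  # process_map()
library(DiagrammeR)   # export_graph()

# Power BI passes the fields selected for the R visual as `dataset`
patientsData <- dataset
patientsData$time <- as.POSIXct(patientsData$time, tz = "GMT",
                                format = "%Y-%m-%d %H:%M:%OS")

x <- patientsData %>%
  eventlog(
    activity_id = "handling",
    case_id = "patient",
    resource_id = "employee",
    activity_instance_id = "handling_id",
    lifecycle_id = "registration_type",
    timestamp = "time"
  ) %>%
  # Assumption: "completed" means the case starts and ends with these activities
  filter_endpoints(start_activities = "Registration",
                   end_activities = "Check-out") %>%
  process_map(render = FALSE)

# Write the graph to an image file
export_graph(x, "result.png", file_type = "png")
```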
If you want the process map to show frequencies, set its type explicitly with the frequency function. The colors can be changed through the color_scale argument.
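library(bupaR)        # eventlog()
library(processmapR)  # process_map(), frequency()
library(DiagrammeR)   # export_graph()

# Power BI passes the fields selected for the R visual as `dataset`
patientsData <- dataset
patientsData$time <- as.POSIXct(patientsData$time, tz = "GMT", format = "%Y-%m-%d %H:%M:%OS")

# Build the event log and draw a process map with relative frequencies
x <- patientsData %>%
  eventlog(
    activity_id = "handling",
    case_id = "patient",
    resource_id = "employee",
    activity_instance_id = "handling_id",
    lifecycle_id = "registration_type",
    timestamp = "time"
  ) %>%
  process_map(type = frequency("relative", color_scale = "Purples"), render = FALSE)

# Write the graph to an image file
export_graph(x, "result.png", file_type = "png")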
The next example uses the performance profile, which focuses on the processing time of activities. Here the map shows the median time in days.
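library(bupaR)        # eventlog()
library(processmapR)  # process_map(), performance()
library(DiagrammeR)   # export_graph()

# Power BI passes the fields selected for the R visual as `dataset`
patientsData <- dataset
patientsData$time <- as.POSIXct(patientsData$time, tz = "GMT", format = "%Y-%m-%d %H:%M:%OS")

# Build the event log and draw a process map with median durations in days
x <- patientsData %>%
  eventlog(
    activity_id = "handling",
    case_id = "patient",
    resource_id = "employee",
    activity_instance_id = "handling_id",
    lifecycle_id = "registration_type",
    timestamp = "time"
  ) %>%
  process_map(type = performance(median, "days"), render = FALSE)

# Write the graph to an image file
export_graph(x, "result.png", file_type = "png")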
The different activity sequences (traces) in the event log can be visualized with trace_explorer. It can be used to explore frequent as well as infrequent traces. The coverage argument specifies how much of the log you want to explore. The example below shows the most frequent traces, which together cover 98.5% of the event log.
library(bupaR)        # eventlog()
library(processmapR)  # trace_explorer()

# Power BI passes the fields selected for the R visual as `dataset`
patientsData <- dataset
patientsData$time <- as.POSIXct(patientsData$time, tz = "GMT", format = "%Y-%m-%d %H:%M:%OS")

# Build the event log and plot the most frequent traces
patientsData %>%
  eventlog(
    activity_id = "handling",
    case_id = "patient",
    resource_id = "employee",
    activity_instance_id = "handling_id",
    lifecycle_id = "registration_type",
    timestamp = "time"
  ) %>%
  trace_explorer(type = "frequent", coverage = 0.985)
The last example shows in how many cases each activity is present.
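library(bupaR)  # eventlog()
library(edeaR)  # activity_presence()

# Power BI passes the fields selected for the R visual as `dataset`
patientsData <- dataset
patientsData$time <- as.POSIXct(patientsData$time, tz = "GMT", format = "%Y-%m-%d %H:%M:%OS")

# Build the event log and plot the share of cases in which each activity occurs
patientsData %>%
  eventlog(
    activity_id = "handling",
    case_id = "patient",
    resource_id = "employee",
    activity_instance_id = "handling_id",
    lifecycle_id = "registration_type",
    timestamp = "time"
  ) %>%
  activity_presence() %>%
  plot()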
Known limitation:
The dataset that Power BI passes to an R visual is a plain data frame. To use bupaR, you first need to convert it to an event log, as the sample R scripts above do with the eventlog function.
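Because the conversion depends on the data frame having the right columns and a parseable timestamp, it can help to check both before calling eventlog. The sketch below uses base R only. The helper prepare_patients is hypothetical (it is not part of bupaR), and the two sample rows are made up purely to illustrate the shape of the data.

```r
# Hypothetical helper, not part of bupaR: checks that the data frame has the
# columns used by the sample scripts and parses the timestamp column.
prepare_patients <- function(df) {
  required <- c("handling", "patient", "employee",
                "handling_id", "registration_type", "time")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  df$time <- as.POSIXct(df$time, tz = "GMT", format = "%Y-%m-%d %H:%M:%OS")
  if (anyNA(df$time)) {
    stop("Some timestamps could not be parsed")
  }
  df
}

# Made-up rows in the same shape as the patients data, for illustration only
sample_df <- data.frame(
  handling = c("Registration", "Registration"),
  patient = c("1", "1"),
  employee = c("r1", "r1"),
  handling_id = c("1", "1"),
  registration_type = c("start", "complete"),
  time = c("2017-01-02 11:41:53", "2017-01-02 11:45:10"),
  stringsAsFactors = FALSE
)

prepared <- prepare_patients(sample_df)
```

Inside a Power BI R visual, the same helper could be called as prepare_patients(dataset) and its result piped into eventlog.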
References:
1. https://en.wikipedia.org/wiki/Process_mining
2. https://www.bupar.net/index.html
3. https://www.r-bloggers.com/bupar-business-process-analysis-with-r/
Lei Qian | Software Engineer at Microsoft Power BI (Artificial Intelligence) team