Process Mining - Part 1
Big Processes are a Big Topic in data analytics. This is particularly true in today’s enterprises who are placing a heavy emphasis on operational efficiency and are looking to instill a culture of continuous improvement.
“Process mining” is a set of analytical techniques we can use to gain insights into the following types of questions:
- Process discovery
- What is really happening in the workforce ?
- How can we understand the current process ? Especially where no formal model artefacts may exist.
- How often are activities performed ?
- Who are the fastest at performing certain tasks ?
- What is the “cycle” time ? What is the “waiting” time ?
- How often do “handoffs” occur ? … and many other questions !
Tools like R provide an ideal platform for getting into a journey of process mining.
There is a brilliant R library centered on process mining from the people here:
“Workflow” type applications are ubiquitous in corporate settings and are a common candidate for this type of data supply for analysis. You may be able to find good candidates for sourcing this type of data in your workplace. For demonstration purposes here, I’ve synthesised some data. I’ve created the data to resemble the transaction log of loan originations operations for a financial services organisation.
First up, get the data from an Excel file I created:
library(readxl)
auditextract <- read_excel("G:/OneDrive/Documents/bupaR/AuditExtract1.xlsx")
Then, load up the required bupaR libraries and create an event log object from the source while mapping the relevant columns into the event log construct in bupaR. I won’t go into all the details of the event data model here - it’s relatively simple - an event belongs to a case, and a case is an instance of the process. You can read about the model in more detail the bupaR site.
library(bupaR)
library(processanimateR)
el <- auditextract %>%
mutate(status = "complete",
activity_instance = 1:nrow(.)) %>%
eventlog(
case_id = "AuditCaseID",
activity_id = "StatusDescription",
activity_instance_id = "rowid",
lifecycle_id = "status",
timestamp = "CreatedTimestamp",
resource_id = "ResourceName2"
)
# animate_process(eldata)
# nrow(el)
So, what does the raw process consist of ? With bupaR, we can easily visualise a process by plotting a process map. Let’s have a look:
process_map(el)
Well, wait right there ! The obvious comes to mind straight away - this ain’t no simple map. If we want to break it down and get some insights into what’s really going on in the workflow, let’s do the following.
The process consists of a set of 5 stages, and for each stage a set of associated statuses. For example, for the stage “Start Application”, status moves through “Moved into queue”, “Assigned to James”, “Started application processing” and so on.
The following R code will collapse the sets of statuses for each stage, so that we can get a higher level picture of the workflow.
library(dplyr)
qs <- auditextract %>% distinct(objectTypeDesc, StatusDescription)
el <- el %>%
act_collapse(StartApplication = filter(qs, objectTypeDesc=='Start Application')$StatusDescription) %>%
act_collapse(UploadDocuments = filter(qs, objectTypeDesc=='Upload Documents')$StatusDescription) %>%
act_collapse(Settlement = filter(qs, objectTypeDesc=='Settlement')$StatusDescription) %>%
act_collapse(ProcessDocuments = filter(qs, objectTypeDesc=='Process Documents')$StatusDescription) %>%
act_collapse(Disburse = filter(qs, objectTypeDesc=='Disburse')$StatusDescription) %>%
act_collapse(StartDocuments = filter(qs, objectTypeDesc=='Start Documents')$StatusDescription)
Let’s have a look at the process map now:
process_map(el)
Now we can start to get a picture of what’s going on in there !
The default provides is annotated with frequencies of activity flow.
In Part 2 of Process Mining I’ll show a little more of the capabilities of analysis of processes including how to animate the process map !