Prof. Dr. Ir. Wil van der Aalst of the Department of Mathematics and Computer Science (Information Systems WSK&I) is the founding father of “Process Mining” and is located at the Data Science Center, Eindhoven in the Netherlands. You will find many quotes attributed to him in this post.


Today a tremendous amount of information about business processes is recorded by information systems in the form of “event logs”. Despite the omnipresence of such data, most organisations diagnose problems based on fiction rather than facts. Process mining is an emerging discipline based on process model-driven approaches and data mining. It not only allows organisations to fully benefit from the information stored in their systems, but it can also be used to check the conformance of processes, detect bottlenecks, and predict execution problems.

So let's see what it is all about.

Companies use information systems to enhance the processing of their business transactions. Enterprise resource planning (ERP) and workflow management systems (WFMS) are the predominant information system types used to support and automate the execution of business processes. In modern companies, business processes like procurement, operations, logistics, sales and human resources can hardly be imagined without information systems that support and monitor the relevant activities. The increasing integration of information systems not only provides the means to increase effectiveness and efficiency. It also opens up new possibilities for data access and analysis. When information systems are used for supporting and automating the processing of business transactions, they generate data. This data can be used for improving business decisions.

The application of techniques and tools for generating information from digital data is called business intelligence (BI). Prominent BI approaches are online analytical processing (OLAP) and data mining (Kemper et al. 2010 pp. 1–5). OLAP tools allow analysing multidimensional data using operators like roll-up and drill-down, slice and dice or split and merge (Kemper et al. 2010 pp. 99–106). Data mining is primarily used for discovering patterns in large data sets (Kemper et al. 2010 p. 113).

However, the availability of data is not only a blessing as a new source of information; it can also become a curse. The phenomena of information overflow (Krcmar 2010 pp. 54–57), data explosion (Van der Aalst 2011 pp. 1–3) and big data (Chen et al. 2012) illustrate several problems that arise from the availability of enormous amounts of data. Humans are only able to handle a certain amount of information in a given time frame. When more and more data is available, how can it actually be used in a meaningful manner without overstraining the human recipient?

Data mining is the analysis of data for finding relationships and patterns. The patterns are an abstraction of the analysed data. Abstraction reduces complexity and makes information available for the recipient. The aim of “Process Mining” is the extraction of information about business processes (Van der Aalst 2011 p. 1). Process mining encompasses “techniques, tools and methods to discover, monitor and improve real processes by extracting knowledge from event logs” (Van der Aalst et al. 2012 p. 15). The data that is generated during the execution of business processes in information systems is used for reconstructing process models. These models are useful for analysing and optimising processes. Process mining is an innovative approach and builds a bridge between data mining (BI) and business process management (BPM).

Process mining evolved in the context of analysing software engineering processes by Cook and Wolf in the late 1990s (Cook and Wolf 1998). Agrawal and Gunopulos (Agrawal et al. 1998) and Herbst and Karagiannis (Herbst and Karagiannis 1998) introduced process mining to the context of workflow management. Major contributions to the field have been added during the last decade by van der Aalst and other research colleagues, who developed mature mining algorithms and addressed a variety of topic-related challenges (Van der Aalst 2011). This has led to a well-developed set of methods and tools that are available for scientists and practitioners.

Introduction to the Basic Concepts of Process Mining

The aim of process mining is the construction of process models based on available event log data. In the context of information system science, a model is an immaterial representation of its real world counterpart used for a specific purpose (Becker et al. 2012 pp. 1–3). Models can be used to reduce complexity by representing characteristics of interest and by omitting other characteristics. A process model is a graphical representation of a business process that describes the dependencies between activities that need to be executed collectively for realising a specific business objective. It consists of a set of activity models and constraints between them (Weske 2012 p. 7).

Process models can be represented in different process modelling languages. BPMN provides intuitive semantics that are easy to understand for readers who do not possess a theoretical background in informatics, so I am going to use BPMN models for the examples in this post.

Above is a business process model of a simple procurement process. It starts with the definition of requirements. The goods or service are then ordered, and at some point in time the ordered goods or service are delivered. After the goods or service have been received, the supplier issues an invoice, which is finally settled by the company that ordered the goods or service.

Each one of the events depicted in the process above will have an entry in an event log. An event log is basically a table. It contains all recorded events that relate to executed business activities. Each event is mapped to a case. A process model is an abstraction of the real world execution of a business process. A single execution of a business process is called a process instance. Process instances are reflected in the event log as sets of events that are mapped to the same case. The sequence of recorded events in a case is called a trace. The model that describes the execution of a single process instance is called a process instance model. A process model abstracts from the individual behaviour of process instances and provides a model that reflects the behaviour of all instances that belong to the same process. Cases and events are characterised by classifiers and attributes. Classifiers ensure the distinctness of cases and events by mapping unique names to each case and event. Attributes store additional information that can be used for analysis purposes.
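To make the terms concrete, here is a minimal Python sketch of an event log as a table and of how traces can be derived from it. The column names ("case", "activity", "timestamp", "resource") and all values are my own assumptions for illustration; real systems will use different names and many more attributes.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: one row per event. Column names and values are assumptions.
EVENT_LOG = [
    {"case": "2", "activity": "Order goods or service", "timestamp": "2024-01-03T11:00", "resource": "Bob"},
    {"case": "1", "activity": "Define requirements", "timestamp": "2024-01-02T09:00", "resource": "Alice"},
    {"case": "1", "activity": "Receive goods or service", "timestamp": "2024-01-05T14:00", "resource": "Carol"},
    {"case": "2", "activity": "Define requirements", "timestamp": "2024-01-03T08:30", "resource": "Alice"},
    {"case": "1", "activity": "Order goods or service", "timestamp": "2024-01-02T10:00", "resource": "Bob"},
]


def traces_by_case(event_log):
    """Group events by case and order them by timestamp: one trace per case."""
    grouped = defaultdict(list)
    for event in event_log:
        grouped[event["case"]].append(event)

    traces = {}
    for case, events in grouped.items():
        ordered = sorted(events, key=lambda e: datetime.fromisoformat(e["timestamp"]))
        traces[case] = [e["activity"] for e in ordered]
    return traces


TRACES = traces_by_case(EVENT_LOG)
```

Here the case identifier plays the role of the case classifier, the activity name classifies the events, and "resource" is an example of an additional attribute.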

The Mining Process

The process above provides an overview of the different process mining activities. Before any process mining technique can be applied, it is necessary to have access to the data. It needs to be extracted from the relevant information systems. This step is far from trivial. Depending on the type of source system, the relevant data can be distributed over different database tables. Data entries might need to be combined in a meaningful manner for the extraction. Another obstacle is the amount of data. Depending on the objective of the process mining, up to millions of data entries might need to be extracted, which requires efficient extraction methods. A further important aspect is confidentiality. Extracted data might include personal information, and depending on legal requirements, anonymisation or pseudonymisation might be necessary.
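As an illustration of pseudonymisation during extraction, the following sketch replaces selected attribute values with a keyed hash from Python's standard library. The field name "resource" and the 16-character truncation are assumptions of mine, and whether a keyed hash is sufficient depends on the legal requirements that apply to you.

```python
import hashlib
import hmac


def pseudonymise(value: str, secret_key: bytes) -> str:
    """Map an identifier to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability; the length is an arbitrary choice


def pseudonymise_log(event_log, secret_key, fields=("resource",)):
    """Return a copy of the log in which the given fields are pseudonymised."""
    return [
        {k: pseudonymise(v, secret_key) if k in fields else v for k, v in event.items()}
        for event in event_log
    ]
```

Because the same input always maps to the same pseudonym, analyses such as "who handled which case" still work on the pseudonymised log, while the real names stay in the source system.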

Before the extracted event log can be used, it needs to be filtered and loaded into the process mining software. There are different reasons why filtering is necessary. Information systems are not free of errors. Data may be recorded that does not reflect real activities. Errors can result from malfunctioning programs, but also from user disruption or hardware failures that lead to erroneous records in the event log.

Process Mining Algorithms

The main component in process mining is the mining algorithm. It determines how the process models are created. A broad variety of mining algorithms exists. The following three categories will be discussed, but not in great detail.

  • Deterministic mining algorithms
  • Heuristic mining algorithms
  • Genetic mining algorithms

Determinism means that an algorithm only produces defined and reproducible results. It always delivers the same result for the same input. A representative of this category is the α-Algorithm (Van der Aalst et al. 2002). It was one of the first algorithms able to deal with concurrency. It takes an event log as input and calculates the ordering relations of the events contained in the log.

Heuristic mining also uses deterministic algorithms, but these incorporate the frequencies of events and traces when reconstructing a process model. A common problem in process mining is that real processes are highly complex and their discovery leads to complex models. This complexity can be reduced by disregarding infrequent paths in the models.
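The idea of disregarding infrequent paths can be sketched in a few lines of Python. This is a deliberate simplification and not a full heuristic mining algorithm: it only counts how often one activity directly follows another and keeps the pairs that reach a threshold. The activity labels and the threshold are assumptions for illustration.

```python
from collections import Counter


def directly_follows_counts(traces):
    """Count how often activity a is directly followed by activity b."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def frequent_edges(traces, min_count):
    """Keep only the directly-follows pairs observed at least min_count times."""
    return {edge for edge, n in directly_follows_counts(traces).items() if n >= min_count}


# Hypothetical log: four cases follow the main path, one case deviates.
LOG = [["a", "b", "c", "d"]] * 4 + [["a", "c", "b", "d"]]
MAIN_PATHS = frequent_edges(LOG, min_count=2)
```

With a threshold of 2, the pairs that occur only in the single deviating case are dropped, and the remaining model shows only the main path a, b, c, d.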

Genetic mining algorithms use an evolutionary approach that mimics the process of natural evolution. They are not deterministic. Genetic mining algorithms follow four steps: initialisation, selection, reproduction and termination. The idea behind these algorithms is to generate a random population of process models and to find a satisfactory solution by iteratively selecting individuals and reproducing them by crossover and mutation over different generations. The initial population of process models is generated randomly and might have little in common with the event log. However, thanks to the large number of models in the population and to repeated selection and reproduction, better-fitting models emerge over the generations.
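The four steps can be sketched as a toy example in Python. Everything here is an assumption made for illustration: a "process model" is reduced to a set of directly-follows edges, the fitness function is a simple made-up score, and the parameters are arbitrary. Real genetic miners work on richer model representations and quality measures.

```python
import random


def steps(traces):
    """All directly-follows steps observed in the log."""
    return [(a, b) for t in traces for a, b in zip(t, t[1:])]


def fitness(model, traces):
    """Share of observed steps the model allows, minus a penalty for unused edges."""
    observed = steps(traces)
    replayed = sum(1 for s in observed if s in model) / len(observed)
    unused = len(model - set(observed)) / len(model) if model else 1.0
    return replayed - 0.5 * unused


def random_model(activities, rng):
    pairs = [(a, b) for a in activities for b in activities if a != b]
    return frozenset(p for p in pairs if rng.random() < 0.3)


def crossover(m1, m2, rng):
    # Keep edges both parents share; inherit the others with probability 0.5.
    return frozenset(e for e in sorted(m1 | m2) if (e in m1 and e in m2) or rng.random() < 0.5)


def mutate(model, activities, rng):
    a, b = rng.sample(activities, 2)
    return model ^ {(a, b)}  # toggle one edge


def genetic_mining(traces, population_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    activities = sorted({a for t in traces for a in t})
    # 1. Initialisation: a random population of models.
    population = [random_model(activities, rng) for _ in range(population_size)]
    history = []
    for _ in range(generations):
        # 2. Selection: the better half survives unchanged.
        population.sort(key=lambda m: fitness(m, traces), reverse=True)
        history.append(fitness(population[0], traces))
        # 4. Termination: stop early once a perfect score is reached.
        if history[-1] == 1.0:
            break
        parents = population[: population_size // 2]
        # 3. Reproduction: crossover and mutation create the next generation.
        children = []
        while len(parents) + len(children) < population_size:
            p1, p2 = rng.sample(parents, 2)
            children.append(mutate(crossover(p1, p2, rng), activities, rng))
        population = parents + children
    best = max(population, key=lambda m: fitness(m, traces))
    return best, history
```

Because the best models of each generation survive unchanged, the best score in this sketch never gets worse from one generation to the next. A different seed can lead to a different result, which reflects the non-deterministic nature of the approach.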

The process above shows a mined process model that was reconstructed by applying the α-Algorithm to an event log. It was translated into a BPMN model for better comparability. Obviously, this model is not the same as the model in the first process diagram above. The reason is that the mined event log includes cases that deviate from the ideal linear process execution that was assumed for modelling in the first process depiction. In case 4 the invoice is received before the goods or service. Because both possibilities are included in the event log (goods or service received before the invoice in cases 1, 2, 3 and 5, and invoice received before the ordered goods in case 4), the mining algorithm assumes that these activities can be carried out concurrently.
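The reasoning behind this result can be reproduced with a small Python sketch. It covers only the first step of the α-Algorithm, the derivation of the ordering relations from the log, and not the construction of the model itself. The activity labels are my own wording of the procurement steps, and the five cases are built to match the description above.

```python
from itertools import product

MAIN = ["Define requirements", "Order", "Receive goods", "Receive invoice", "Settle invoice"]
DEVIATING = ["Define requirements", "Order", "Receive invoice", "Receive goods", "Settle invoice"]

# Cases 1, 2, 3 and 5 follow the main path; case 4 receives the invoice first.
CASES = {"1": MAIN, "2": MAIN, "3": MAIN, "4": DEVIATING, "5": MAIN}


def footprint(traces):
    """Derive the ordering relation for every pair of activities.

    "->" : a is directly followed by b, but never the other way round
    "<-" : b is directly followed by a, but never the other way round
    "||" : both orders occur, so a and b are treated as concurrent
    "#"  : a and b never directly follow each other
    """
    activities = sorted({a for t in traces for a in t})
    follows = {(a, b) for t in traces for a, b in zip(t, t[1:])}
    relations = {}
    for a, b in product(activities, repeat=2):
        if (a, b) in follows and (b, a) in follows:
            relations[(a, b)] = "||"
        elif (a, b) in follows:
            relations[(a, b)] = "->"
        elif (b, a) in follows:
            relations[(a, b)] = "<-"
        else:
            relations[(a, b)] = "#"
    return relations


RELATIONS = footprint(CASES.values())
```

A single deviating case is enough to turn the relation between "Receive goods" and "Receive invoice" from a strict order into concurrency, which is why the mined model differs from the idealised one.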

Process Discovery and Enhancement

A major area of application for process mining is the discovery of formerly unknown process models for the purpose of analysis or optimisation (Van der Aalst et al. 2012 p. 13). Business process reengineering and the implementation of ERP systems in organisations gained strong attention starting in the 1990s. Practitioners have since primarily focused on designing and implementing processes and getting them to work. With maturing integration of information systems into the execution of business processes and the evolution of new technical possibilities the focus shifts to analysis and optimisation.

Actual executions of business processes can now be described and made explicit. The discovered processes can be analysed for performance indicators like average processing time or costs in order to improve or reengineer the process. The major advantage of process mining is that it uses reliable data. The data that is generated in the source systems is generally hard for the average system user to manipulate. For traditional process modelling, the necessary information is primarily gathered through interviews, workshops or similar manual techniques that require the interaction of persons. This leaves room for interpretation and encourages the creation of ideal models based on often overly optimistic assumptions.
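As a simple example of such a performance indicator, the following sketch computes the average time between the first and the last recorded event of each case. The column names and timestamps are assumptions for illustration, and measuring from first to last event is only one possible definition of processing time.

```python
from datetime import datetime
from statistics import mean

# Hypothetical event log with ISO timestamps; only first and last events per case are shown.
EVENTS = [
    {"case": "1", "activity": "Define requirements", "timestamp": "2024-01-02T09:00"},
    {"case": "1", "activity": "Settle invoice", "timestamp": "2024-01-04T09:00"},
    {"case": "2", "activity": "Define requirements", "timestamp": "2024-01-03T08:00"},
    {"case": "2", "activity": "Settle invoice", "timestamp": "2024-01-04T08:00"},
]


def case_durations(event_log):
    """Time between the first and the last recorded event of each case."""
    first, last = {}, {}
    for event in event_log:
        ts = datetime.fromisoformat(event["timestamp"])
        case = event["case"]
        first[case] = min(first.get(case, ts), ts)
        last[case] = max(last.get(case, ts), ts)
    return {case: last[case] - first[case] for case in first}


def average_duration_hours(event_log):
    """Average case duration in hours."""
    return mean(d.total_seconds() / 3600 for d in case_durations(event_log).values())


AVERAGE_HOURS = average_duration_hours(EVENTS)
```

In this made-up log, case 1 takes 48 hours and case 2 takes 24 hours, so the average is 36 hours.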

Analysis and optimisation are not limited to post-runtime inspections. Process mining can also be used for operational support by detecting running traces that do not follow the intended process model. It can also be used for predicting the behaviour of traces under execution. An example of runtime analysis is the prediction of the expected completion time by comparing the instance under execution with similar, already processed instances. Another feature can be the provision of recommendations to the user for selecting the next activities in the process. Process mining can also be used to derive information for the design of business processes before they are implemented.
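The completion time prediction can be illustrated with a minimal sketch. It treats completed cases as "similar" when they started with the same sequence of activities as the running case, and averages how long those cases still needed. This notion of similarity, the data layout and all numbers are assumptions of mine; practical predictors are considerably more refined.

```python
from statistics import mean

# Hypothetical completed cases: (activity, hours since the case started).
COMPLETED = [
    [("Define", 0), ("Order", 2), ("Goods", 50), ("Invoice", 60), ("Settle", 80)],
    [("Define", 0), ("Order", 4), ("Goods", 40), ("Invoice", 44), ("Settle", 64)],
    [("Define", 0), ("Order", 1), ("Invoice", 30), ("Goods", 45), ("Settle", 70)],
]


def predict_remaining_hours(running_prefix, completed_cases):
    """Average remaining time of completed cases that share the running case's prefix.

    Returns None when no completed case started with the same activities.
    """
    n = len(running_prefix)
    remaining = []
    for case in completed_cases:
        activities = [activity for activity, _ in case]
        if activities[:n] == list(running_prefix):
            reached = case[n - 1][1] if n else 0  # time at which the prefix was completed
            remaining.append(case[-1][1] - reached)
    return mean(remaining) if remaining else None


ESTIMATE = predict_remaining_hours(["Define", "Order", "Goods"], COMPLETED)
```

For a running case that has just received the goods, the first two completed cases match and needed 30 and 24 more hours, so the estimate is 27 hours.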


Process mining builds the bridge between data mining (BI) and business process management (BPM). The increasing integration of information systems for supporting and automating the execution of business transactions provides the basis for novel types of data analysis. The data that is stored in the information systems can be used to mine and reconstruct business process models. These models are the foundation for a variety of application areas including process analysis and optimisation or conformance and compliance checking. The basic constructs for process mining are event logs, process models and mining algorithms. I have summarised essential concepts of process mining in this post, illustrating the main application areas and one of the available tools, namely ProM.

Process mining is still a young research discipline, and limitations concerning noise, adequate representation and competing quality criteria should be taken into account when using it. Although some areas like the labelling of events, complexity reduction in mined models and phenomena like concept drift need to be addressed by further research, the available set of methods and tools provides a rich and innovative resource for effective and efficient business process management.