Process Discovery Contest @ BPM [1st Edition]
The process discovery approach described in the submitted document is directed towards discovering process models from a training event log representing 10 different real-time business process executions, and cross-validating the derived models against two sets of test event logs (April and May) provided for evaluation of the process discovery technique. Each test event log (test_log_april_1 to test_log_april_10 and test_log_may_1 to test_log_may_10) represents part of the model from the training log and contains a total of 20 traces: 10 traces that can be replayed by the model (allowed) and 10 traces that cannot be replayed by the model (disallowed). The total number of traces in the test event logs (i.e. the April logs and the May logs) is therefore ((10 logs x 20 traces) x 2) = 400 traces.

Our aim is to classify each of the 400 individual traces that make up the two sets of test event logs, and then to provide a Petri net representation of the training model, as well as a Business Process Model and Notation (BPMN) mapping, that allows for testing and evaluation of the behaviours/traces recorded in the test logs. The objective of the proposed approach is to discover and provide process models that match the original process models in terms of the balance between “overfitting” and “underfitting”. A process model overfits the event log if it is too restrictive, disallowing behaviour that is part of the underlying process. Conversely, it underfits reality if it is not restrictive enough, allowing behaviour that is not part of the underlying process.
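The relationship between the two error types and the classification task can be illustrated with a small sketch. It is not the contest's scoring procedure; the function name `score_classification` and the example labels are hypothetical, and the only figures taken from the text are the 10 logs, 20 traces per log, and 2 test sets.

```python
# Illustrative only: counts the two kinds of misclassification described above.
# True means "allowed", False means "disallowed".

# Total number of test traces, as stated in the text: (10 logs x 20 traces) x 2.
TOTAL_TRACES = (10 * 20) * 2

def score_classification(predicted, actual):
    """Compare predicted labels with known labels for one test log."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    # Allowed traces that the model rejects: a symptom of overfitting.
    wrongly_disallowed = sum(1 for p, a in pairs if a and not p)
    # Disallowed traces that the model accepts: a symptom of underfitting.
    wrongly_allowed = sum(1 for p, a in pairs if p and not a)
    correct = len(pairs) - wrongly_disallowed - wrongly_allowed
    return {
        "correct": correct,
        "wrongly_disallowed": wrongly_disallowed,
        "wrongly_allowed": wrongly_allowed,
    }

# Hypothetical test log: 10 allowed traces followed by 10 disallowed traces.
actual = [True] * 10 + [False] * 10
# Hypothetical model output: one allowed trace rejected, two disallowed accepted.
predicted = [True] * 9 + [False] + [False] * 8 + [True] * 2
result = score_classification(predicted, actual)
```

A model that rejects many allowed traces is leaning towards overfitting, while one that accepts many disallowed traces is leaning towards underfitting; a well-balanced model keeps both counts low.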
Following this challenge, we aim to provide a model that balances “overfitting” and “underfitting” well enough to correctly classify the traces in the test event logs. Thus:
• Given a trace (t) representing real process behaviour, the process model (m) classifies it as allowed, or
• Given a trace (t) representing behaviour not related to the process, the process model (m) classifies it as disallowed.

The submitted document contains the classification attempts for the event logs provided and discusses the replay semantics of the process modelling notation that has been employed. In other words, we discuss how, given any trace t from a test event log and the process model m discovered from the training log, expressed in Petri net and BPMN notation, it can be unambiguously determined whether or not t can be replayed on m. We also describe the tools used to discover the process models and to check the results of the classification task.
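To make the notion of replaying a trace on a Petri net concrete, the following is a minimal sketch of the standard token game. It is not the tooling used in the submission. The toy net, the place and activity names, and the function `can_replay` are all hypothetical, and the sketch assumes that every activity labels exactly one transition and that the net has no silent transitions, so that replay is deterministic.

```python
from collections import Counter

# Hypothetical toy net: each activity maps to (input places, output places).
TRANSITIONS = {
    "a": ({"start"}, {"p1", "p2"}),  # AND-split
    "b": ({"p1"}, {"p3"}),
    "c": ({"p2"}, {"p4"}),
    "d": ({"p3", "p4"}, {"end"}),    # AND-join
}
INITIAL = Counter({"start": 1})  # one token in the start place
FINAL = Counter({"end": 1})      # one token in the end place, nothing elsewhere

def can_replay(trace, transitions=TRANSITIONS, initial=INITIAL, final=FINAL):
    """Return True if the trace fires from the initial to the final marking."""
    marking = Counter(initial)
    for activity in trace:
        if activity not in transitions:
            return False  # activity unknown to the model
        inputs, outputs = transitions[activity]
        if any(marking[place] < 1 for place in inputs):
            return False  # transition not enabled in the current marking
        for place in inputs:
            marking[place] -= 1  # consume one token per input place
        for place in outputs:
            marking[place] += 1  # produce one token per output place
    # Unary plus drops zero counts so that only marked places are compared.
    return +marking == +final

allowed = can_replay(["a", "c", "b", "d"])   # b and c may occur in either order
disallowed = can_replay(["a", "b", "d"])     # d is not enabled without c
```

Under these assumptions, a trace is classified as allowed exactly when every event fires an enabled transition and the replay ends in the final marking; nets with duplicate labels or silent transitions require a search over possible firing sequences instead.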