Making the Case for Process Analytics: A Use Case in Court Proceedings
Description
Data was extracted in PDF format with personal information redacted to ensure privacy. The raw dataset consisted of 260 cases from three chambers within a single German social law court. The data originates from a single judge, who typically oversees five to six chambers, so this dataset represents only a subset of the judge’s total caseload. Optical Character Recognition (OCR) was used to extract the document text, which was organized into an event log according to the tabular structure of the documents.

In the dataset, a single timestamp is recorded for each activity, and it commonly indicates only the date of occurrence rather than a precise time. This limits the granularity of time-based analyses and the accuracy of calculated activity durations. Because the analysis focuses on the overall durations of cases, which typically range from multiple months to years, the impact of this timestamp imprecision was negligible in our use case.

After extraction, the event log was further processed in consultation with domain experts to ensure anonymity, remove noise, and raise it to an abstraction level appropriate for analysis. All remaining personal identifiers, such as expert witness names, were removed from the log. Additionally, timestamps were systematically perturbed to further enhance data privacy.

Originally, the event log contained 22,664 recorded events and 290 unique activities. Activities that were extremely rare (i.e., occurring fewer than 30 times) were excluded to focus on frequently observed procedural steps. The domain experts then reviewed the list of unique activity labels; based on this review, similar activities were merged and terminology was standardized across cases. This refinement of the activity labels reduced the number of unique activities to 59. Finally, duplicate events were removed. Together, these steps reduced the dataset to 19,947 events.
The final anonymized and processed dataset comprises 260 cases from three chambers, with 19,947 events and 59 unique activities.
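
The three filtering steps described above (excluding rare activities, merging similar labels, removing duplicate events) can be illustrated with a minimal Python sketch. This is not the code used to produce the dataset. The field names `case_id`, `activity`, and `date`, the function `preprocess`, the toy labels, and the definition of a duplicate (same case, activity, and date) are all assumptions made for illustration; the label mapping stands in for the manual review by domain experts.

```python
from collections import Counter

# Hypothetical field names; the published log may use different ones.
CASE, ACTIVITY, DATE = "case_id", "activity", "date"


def preprocess(events, label_map, min_count=30):
    """Filter rare activities, merge labels, and drop duplicate events.

    events:    list of dicts with CASE, ACTIVITY and DATE keys
    label_map: raw label -> standardized label (unlisted labels are kept)
    min_count: activities occurring fewer times than this are dropped
    """
    # 1. Drop activities that occur fewer than `min_count` times in the raw log.
    counts = Counter(e[ACTIVITY] for e in events)
    frequent = [e for e in events if counts[e[ACTIVITY]] >= min_count]

    # 2. Merge similar activities by mapping raw labels to standardized ones.
    merged = [
        {**e, ACTIVITY: label_map.get(e[ACTIVITY], e[ACTIVITY])}
        for e in frequent
    ]

    # 3. Remove duplicates (assumed here: same case, activity and date),
    #    keeping the first occurrence.
    seen, result = set(), []
    for e in merged:
        key = (e[CASE], e[ACTIVITY], e[DATE])
        if key not in seen:
            seen.add(key)
            result.append(e)
    return result


# Toy log with invented labels; a low threshold is used because it is tiny.
toy_log = [
    {"case_id": "C1", "activity": "Claim filed", "date": "2020-01-02"},
    {"case_id": "C2", "activity": "Claim filed", "date": "2020-01-05"},
    {"case_id": "C1", "activity": "Statement requested", "date": "2020-02-01"},
    {"case_id": "C1", "activity": "Statement request", "date": "2020-02-01"},
    {"case_id": "C2", "activity": "Statement requested", "date": "2020-03-01"},
    {"case_id": "C2", "activity": "Statement request", "date": "2020-03-10"},
    {"case_id": "C2", "activity": "Rare step", "date": "2020-04-01"},
]
toy_map = {"Statement request": "Statement requested"}

cleaned = preprocess(toy_log, toy_map, min_count=2)
print(len(cleaned), sorted({e[ACTIVITY] for e in cleaned}))
```

In the toy log, the single "Rare step" event falls below the threshold, the two statement labels are merged, and the resulting same-day repeat in case C1 is dropped as a duplicate. The sketch applies the steps in the order in which they are described above; applying the frequency threshold after merging instead would generally retain a different set of activities.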