Help Desk Tickets

Published: 30 May 2025 | Version 2 | DOI: 10.17632/btm76zndnt.2
Contributor:
Mohammad Abdellatif

Description

These datasets were created as part of a study involving an experiment with a helpdesk team at an international software company. The goal was to implement an automated performance appraisal model that evaluates the team based on issue reports and key features derived from classifying the messages exchanged with customers using Dialog Acts. The data was extracted from a PostgreSQL database and curated to present aggregated views of helpdesk tickets reported between January 2016 and March 2023. Certain fields have been anonymized (masked) to protect the data owner’s privacy while preserving the overall meaning of the information.

The datasets are:
- issues.csv: Holds information for all reported tickets, showing each ticket's category, priority, reporter, related project, the assignee who resolved it, the start time, the resolution time, and how many seconds the ticket spent in each resolution step.
- issues_change_history.csv: Shows when the ticket assignee and status were changed. This dataset helps calculate the time spent on each step.
- issues_snapshots.csv: Contains the same records as issues.csv but duplicates the tickets that were handled by multiple assignees; each record is the processing cycle for one assignee.
- scored_issues_snapshot_sample.xlsx: A stratified and representative sample extracted from the tickets and then handed to an annotator (the helpdesk manager) to appraise the resolution performance against three targets, where 5 is the highest score and 1 is the lowest.
- sample_utterances.csv: Contains the messages (comments) that were exchanged between the reporters and the helpdesk team. This dataset only contains the curated messages for the issues listed in scored_issues_snapshot_sample.xlsx, as those were the focus of the initial study.

The following files are guidelines on how to work with and interpret the datasets:
- FEATURES.md: Describes the datasets' features (fields).
- EXAMPLE.md: Shows an example of an issue in all datasets so the reader can understand the relations between them.
- process-flow.png: A demonstration of the steps followed by the helpdesk team to resolve an issue.

These datasets can also support many other kinds of experiments, such as:
- Count Predictions
- Regression
- Association rules mining
- Natural Language Processing
- Classification
- Clustering
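As a rough illustration of how issues_snapshots.csv relates to issues.csv, the sketch below counts snapshot records per ticket to find tickets handled by more than one assignee. The column names `issue_id` and `assignee`, the helper name, and the toy rows are assumptions made for this example only; the actual field names are documented in FEATURES.md.

```python
import csv
import io
from collections import Counter


def assignee_cycles_per_issue(snapshot_rows, id_field="issue_id"):
    """Count how many snapshot records (assignee cycles) each ticket has.

    `id_field` is a hypothetical column name; check FEATURES.md for the real one.
    """
    return Counter(row[id_field] for row in snapshot_rows)


# Made-up rows standing in for issues_snapshots.csv (not real data).
toy_csv = io.StringIO(
    "issue_id,assignee\n"
    "A-1,agent_x\n"
    "A-1,agent_y\n"
    "A-2,agent_x\n"
)

cycles = assignee_cycles_per_issue(csv.DictReader(toy_csv))

# Tickets with more than one record were handled by multiple assignees.
reassigned = sorted(issue for issue, n in cycles.items() if n > 1)
print(reassigned)
```

To run this against the published file, the `io.StringIO` object could be replaced with an open file handle for issues_snapshots.csv, once the identifier column has been confirmed.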

Steps to reproduce

The helpdesk system is a web-based application backed by a PostgreSQL database. It has many tables, but only those shown in db.png were used when extracting the data. The original database cannot be shared, in order to protect the privacy of the data provider, but the masked datasets are intended to reflect the actual data closely enough to be useful for other studies. Note that the data on issue processing and assignment can be re-derived, since the underlying events are already available in the provided change-history dataset (issues_change_history.csv). Researchers can refer to the linked GitHub repository to see the code used in the related research and how these datasets were engineered.
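The idea of re-deriving processing times from the change history can be sketched as follows. This is a minimal illustration, not the code from the linked repository: the keys `issue_id`, `changed_at`, and `new_status`, the status labels, and the toy rows are all hypothetical, and the real fields are listed in FEATURES.md.

```python
from collections import defaultdict
from datetime import datetime


def seconds_per_status(changes):
    """Sum the seconds each ticket spent in each status.

    `changes` is an iterable of dicts with the hypothetical keys
    `issue_id`, `changed_at` (ISO 8601 string), and `new_status`.
    The final status of a ticket has no following event, so it is not counted.
    """
    by_issue = defaultdict(list)
    for row in changes:
        by_issue[row["issue_id"]].append(
            (datetime.fromisoformat(row["changed_at"]), row["new_status"])
        )

    totals = {}
    for issue_id, events in by_issue.items():
        events.sort(key=lambda event: event[0])  # chronological order
        spent = defaultdict(float)
        # Each status lasts from its own event until the next event.
        for (start, status), (end, _next_status) in zip(events, events[1:]):
            spent[status] += (end - start).total_seconds()
        totals[issue_id] = dict(spent)
    return totals


# Made-up history rows for a single ticket (not real data).
toy_changes = [
    {"issue_id": "1", "changed_at": "2020-01-01T09:00:00", "new_status": "open"},
    {"issue_id": "1", "changed_at": "2020-01-01T09:30:00", "new_status": "in_progress"},
    {"issue_id": "1", "changed_at": "2020-01-01T11:30:00", "new_status": "resolved"},
]

result = seconds_per_status(toy_changes)
print(result)  # {'1': {'open': 1800.0, 'in_progress': 7200.0}}
```

The same pairing of consecutive events could be applied to assignee changes instead of status changes to approximate the per-assignee cycles.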

Institutions

Princess Sumaya University for Technology, King Hussein School for Computing Sciences

Categories

Data Mining, Data Science, Machine Learning, Performance Appraisal and Evaluation, Performance Appraisal Management

Licence