Dataset for Drone Problem Identification and Severity Estimation

Published: 27 October 2025 | Version 1 | DOI: 10.17632/53znhg8p9n.1
Contributors:
Swardiantara Silalahi

Description

This dataset contains a collection of drone flight log messages acquired from publicly accessible sources on Mendeley Data [1] and AirData [2]. It supports two subtasks: binary problem identification and multiclass problem severity classification. The former task uses only the log messages from Mendeley Data [1], and the latter task uses the merged collection of log messages from both sources. Each subtask has a train and test split with an 80:20 ratio, generated with stratified sampling. The dataset covers diverse drone models across various industrial sectors. Since the log messages are human-readable, the dataset can be used to develop NLP-based solutions that find problem-indicating log records and assist in forensic investigation.

Files

Steps to reproduce

The messages in this dataset were acquired from two publicly accessible sources: Mendeley Data [1] and AirData [2]. Initially, several samples on AirData were pre-labeled with one of four labels (Normal, Low, Medium, and High) indicating the severity of the problem. From these pre-labeled samples we inferred an annotation procedure and used it to annotate the remaining unlabeled samples. Once all samples were labeled, they were split into train and test sets. The resulting collection of labeled log messages can be used to develop NLP-based solutions that identify problem-indicating logs and estimate the severity of problems that occurred during a flight.
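
The 80:20 stratified split can be illustrated with a minimal sketch. The tooling and random seed used to produce the published splits are not stated, so the function name `stratified_split`, the seed, and the toy messages below are assumptions for illustration only. The sketch groups samples by label and takes 20% of each group for the test set, so that every severity label keeps roughly the same share in both parts.

```python
import random
from collections import defaultdict


def stratified_split(samples, test_ratio=0.2, seed=42):
    """Split (message, label) pairs so each label keeps roughly the same share in both parts.

    Hypothetical helper: the seed and rounding rule are assumptions,
    not the procedure used to build the published splits.
    """
    # Group the samples by their label.
    by_label = defaultdict(list)
    for message, label in samples:
        by_label[label].append((message, label))

    rng = random.Random(seed)
    train, test = [], []
    # Iterate labels in sorted order so the result is reproducible.
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_ratio)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy data (made up): 100 messages with the four severity labels.
labels = ["Normal"] * 50 + ["Low"] * 20 + ["Medium"] * 20 + ["High"] * 10
samples = [(f"log message {i}", label) for i, label in enumerate(labels)]

train, test = stratified_split(samples)
print(len(train), len(test))
```

In practice, a library routine such as scikit-learn's `train_test_split` with its `stratify` argument serves the same purpose; the standard-library version is used here only to keep the sketch self-contained.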

Institutions

  • Institut Teknologi Sepuluh Nopember

Categories

Information Extraction, Computer Forensics, Log Analysis, Drone (Aircraft), Sentiment Analysis

Licence