Dataset for Drone Problem Identification and Severity Estimation
Description
This dataset contains drone flight log messages collected from two publicly accessible sources, Mendeley Data [1] and AirData [2]. It supports two subtasks: binary problem identification and multiclass problem severity classification. The former uses only the log messages from Mendeley Data [1], while the latter uses the merged log messages from both sources. Each subtask has a train and test split with an 80:20 ratio, generated with stratified sampling. The dataset covers diverse drone models across various industrial sectors. Because the log messages are human-readable, the dataset can be used to develop NLP-based solutions that find problem-indicating log records and assist in forensic investigation.
Steps to reproduce
The messages in this dataset were acquired from two publicly accessible sources: Mendeley Data [1] and AirData [2]. Initially, several samples on AirData were pre-labeled with one of four labels (Normal, Low, Medium, and High) indicating the severity of the problem. We then inferred an annotation procedure to annotate the remaining unlabeled samples. Following this procedure, all samples were labeled and split into train and test sets. The resulting collection of labeled log messages can be used to develop NLP-based solutions that identify problem-indicating logs and estimate the severity of problems that occurred during a flight.
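As an illustration of the final step, the following is a minimal sketch of an 80:20 stratified train/test split using only the Python standard library. It is not the script used to build this dataset: the function name `stratified_split`, the seed, and the toy messages are assumptions made for the example, and only the four severity labels come from the description above.

```python
import random
from collections import defaultdict


def stratified_split(records, test_ratio=0.2, seed=42):
    """Split (message, label) pairs so each label keeps about the same share in both parts.

    Hypothetical helper for illustration; the seed and rounding rule are assumptions.
    """
    # Group the samples by label so each class is split separately.
    by_label = defaultdict(list)
    for message, label in records:
        by_label[label].append((message, label))

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # Take test_ratio of this class for the test set, the rest for training.
        n_test = round(len(group) * test_ratio)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy data (invented for this example, not taken from the dataset).
records = (
    [(f"normal message {i}", "Normal") for i in range(50)]
    + [(f"low message {i}", "Low") for i in range(20)]
    + [(f"medium message {i}", "Medium") for i in range(20)]
    + [(f"high message {i}", "High") for i in range(10)]
)

train, test = stratified_split(records)
print(len(train), len(test))
```

Splitting each label group separately is what keeps rare classes such as High represented in both parts, which a plain random split does not guarantee.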
Institutions
- Institut Teknologi Sepuluh Nopember