DroNER: Dataset for Drone Named Entity Recognition
The dataset is constructed from several drone images acquired from the VTO Labs Drone Forensics Dataset. Its main objective is to support named entity recognition (NER) on the human-readable messages contained in drone flight log files. Six entity types, namely component, action, issue, parameter, state, and function, are identified as relevant to this domain and are used to label the entities mentioned in each log message. The entity types are defined in the context of drone forensics, because the dataset was originally built to train an information extraction model that helps forensic investigators pinpoint incident-related log records.

The NER data is annotated twice, once with consistent tagging and once with contextual tagging, so that the effect of contextual tagging on NER model performance can be compared. Contextual tagging considers the surrounding words and uses the longest span as the context to determine which entity type a particular word belongs to. Consistent tagging, in contrast, uses the shortest span within a sentence as the context of a word. The train and test sets are split by drone model, resulting in a 76:24 ratio; the ratio cannot be set exactly because the number of messages extracted from each drone image varies and cannot be controlled.
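The difference between the two tagging schemes can be illustrated with a small span-selection rule. The sketch below is a hypothetical illustration, not the authors' annotation procedure (the dataset was annotated by reading the messages). It assumes a hand-made lexicon of spans, BIO-style labels, and an invented message, and at each position it picks the longest matching span for contextual tagging or the shortest one for consistent tagging. The names `bio_tag` and `LEXICON` are made up for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon type: lowercased token span -> entity type.
Lexicon = Dict[Tuple[str, ...], str]


def bio_tag(tokens: List[str], lexicon: Lexicon, mode: str) -> List[str]:
    """Assign BIO labels by picking, at each position, the longest
    ("contextual") or shortest ("consistent") lexicon span that matches."""
    if mode not in ("contextual", "consistent"):
        raise ValueError(f"unknown mode: {mode}")
    lowered = [t.lower() for t in tokens]
    tags: List[str] = []
    i = 0
    while i < len(tokens):
        # Every non-empty lexicon span that matches the tokens starting at i.
        matches = [s for s in lexicon if s and tuple(lowered[i:i + len(s)]) == s]
        if not matches:
            tags.append("O")  # word is outside any entity
            i += 1
            continue
        choose = max if mode == "contextual" else min
        span = choose(matches, key=len)
        label = lexicon[span]
        tags.append(f"B-{label}")
        tags.extend(f"I-{label}" for _ in span[1:])
        i += len(span)  # skip past the chosen span
    return tags


# Toy, invented lexicon using three of the six DroNER entity types.
LEXICON: Lexicon = {
    ("motor",): "component",
    ("overheat",): "state",
    ("motor", "overheat"): "issue",
}

if __name__ == "__main__":
    tokens = "Motor overheat detected".split()
    print(bio_tag(tokens, LEXICON, "contextual"))  # ['B-issue', 'I-issue', 'O']
    print(bio_tag(tokens, LEXICON, "consistent"))  # ['B-component', 'B-state', 'O']
```

With this toy lexicon, contextual tagging labels "Motor overheat" as a single issue, while consistent tagging keeps "Motor" as a component regardless of the words around it. The lexicon entries and their labels are invented for illustration only.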
Steps to reproduce
The data is constructed from several drone images acquired from the VTO Labs Drone Forensics Dataset. After the flight logs are collected and the human-readable messages in every flight log file are parsed, six entity types are identified by carefully reading all the unique messages. Two annotation procedures, namely consistent and contextual tagging, are defined and used to annotate the data. The result is two datasets, one per tagging scheme, that can be used to build an NER model recognizing the entities mentioned in drone flight log files.
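The post-parsing steps, together with the drone-model-based train/test split described above, might look roughly like the following. This is a sketch under stated assumptions: flight-log parsing is not shown (it depends on each vendor's log format), the input is assumed to be already-extracted `(drone_model, message)` pairs, and the greedy largest-model-first assignment in the hypothetical `split_by_model` function is only one way to approximate a target ratio, not necessarily how the published 76:24 split was made. The model names and messages are toy data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def unique_messages_by_model(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group already-parsed (drone_model, message) pairs and keep each
    distinct message once per model, in first-seen order."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    grouped: Dict[str, List[str]] = defaultdict(list)
    for model, message in records:
        text = message.strip()
        if text and text not in seen[model]:
            seen[model].add(text)
            grouped[model].append(text)
    return dict(grouped)


def split_by_model(grouped: Dict[str, List[str]],
                   test_fraction: float) -> Tuple[List[str], List[str]]:
    """Assign whole drone models to train or test so no model is in both.
    Largest models are tried first; a model goes to test only if the test
    set stays within test_fraction of all messages (hypothetical heuristic)."""
    total = sum(len(msgs) for msgs in grouped.values())
    budget = test_fraction * total
    train_models: List[str] = []
    test_models: List[str] = []
    test_size = 0
    for model in sorted(grouped, key=lambda m: len(grouped[m]), reverse=True):
        n = len(grouped[model])
        if test_size + n <= budget:
            test_models.append(model)
            test_size += n
        else:
            train_models.append(model)
    return train_models, test_models


if __name__ == "__main__":
    # Toy records; model names and messages are invented.
    records = [
        ("model_a", "Motor overheat detected"),
        ("model_a", "Motor overheat detected"),  # duplicate, dropped
        ("model_a", "GPS signal weak"),
        ("model_b", "Battery low"),
        ("model_c", "Compass error"),
        ("model_c", "Landing started"),
        ("model_c", "Motor stopped"),
    ]
    grouped = unique_messages_by_model(records)
    train, test = split_by_model(grouped, test_fraction=0.24)
    print(train, test)  # ['model_c', 'model_a'] ['model_b']
```

On the toy input the test set ends up with 1 of 6 unique messages rather than 24%, which shows why a split by whole drone models can only approximate a target ratio.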
PMDSU Scholarship from The Ministry of Education, Culture, Research and Technology, The Republic of Indonesia