TempFirm Ground-Truth Dataset v1.0: Labelled Firmware Trigger-Sink Metadata for Time-Triggered Backdoor Detection
Description
TempFirm Ground-Truth Dataset v1.0 is a labelled cybersecurity dataset designed for research on firmware and IoT backdoor detection, with emphasis on temporal trigger mechanisms and security-relevant sink behaviors. The publication-ready CSV contains 10,000 records and 10 fields: id, label, trigger_class, sink_type, tau, eol_date, delta_days, conf, sdk_origin, and notes. Each record is labelled as either backdoor or benign. Backdoor-labelled records describe trigger-sink patterns associated with time, uptime, boot/power-cycle counters, RTC registers, filesystem or certificate-expiry conditions, and watchdog/crash counters. Sink categories include shell execution, network behavior, authentication bypass, firewall manipulation, and credential modification. The dataset also includes confidence scores and optional SDK/source-family identifiers. The dataset is intended to support reproducible academic research in firmware security, IoT security, temporal trigger analysis, explainable cybersecurity feature engineering, and benchmark construction for machine learning or rule-based detection pipelines.
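The field list above can be checked mechanically before any analysis. The following is a minimal sketch, not the authors' tooling: the function name `check_schema` is hypothetical, and it assumes that the label column stores the literal strings `backdoor` and `benign`, which the description implies but does not state explicitly.

```python
import csv
import io

# Column names as listed in the dataset description.
EXPECTED_FIELDS = ["id", "label", "trigger_class", "sink_type", "tau",
                   "eol_date", "delta_days", "conf", "sdk_origin", "notes"]

# Assumption: the label column uses exactly these two strings.
ASSUMED_LABELS = {"backdoor", "benign"}


def check_schema(handle):
    """Return a list of problems found in a CSV file-like object."""
    reader = csv.DictReader(handle)
    problems = []
    header = reader.fieldnames or []
    # Compare as sorted lists so column order does not matter.
    if sorted(header) != sorted(EXPECTED_FIELDS):
        problems.append(f"unexpected header: {header}")
        return problems
    for record_no, row in enumerate(reader, start=1):
        if row["label"] not in ASSUMED_LABELS:
            problems.append(
                f"record {record_no}: unexpected label {row['label']!r}")
    return problems


# Tiny in-memory demonstration with made-up rows (not taken from the dataset).
demo = io.StringIO(
    ",".join(EXPECTED_FIELDS) + "\n"
    + ",".join(["1", "backdoor"] + [""] * 8) + "\n"
    + ",".join(["2", "unknown"] + [""] * 8) + "\n"
)
print(check_schema(demo))
```

To run the same check on the published file, open it with `open("tempfirm_ground_truth_dataset_v1_clean.csv", newline="", encoding="utf-8")` and pass the handle to `check_schema`; the UTF-8 encoding is also an assumption.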
Steps to reproduce
Files included
--------------
1. tempfirm_ground_truth_dataset_v1_clean.csv
   Main tabular dataset with 10,000 labelled records.
2. data_dictionary.csv
   Definitions of all columns, data types, and example values.
3. dataset_summary_statistics.csv
   Summary counts and basic quality checks.
4. README.md
   Documentation, trigger/sink legends, usage notes, and reproducibility guidance.

Steps to reproduce
------------------
1. Download the dataset package from Mendeley Data.
2. Open tempfirm_ground_truth_dataset_v1_clean.csv using Python, R, Excel, LibreOffice, or another CSV-compatible tool.
3. Use the label column as the ground-truth target for binary classification.
4. Use trigger_class, sink_type, tau, eol_date, delta_days, conf, sdk_origin, and notes as metadata/features depending on the experimental design.
5. Reproduce the basic summary statistics by counting the label, trigger_class, and sink_type columns.

Example Python code is provided in README.md.
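The counting step can be sketched with the Python standard library alone. This is an illustrative sketch rather than the code shipped in README.md: the function name `count_columns` is hypothetical, and the values in the demonstration rows are placeholders, not real trigger or sink codes from the dataset.

```python
import csv
import io
from collections import Counter

# Columns named in the reproduction steps.
COUNTED_COLUMNS = ("label", "trigger_class", "sink_type")


def count_columns(handle, columns=COUNTED_COLUMNS):
    """Count records and per-column value frequencies in a CSV file-like object."""
    reader = csv.DictReader(handle)
    counts = {name: Counter() for name in columns}
    total = 0
    for row in reader:
        total += 1
        for name in columns:
            counts[name][row[name]] += 1
    return total, counts


# In-memory demonstration with placeholder values (not real dataset rows).
demo = io.StringIO(
    "id,label,trigger_class,sink_type\n"
    "1,backdoor,example_trigger_a,example_sink_x\n"
    "2,benign,example_trigger_b,example_sink_x\n"
    "3,backdoor,example_trigger_a,example_sink_y\n"
)
total, counts = count_columns(demo)
print("records:", total)
for name, counter in counts.items():
    print(name, dict(counter))
```

For the published file, pass a handle from `open("tempfirm_ground_truth_dataset_v1_clean.csv", newline="", encoding="utf-8")` instead of the in-memory sample (the encoding is an assumption). The resulting record total and per-column counts can then be compared against dataset_summary_statistics.csv.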
Institutions
- Universidad de Salamanca, Castille and León, Salamanca