MQTTEEB-D: A Real-World IoT Cybersecurity Dataset for AI-Powered Threat Detection in MQTT Networks

Published: 20 March 2025 | Version 1 | DOI: 10.17632/jfttfjn6tr.1

Description

This dataset accompanies the research article on MQTTEEB-D and is intended for public use in cybersecurity research. MQTTEEB-D is a real-world dataset for improving intrusion detection in Internet of Things (IoT) networks based on Message Queuing Telemetry Transport (MQTT). Unlike existing datasets built from simulated network traffic, MQTTEEB-D was collected from a live IoT deployment at the International University of Rabat (UIR), Morocco. The deployment used MySignals IoT health sensors, a Raspberry Pi 4, and an MQTT broker server, so the dataset reflects the complexity of live IoT communication that synthetic data does not capture. To narrow the gap between simulated and real-world attack scenarios, several cyberattacks were carried out in real time: Denial of Service (DoS), Slow DoS against Internet of Things Environments (SlowITe), Malformed Data Injection, Brute Force, and MQTT publish flooding. This allowed close monitoring of the resulting network traffic anomalies. The traffic was captured with PyShark, a Python wrapper for tshark, and organized into multiple Comma-Separated Values (CSV) files. To ensure high data quality, we applied pre-processing steps such as outlier removal, normalization, standardization, and class balancing. The dataset is provided in several forms (raw, cleaned, normalized, standardized, and balanced with the Synthetic Minority Over-sampling Technique (SMOTE)), along with detailed metadata to facilitate use in cybersecurity research. It gives researchers an opportunity to develop and validate intrusion detection models in a real-world MQTT environment, a critical ingredient in Artificial Intelligence (AI)-driven cybersecurity solutions for IoT networks. The dataset is intended to support future research in IoT security and anomaly detection.
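The difference between the normalized, standardized, and SMOTE-balanced forms can be illustrated with a minimal sketch. It is not the authors' pre-processing code: the feature values are hypothetical, the function names are invented for illustration, and the exact scaling parameters used for the dataset are described only in the associated paper. The sketch assumes the common definitions: min-max scaling to [0, 1], z-score scaling with the population standard deviation, and SMOTE's core step of interpolating between a minority sample and one of its neighbors.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def smote_interpolate(sample, neighbor, lam):
    """Build one synthetic point on the segment between a minority-class
    sample and a neighbor; lam in [0, 1] sets the position on the segment."""
    return [s + lam * (n - s) for s, n in zip(sample, neighbor)]


# Hypothetical packet lengths in bytes, not taken from the dataset.
packet_lengths = [60, 62, 64, 70, 74]

normalized = min_max_normalize(packet_lengths)
standardized = standardize(packet_lengths)
synthetic = smote_interpolate([0.0, 0.0], [2.0, 4.0], 0.5)
```

In practice, scikit-learn's `MinMaxScaler` and `StandardScaler` perform the same two scalings on whole feature matrices, and a full SMOTE implementation additionally selects the neighbors and draws the interpolation factor at random.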

Steps to reproduce

For detailed methods and analysis, refer to the related research paper. To use this dataset:

1. Download the dataset files.
2. Use `Raw_RealTime_Data` for raw MQTT traffic from real-time attack scenarios.
3. Use `Preprocessed_Data` for data that is ready to use with ML/AI models.
4. Refer to the metadata JSON files for encoding details.
5. Tools: PyShark (packet capture), Python (pandas, sklearn), Jupyter Notebook.
6. Attack types and configurations are fully described in the paper associated with this dataset.

Related research paper: Under review. Link will be added upon publication.
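Steps 3 and 4 above can be sketched as follows, using only the Python standard library. Everything specific here is an assumption made for illustration: the column names (`tcp_len`, `mqtt_msgtype`, `label`), the label codes, and the layout of the metadata JSON are hypothetical, so check the actual CSV headers and metadata files after downloading. In-memory strings stand in for the files so that the sketch runs on its own.

```python
import csv
import io
import json
from collections import Counter

# Stand-in for one CSV file from Preprocessed_Data.
# Column names and values are hypothetical, not taken from the dataset.
SAMPLE_CSV = """tcp_len,mqtt_msgtype,label
60,3,0
1400,3,1
62,12,0
58,1,2
"""

# Stand-in for a metadata JSON file mapping encoded labels back to names.
# This layout is an assumption; the real metadata may be organized differently.
SAMPLE_METADATA = '{"label": {"0": "legitimate", "1": "dos", "2": "bruteforce"}}'


def load_rows(csv_file):
    """Read a CSV file object into a list of dicts keyed by column name."""
    return list(csv.DictReader(csv_file))


def decode_labels(rows, metadata, column="label"):
    """Translate encoded class values into readable names via the metadata."""
    mapping = metadata[column]
    return [mapping[row[column]] for row in rows]


rows = load_rows(io.StringIO(SAMPLE_CSV))
metadata = json.loads(SAMPLE_METADATA)

# Count samples per class to check the class balance before training a model.
class_counts = Counter(decode_labels(rows, metadata))
print(class_counts)
```

With the downloaded files, `io.StringIO(SAMPLE_CSV)` would be replaced by `open(path, newline="")` and the metadata string by the contents of the matching JSON file; `pandas.read_csv` is the usual alternative when the data is headed into scikit-learn.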

Institutions

Universite Internationale de Rabat

Categories

Artificial Intelligence, Cybersecurity, Network Security, Denial-of-Service Attack, Intrusion Detection, Internet of Things, Data Analytics Cybersecurity, Cyber Attack

Funding

MG-FARM Project, funded by MESRSI and the European Union under LEAP-RE

Grant Agreement No. 963530

Licence