A Novel Dataset for Multiclass Detection and Classification of Darknet Traffic (SafeSurf Darknet 2025)

Published: 10 July 2025 | Version 1 | DOI: 10.17632/kcrnj6z4rm.1
Contributors:
Qasem Abu Al-Haija

Description

Dataset Description: Multi-layer Darknet Traffic Behavioral Dataset

This dataset provides labeled network traffic captures across multiple anonymizing technologies and VPNs, organized by behavioral context rather than merely by protocol or port. It is designed for research in darknet behavior classification, encrypted traffic analysis, and cybersecurity anomaly detection.

Labeling and Data Collection Process

All traffic sessions were manually labeled based on their known behavioral context during generation. For example, a capture session involving YouTube over Tor was explicitly labeled as video streaming. No automated or heuristic-based labeling methods were employed; instead, behavior isolation was ensured through controlled setups and traffic timing. This manual labeling is intended to provide high-confidence ground-truth annotations.

The dataset spans five privacy-preserving technologies:
- Tor
- Freenet
- I2P
- ZeroNet
- VPN

Within these environments, nine behavioral classes were captured:
- Browsing
- Email
- Chatting
- Voice over IP (VoIP)
- File Transfer (FTP)
- Audio Streaming
- Video Streaming
- Peer-to-Peer (P2P) Sharing
- Normal (non-darknet traffic)

Dataset Structure

The dataset is organized across three hierarchical labeling layers.

Layer 1 (Binary Labeling):
- Normal: 360,358 samples
- Darknet: 91,404 samples

Layer 2 (Technology-Specific Classification):
- Freenet: 26,284
- ZeroNet: 25,499
- I2P: 22,958
- Tor: 12,546
- VPN: 4,117

Layer 3 (Behavioral Labeling):
- Browsing: 33,586
- FTP: 20,214
- Video: 9,559
- P2P: 9,392
- Email: 7,873
- Audio: 5,953
- Chat: 3,489
- VoIP: 1,338

Note: Not all behaviors were captured across all technologies due to availability limitations. For instance, VoIP traffic was not recorded in the Freenet or ZeroNet environments.

Format and Features

The dataset is provided in CSV format, where each row corresponds to a single network flow. Each flow is labeled with its corresponding behavioral class.
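As a rough illustration of how the three labeling layers relate to a single flow row, the following sketch derives all three labels from a technology value and a behavior value. The column names `technology` and `behavior`, the convention that non-darknet flows carry the value "Normal" in both, and the helper names are assumptions made for this example and are not confirmed by the dataset description; the two sample rows are synthetic placeholders, not records from the dataset.

```python
import csv
import io

# Technology names as listed in the dataset description (Layer 2).
DARKNET_TECHNOLOGIES = {"Tor", "Freenet", "I2P", "ZeroNet", "VPN"}


def hierarchical_labels(technology, behavior):
    """Map one flow's (technology, behavior) pair to the three label layers.

    Assumption: non-darknet flows carry the technology value "Normal".
    """
    if technology == "Normal":
        return ("Normal", "Normal", "Normal")
    if technology not in DARKNET_TECHNOLOGIES:
        raise ValueError(f"unknown technology: {technology!r}")
    return ("Darknet", technology, behavior)


def read_flows(handle, tech_col="technology", behavior_col="behavior"):
    """Yield (layer1, layer2, layer3) for each flow row of a CSV file object.

    The default column names are hypothetical; adjust them to the real header.
    """
    for row in csv.DictReader(handle):
        yield hierarchical_labels(row[tech_col].strip(), row[behavior_col].strip())


# Synthetic two-row placeholder, NOT taken from the dataset.
sample = io.StringIO(
    "technology,behavior\n"
    "Tor,Video\n"
    "Normal,Normal\n"
)
labels = list(read_flows(sample))
print(labels)
```

With the real files, `read_flows` would be given an opened CSV file instead of the in-memory placeholder, after checking the actual header names.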
Features include:
- Timestamp
- Flow duration
- Packet count
- Byte count
- Inter-arrival time metrics
- TCP/UDP header statistics
- Directional flow indicators

This feature set supports a variety of research tasks, including but not limited to:
- Encrypted traffic classification
- Behavioral profiling of darknet activity
- Multi-class machine learning modeling
- Intrusion and anomaly detection systems (IDS/ADS)

Use Cases

This dataset is particularly useful for:
- Building behavior-based intrusion detection systems
- Evaluating classifiers under intra-class and inter-technology variations
- Understanding how the same behavior (e.g., video streaming) manifests differently over various privacy-enhancing technologies
- Developing real-time detection systems for anonymized or encrypted traffic

Citation and Licensing

Please cite this dataset appropriately in any research work, and refer to the included license terms for usage and redistribution.
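For the multi-class modeling use case, the sample counts reported in the description imply a noticeable class imbalance: darknet flows make up roughly 20% of all samples, and Browsing outnumbers VoIP by roughly 25 to 1. The sketch below, which uses only the published counts, checks that Layers 2 and 3 each sum to the Layer 1 darknet total and computes inverse-frequency class weights. The weighting formula is one common heuristic (the same one scikit-learn uses for `class_weight="balanced"`), chosen here for illustration; it is not a method prescribed by the dataset authors, and `balanced_weights` is a hypothetical helper name.

```python
# Sample counts copied from the dataset description.
LAYER1 = {"Normal": 360_358, "Darknet": 91_404}
LAYER2 = {"Freenet": 26_284, "ZeroNet": 25_499, "I2P": 22_958,
          "Tor": 12_546, "VPN": 4_117}
LAYER3 = {"Browsing": 33_586, "FTP": 20_214, "Video": 9_559, "P2P": 9_392,
          "Email": 7_873, "Audio": 5_953, "Chat": 3_489, "VoIP": 1_338}


def balanced_weights(counts):
    """Inverse-frequency weights: total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


# Layers 2 and 3 should each partition the darknet flows of Layer 1.
layer2_consistent = sum(LAYER2.values()) == LAYER1["Darknet"]
layer3_consistent = sum(LAYER3.values()) == LAYER1["Darknet"]

# Fraction of all flows that are darknet traffic.
darknet_share = LAYER1["Darknet"] / sum(LAYER1.values())

layer3_weights = balanced_weights(LAYER3)

print(f"Layer 2 sums to darknet total: {layer2_consistent}")
print(f"Layer 3 sums to darknet total: {layer3_consistent}")
print(f"Darknet share of all flows: {darknet_share:.3f}")
for name, weight in sorted(layer3_weights.items(), key=lambda kv: kv[1]):
    print(f"{name:>8}: weight {weight:.2f}")
```

Weights of this kind can be passed to a classifier's loss function so that rare classes such as VoIP and Chat are not dominated by Browsing during training; stratified splitting is a complementary option.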

Steps to reproduce

Please cite the following research paper if you use this dataset: Mohammad Obaidat, Ibrahim Al-Syouf, Yahea Awawdeh, Anas Masa'Deh, and Qasem Abu Al-Haija, "Darknet Threats and Detection Strategies: A Concise Overview," The 16th International Conference on Information and Communication Systems, IEEE, 2025.

Institutions

Jordan University of Science and Technology

Categories

Cybersecurity, Internet, Machine Learning, Intrusion Detection, Internetworking, Data Analytics Cybersecurity

Licence