A Novel Dataset for Multiclass Detection and Classification of Darknet Traffic (SafeSurf Darknet 2025)

Published: 24 July 2025 | Version 2 | DOI: 10.17632/kcrnj6z4rm.2
Contributors:
Qasem Abu Al-Haija

Description

Dataset Title

SafeSurf Darknet 2025: A Multi-layer Behavioral Dataset for Darknet Traffic Detection and Classification

Dataset Description

SafeSurf Darknet 2025 is a richly labeled dataset that captures network traffic across various anonymizing technologies and VPNs. Unlike traditional datasets labeled by ports or protocols, this dataset organizes traffic by behavioral context, enabling advanced research in:

- Darknet behavior classification
- Encrypted traffic analysis
- Intrusion and anomaly detection systems (IDS/ADS)

Labeling and Data Collection Methodology

All traffic was manually generated and labeled in a controlled environment. Sessions were classified based on the known context of user activity (e.g., watching YouTube over Tor = Video Streaming). Key aspects:

- Manual labeling only: no heuristic or automated labeling was used
- Behavior isolation through dedicated setups and time-aligned collection
- High-confidence ground-truth annotations

Privacy-Preserving Technologies Covered

- Tor
- Freenet
- I2P
- ZeroNet
- VPN

Behavioral Classes Captured

- Browsing
- Email
- Chatting
- Voice over IP (VoIP)
- File Transfer (FTP)
- Audio Streaming
- Video Streaming
- Peer-to-Peer (P2P) Sharing
- Normal (non-darknet traffic)

Note: Some behaviors (e.g., VoIP) are not captured across all technologies due to service limitations.

Dataset Structure

Layer 1: Binary Labeling

- Normal: 360,358 samples
- Darknet: 91,404 samples

Layer 2: Technology-Specific Classification

- Freenet: 26,284 samples
- ZeroNet: 25,499 samples
- I2P: 22,958 samples
- Tor: 12,546 samples
- VPN: 4,117 samples

Layer 3: Behavioral Labeling

- Browsing: 33,586 samples
- FTP: 20,214 samples
- Video Streaming: 9,559 samples
- P2P Sharing: 9,392 samples
- Email: 7,873 samples
- Audio Streaming: 5,953 samples
- Chatting: 3,489 samples
- VoIP: 1,338 samples

Data Format and Features

Provided in CSV format, where each row represents a single network flow, labeled with its associated behavior. Feature columns include:

- Timestamps
- Flow duration
- Packet and byte counts
- Inter-arrival time metrics
- TCP/UDP header statistics
- Directional and flow-based indicators

This structure supports a wide range of machine learning and network security research.

Use Cases

Ideal for research and development in:

- Behavior-based IDS/ADS
- Real-time encrypted traffic detection
- Behavioral profiling across anonymizing technologies
- Multi-class classification under behavioral and technological variance

Citation and Licensing

Please cite the dataset in your publications and respect the licensing terms included in the dataset repository.

Access the dataset here: Mendeley Data, SafeSurf Darknet 2025

Related Publications:

- https://www.preprints.org/manuscript/202507.1926/v1
- https://ieeexplore.ieee.org/abstract/document/11073091
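The per-layer sample counts listed in the description can be cross-checked against each other: the Layer 2 and Layer 3 counts each sum to the Layer 1 Darknet total, which suggests that every darknet flow carries one technology label and one behavior label. The following is a minimal Python sketch of that check. The counts are copied from the description; the dictionary names and the `balanced_weights` helper are illustrative assumptions, not part of the dataset. The weighting formula is the common "balanced" heuristic, total / (number of classes × class count), and is not something the dataset authors prescribe.

```python
# Sample counts copied from the dataset description (Layers 1 to 3).
LAYER1 = {"Normal": 360_358, "Darknet": 91_404}
LAYER2 = {
    "Freenet": 26_284, "ZeroNet": 25_499, "I2P": 22_958,
    "Tor": 12_546, "VPN": 4_117,
}
LAYER3 = {
    "Browsing": 33_586, "FTP": 20_214, "Video Streaming": 9_559,
    "P2P Sharing": 9_392, "Email": 7_873, "Audio Streaming": 5_953,
    "Chatting": 3_489, "VoIP": 1_338,
}


def layers_consistent() -> bool:
    """Return True if Layers 2 and 3 each sum to the Layer 1 Darknet count."""
    darknet = LAYER1["Darknet"]
    return sum(LAYER2.values()) == darknet and sum(LAYER3.values()) == darknet


def balanced_weights(counts: dict[str, int]) -> dict[str, float]:
    """Hypothetical helper: 'balanced' class weights, total / (k * count).

    Rarer classes receive larger weights. This is a generic heuristic for
    imbalanced data, assumed here for illustration only.
    """
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


if __name__ == "__main__":
    total_flows = sum(LAYER1.values())
    print("layers consistent:", layers_consistent())
    print("total flows:", total_flows)
    print("darknet share:", round(LAYER1["Darknet"] / total_flows, 4))
    for label, weight in balanced_weights(LAYER3).items():
        print(f"{label}: {weight:.2f}")
```

With the published counts, the Darknet class accounts for roughly 20% of the 451,762 flows, and the smallest behavioral class (VoIP) is about 25 times rarer than the largest (Browsing), so some form of imbalance handling is likely worth considering when training classifiers on Layers 1 and 3.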

Steps to reproduce

Please cite the following publications if you use this dataset:

M. J. Obaidat, I. A. Al-Syouf, Y. F. Awawdeh, A. E. Masa'deh, and Q. A. Al-Haija, "Darknet Threats and Detection Strategies: A Concise Overview," 2025 16th International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, 2025, pp. 1-6, doi: 10.1109/ICICS65354.2025.11073091.

Q. Abu Al-Haija, M. J. Obaidat, I. A. Al-Syouf, Y. F. Awawdeh, and A. E. Masa'deh, "SafeSurf Darknet 2025: A Novel Dataset for Darknet Traffic Detection and Analysis," Preprints, 2025, 2025071926, doi: 10.20944/preprints202507.1926.v1.

Institutions

Jordan University of Science and Technology

Categories

Cybersecurity, Internet, Machine Learning, Intrusion Detection, Internetworking, Data Analytics Cybersecurity

Licence