Datasets Comparison
Version 2
Hornet 40: Network Dataset of Geographically Placed Honeypots
Description
Hornet 40 is a dataset of 40 days of attack traffic captured on cloud servers used as honeypots, collected to help understand how geography may affect the inflow of network attacks. The honeypots are located in eight cities: Amsterdam, London, Frankfurt, San Francisco, New York, Singapore, Toronto, and Bangalore. The data was captured in April, May, and June 2021.
The eight cloud servers were created and configured simultaneously following identical instructions. The network capture was performed on each cloud server using the Argus network monitoring tool. Each cloud server had only one service running (SSH on a non-standard port) and was fully dedicated to acting as a honeypot. No honeypot software was used in this dataset.
The dataset consists of eight scenarios, one for each geographically located cloud server. Each scenario contains bidirectional NetFlow files in the following formats:
- hornet40-biargus.tar.gz: all scenarios with bidirectional NetFlow files in Argus binary format;
- hornet40-netflow-v5.tar.gz: all scenarios with bidirectional NetFlow v5 files in CSV format;
- hornet40-netflow-extended.tar.gz: all scenarios with bidirectional NetFlow files in CSV format containing all features provided by Argus;
- hornet40-full.tar.gz: all the data (biargus, NetFlow v5, and extended NetFlow).
Steps to reproduce
This dataset used cloud server instances from DigitalOcean. All cloud servers had the same technical configuration: a) Operating System: Ubuntu 20.04 LTS, b) Instance Capacity: 1 GB / 1 Intel CPU, c) Instance Storage: 25 GB NVMe SSD, d) Instance Transfer: 1000 GB.
Once the cloud instances were created, the servers were configured simultaneously using the parallel-ssh and parallel-scp tools:
i. Update the software repository: apt update
ii. Install Argus: apt install -yq argus-client argus-server
iii. Upload the common SSH configuration, with SSH on a non-standard port, to each server at /etc/ssh/sshd_config
iv. Restart SSH servers: /etc/init.d/ssh restart
v. Upload common Argus configuration to each server at /etc/argus.conf
vi. Start Argus server: argus -F /etc/argus.conf -i eth0
vii. Create a folder to store the NetFlow files: mkdir /root/dataset
viii. Start rasplit to store the network data received by Argus: rasplit -S 127.0.0.1:900 -M time 1h -w /root/dataset/%Y/%m/%d/do-sensor.%H.%M.%S.biargus
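To illustrate the file layout that step viii produces, the following Python sketch expands the rasplit output template for consecutive one-hour bins. It assumes that rasplit names each file after the start of its time bin and that bins align to the hour; the capture start time used here is hypothetical, since the source only states April to June 2021. The helper name hourly_paths is also an assumption made for this example.

```python
from datetime import datetime, timedelta

# Output template copied from the rasplit command in step viii.
TEMPLATE = "/root/dataset/%Y/%m/%d/do-sensor.%H.%M.%S.biargus"


def hourly_paths(start, hours):
    """Return one path per one-hour bin, assuming bins align to the hour."""
    first_bin = start.replace(minute=0, second=0, microsecond=0)
    return [
        (first_bin + timedelta(hours=i)).strftime(TEMPLATE)
        for i in range(hours)
    ]


# Hypothetical capture start, used only to show the naming pattern.
paths = hourly_paths(datetime(2021, 4, 23, 10, 17, 5), 3)
for path in paths:
    print(path)

# Upper bound on files per sensor: one file per hour for 40 days.
# Hours without any traffic may produce no file.
files_per_sensor = 24 * 40
print(files_per_sensor)
```

Under these assumptions each sensor yields at most 24 x 40 = 960 hourly files, grouped into year/month/day folders.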
SSH Configuration:
AcceptEnv LANG LC_*
ChallengeResponseAuthentication no
Include /etc/ssh/sshd_config.d/*.conf
PasswordAuthentication no
PermitRootLogin yes
Port 902
PrintMotd no
Subsystem sftp /usr/lib/openssh/sftp-server
UsePAM yes
X11Forwarding yes
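As a quick check of what the SSH configuration above exposes, the following Python sketch parses the listing and reads back the port and authentication settings. The parser is a simplified illustration, not OpenSSH's own logic: it ignores Match blocks and does not follow the Include line, so any files under /etc/ssh/sshd_config.d/ could still change the effective settings. The names SSHD_CONFIG and parse_sshd_config are hypothetical.

```python
# SSH configuration copied from the listing above.
SSHD_CONFIG = """\
AcceptEnv LANG LC_*
ChallengeResponseAuthentication no
Include /etc/ssh/sshd_config.d/*.conf
PasswordAuthentication no
PermitRootLogin yes
Port 902
PrintMotd no
Subsystem sftp /usr/lib/openssh/sftp-server
UsePAM yes
X11Forwarding yes
"""


def parse_sshd_config(text):
    """Map each lower-cased keyword to the list of its argument strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        value = parts[1] if len(parts) > 1 else ""
        # sshd keywords are case-insensitive, so normalise the key.
        settings.setdefault(parts[0].lower(), []).append(value)
    return settings


cfg = parse_sshd_config(SSHD_CONFIG)
port = int(cfg["port"][0])
print("non-standard port:", port != 22)
print("password login disabled:", cfg["passwordauthentication"][0] == "no")
```

Read this way, the listing places SSH on port 902 and turns off password and challenge-response authentication, which is consistent with the servers offering a single SSH service on a non-standard port.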
Argus Configuration:
ARGUS_FLOW_TYPE="Bidirectional"
ARGUS_FLOW_KEY="CLASSIC_5_TUPLE"
ARGUS_ACCESS_PORT=900
ARGUS_INTERFACE=eth0
ARGUS_FLOW_STATUS_INTERVAL=3600
ARGUS_MAR_STATUS_INTERVAL=60
ARGUS_GENERATE_RESPONSE_TIME_DATA=yes
ARGUS_GENERATE_PACKET_SIZE=yes
ARGUS_GENERATE_JITTER_DATA=yes
ARGUS_GENERATE_MAC_DATA=yes
ARGUS_GENERATE_APPBYTE_METRIC=yes
ARGUS_GENERATE_TCP_PERF_METRIC=yes
ARGUS_GENERATE_BIDIRECTIONAL_TIMESTAMPS=yes
ARGUS_CAPTURE_DATA_LEN=480
ARGUS_BIND_IP="::1,127.0.0.1"
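The Argus settings above and the rasplit command in step viii have to agree on where records are served. The following Python sketch, offered as an illustration rather than as part of the original procedure, parses the KEY=value listing and checks that the address and port passed to rasplit -S match ARGUS_BIND_IP and ARGUS_ACCESS_PORT. The names ARGUS_CONF, RASPLIT_CMD, and parse_argus_conf are hypothetical.

```python
import shlex

# Argus configuration copied from the listing above.
ARGUS_CONF = """\
ARGUS_FLOW_TYPE="Bidirectional"
ARGUS_FLOW_KEY="CLASSIC_5_TUPLE"
ARGUS_ACCESS_PORT=900
ARGUS_INTERFACE=eth0
ARGUS_FLOW_STATUS_INTERVAL=3600
ARGUS_MAR_STATUS_INTERVAL=60
ARGUS_GENERATE_RESPONSE_TIME_DATA=yes
ARGUS_GENERATE_PACKET_SIZE=yes
ARGUS_GENERATE_JITTER_DATA=yes
ARGUS_GENERATE_MAC_DATA=yes
ARGUS_GENERATE_APPBYTE_METRIC=yes
ARGUS_GENERATE_TCP_PERF_METRIC=yes
ARGUS_GENERATE_BIDIRECTIONAL_TIMESTAMPS=yes
ARGUS_CAPTURE_DATA_LEN=480
ARGUS_BIND_IP="::1,127.0.0.1"
"""

# Command copied from step viii of the steps to reproduce.
RASPLIT_CMD = (
    "rasplit -S 127.0.0.1:900 -M time 1h "
    "-w /root/dataset/%Y/%m/%d/do-sensor.%H.%M.%S.biargus"
)


def parse_argus_conf(text):
    """Parse KEY=value lines, dropping surrounding double quotes."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip().strip('"')
    return conf


conf = parse_argus_conf(ARGUS_CONF)
args = shlex.split(RASPLIT_CMD)

# The argument after -S has the form host:port.
host, _, port = args[args.index("-S") + 1].rpartition(":")
bind_ips = conf["ARGUS_BIND_IP"].split(",")

print("port matches:", port == conf["ARGUS_ACCESS_PORT"])
print("host is a bound address:", host in bind_ips)
```

Because ARGUS_BIND_IP lists only loopback addresses, the record stream on port 900 is reachable from the server itself, which is where rasplit connects.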
Ra configuration:
RA_PRINT_LABELS=0
RA_FIELD_DELIMITER=','
RA_USEC_PRECISION=6
RA_PRINT_NAMES=0
RA_TIME_FORMAT="%Y/%m/%d %T.%f"
RA_FIELD_SPECIFIER= srcid seq stime ltime dur sstime sltime sdur dstime
dltime ddur srng drng trans flgs avgdur stddev mindur maxdur saddr dir
daddr proto sport dport sco dco stos dtos sdsb ddsb sttl dttl shops dhops
sipid dipid pkts spkts dpkts bytes sbytes dbytes appbytes sappbytes
dappbytes load sload dload rate srate drate loss sloss dloss ploss sploss
dploss senc denc smac dmac smpls dmpls svlan dvlan svid dvid svpri dvpri
sintpkt dintpkt sintpktact dintpktact sintpktidl dintpktidl sintpktmax
sintpktmin dintpktmax dintpktmin sintpktactmax sintpktactmin
dintpktactmax dintpktactmin sintpktidlmax sintpktidlmin dintpktidlmax
dintpktidlmin jit sjit djit jitact sjitact djitact jitidl sjitidl djitidl
state deldur delstime delltime dspkts ddpkts dsbytes ddbytes pdspkts
pddpkts pdsbytes pddbytes suser:1500 duser:1500 tcpext swin dwin jdelay
ldelay bins binnum stcpb dtcpb tcprtt synack ackdat inode smaxsz sminsz
dmaxsz dminsz
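The ra settings above determine how the CSV files can be read back. The following Python sketch, a minimal illustration under stated assumptions, turns the field specifier into column names and parses timestamps written with RA_TIME_FORMAT. It assumes that the CSV columns follow the order of RA_FIELD_SPECIFIER; whether the published files begin with a label row is not stated here, so the has_header flag is an assumption covering both cases. Only the first five fields are used, and the sample rows are made up for illustration.

```python
import csv
import io
from datetime import datetime

# First five fields of RA_FIELD_SPECIFIER from the listing above;
# extend this string with the remaining fields to parse full rows.
FIELD_SPEC = "srcid seq stime ltime dur"


def column_names(spec):
    """Drop length suffixes, so 'suser:1500' becomes 'suser'."""
    return [field.split(":", 1)[0] for field in spec.split()]


def parse_ra_time(value):
    """Parse a timestamp written with RA_TIME_FORMAT="%Y/%m/%d %T.%f"."""
    # %T is spelled out as %H:%M:%S for Python's strptime.
    return datetime.strptime(value, "%Y/%m/%d %H:%M:%S.%f")


def read_flows(handle, spec, has_header=False):
    """Yield one dict per flow record, keyed by field specifier name."""
    reader = csv.reader(handle, delimiter=",")  # RA_FIELD_DELIMITER=','
    if has_header:
        next(reader, None)  # skip a single label row if one is present
    names = column_names(spec)
    for row in reader:
        yield dict(zip(names, row))


# Made-up rows for illustration only, not taken from the dataset.
sample = io.StringIO(
    "SrcId,Seq,StartTime,LastTime,Dur\n"
    "sensor-1,42,2021/04/23 10:00:01.250000,2021/04/23 10:00:03.750000,2.5\n"
)
flows = list(read_flows(sample, FIELD_SPEC, has_header=True))
for flow in flows:
    started = parse_ra_time(flow["stime"])
    print(flow["srcid"], started.isoformat(), float(flow["dur"]))
```

Keying rows by the specifier names keeps the code independent of whatever labels ra prints, at the cost of relying on the column order.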
Institutions
Ceske Vysoke Uceni Technicke v Praze
Categories
Applied Sciences, Cybersecurity, Cloud Computing, Incident Response, Droplet, Networking, Cloud Security, Cloud Droplet, Cyber Attack
Licence
Creative Commons Attribution 4.0 International
Version 3
Hornet 40: Network Dataset of Geographically Placed Honeypots
The description, steps to reproduce, SSH, Argus, and ra configurations, institutions, categories, and licence of Version 3 are identical to those of Version 2.