Ethernet Frame Physical-Layer Signal Dataset – 10BaseT

Published: 24 November 2025 | Version 1 | DOI: 10.17632/x8x39r6nmt.1
Contributors:
munip geylani, Musa ÇIBUK, Ayhan AKBAL

Description

This dataset was developed to support research on network traffic classification using raw electrical signals captured at the physical layer. The core hypothesis behind this work is that different types of network traffic exhibit distinguishable patterns in their physical-layer waveforms due to variations in frame structure. The dataset contains Ethernet frame signals corresponding to six widely used protocol types: DHCP, DNS, HTTP, ICMP, RTSP, and TLS. All data were collected in accordance with the 10Base-T Ethernet standard and are intended for research on signal-level network traffic classification.

To construct the raw signal dataset, protocol-specific packet capture (PCAP) files were collected from various sources. These files were then retransmitted over a 10Base-T Ethernet link, and the corresponding electrical signals were captured from the Ethernet cable using an oscilloscope. In total, 7,421 unique signal samples (representing individual Ethernet frames) were extracted. In addition to the raw signal files (provided in .csv format), the dataset also includes the original PCAP files used during acquisition.

Image Datasets (Visualization-Based):
In addition to the raw signal-level dataset, four separate image datasets were generated using different visualization techniques:
* vertical
* horizontal_zigzag
* spectrogram
* scalogram
Each visualization dataset contains 7,421 images, corresponding one-to-one with the raw signal files. These images were generated to support deep-learning-based classification experiments and are publicly shared together with the signal dataset.

MATLAB Script Packages:
Two MATLAB script packages are also provided to ensure full reproducibility:
1) Signal_Dataset_Generation_MATLAB_Scripts
Contains the scripts used for PCAP preprocessing, deduplication, packet-to-signal matching, automatic signal segmentation, and signal file labeling.
2) Image_Dataset_Generation_MATLAB_Scripts
Contains the scripts used to convert the raw 1-D signals into images using the four visualization techniques listed above.
Both script folders include a detailed script_information.txt file describing the purpose and functionality of each script.

Intended Use:
This dataset can be used for signal-based network traffic classification, physical-layer analysis, and deep learning research. It offers a novel perspective on traffic analysis beyond conventional packet- or flow-level features.
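
As an illustration of the spectrogram technique named among the visualization methods, the following is a minimal Python sketch of turning a 1-D signal into a grayscale time-frequency matrix. It is not the provided MATLAB code: the function names, the Hann window, and the window length and hop size are all assumptions chosen for the example, and the parameters actually used for the dataset are documented in the script packages.

```python
import cmath
import math


def hann(n):
    """Return an n-point Hann window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(signal, win_len=64, hop=32):
    """Magnitude spectrogram: one row per time frame, one column per frequency bin.

    win_len and hop are illustrative values, not the dataset's settings.
    """
    window = hann(win_len)
    n_bins = win_len // 2 + 1  # keep only non-negative frequencies
    # Precompute the DFT basis for each retained bin.
    basis = [
        [cmath.exp(-2j * math.pi * k * n / win_len) for n in range(win_len)]
        for k in range(n_bins)
    ]
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(win_len)]
        frames.append([abs(sum(s * b for s, b in zip(seg, row))) for row in basis])
    return frames


def to_grayscale(frames):
    """Scale magnitudes to 0..255 so the matrix can be written out as an image."""
    peak = max(max(row) for row in frames) or 1.0
    return [[round(255 * v / peak) for v in row] for row in frames]


if __name__ == "__main__":
    # Synthetic stand-in for a signal file: a tone with 8 cycles per 64 samples.
    signal = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
    image = to_grayscale(spectrogram(signal))
    print(len(image), "frames x", len(image[0]), "bins")
```

The naive DFT keeps the sketch dependency-free; for full-length signal files an FFT-based routine would be the practical choice.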

Files

Steps to reproduce

Packets were captured by selecting various network interfaces through the web-based management interface of the Bitlis Eren University (BEU) firewall. All protocol-specific traffic was saved in .pcap format. The collected .pcap files were processed in MATLAB for deduplication and split into segments sized to fit the storage capacity of the oscilloscope. These segments were then replayed over a 10Base-T Ethernet link using Bit-Twist.

A differential probe connected to a Tektronix MDO4104-6 oscilloscope with a DPO4ENET module was used to capture the raw electrical signals during transmission. Ethernet frames were decoded via the DPO4ENET module, producing corresponding *ETH.csv event files. Packets in these files were then matched with their counterparts in the original .pcap files using MATLAB. For each matched packet, the relevant signal segment was extracted (cropped) from the continuous oscilloscope recording, so that a unique signal file was generated for each packet.

The final dataset consists of 7,421 labeled signal files, each corresponding to an individual Ethernet frame belonging to one of six commonly used protocols. Alongside the dataset, the oscilloscope event tables (*.ETH.csv files) and the segmented PCAP files (*part.pcap) are provided. To facilitate reproducibility, the MATLAB scripts used to generate both the signal-level dataset and the four visualization-based image datasets are also included. Running these scripts allows users to regenerate the signal files and create the corresponding image representations (vertical, horizontal_zigzag, spectrogram, and scalogram).
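
The cropping step can be pictured with a small Python sketch that is independent of the provided MATLAB scripts. It relies only on the 10Base-T line rate of 10 Mbit/s (100 ns per bit) and the 8-byte preamble plus start frame delimiter that precede every frame on the wire. Everything else is an assumption made for illustration: the function names, the idea that the event table supplies a start time measured from the beginning of the recording, that this time marks the start of the preamble, and the optional margin.

```python
BIT_TIME_S = 100e-9       # 10 Mbit/s line rate -> 100 ns per bit
PREAMBLE_SFD_BYTES = 8    # 7-byte preamble + 1-byte start frame delimiter


def frame_duration_s(frame_bytes, include_preamble=True):
    """On-wire duration of a frame of frame_bytes bytes (header through FCS)."""
    total_bytes = frame_bytes + (PREAMBLE_SFD_BYTES if include_preamble else 0)
    return total_bytes * 8 * BIT_TIME_S


def crop_frame(samples, sample_interval_s, start_time_s, frame_bytes, margin_s=0.0):
    """Cut one frame's samples out of a continuous recording.

    start_time_s is a hypothetical event-table timestamp, assumed to mark
    the start of the preamble relative to the first sample.
    """
    end_time_s = start_time_s + frame_duration_s(frame_bytes) + margin_s
    first = max(0, round((start_time_s - margin_s) / sample_interval_s))
    last = min(len(samples), round(end_time_s / sample_interval_s))
    return samples[first:last]


if __name__ == "__main__":
    recording = list(range(100_000))  # stand-in for oscilloscope samples
    segment = crop_frame(recording, sample_interval_s=10e-9,
                         start_time_s=1e-4, frame_bytes=64)
    print(len(segment), "samples, starting at index", segment[0])
```

With a 10 ns sample interval, a minimum-size 64-byte frame plus preamble spans 72 x 8 x 100 ns = 57.6 µs, or 5,760 samples.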

Categories

Computer Network, Computer Communications, Signal Processing, Network Protocol, Networking

Licence