Gunshot Audio Spectrogram Dataset for Binary Classification Using FFT, LogMel, and MFCC Features
Description
This dataset comprises 15,962 labeled audio samples in two classes: Gunshot (5,614 instances) and Non-Gunshot (10,348 instances). The samples were collected from multiple public repositories, including UrbanSound8K, ESC-50, Gunshot/Gunfire Audio Dataset, Gunshot Audio Dataset, and Gunshot Audio Forensics Dataset. All audio files were pre-processed with the Librosa library: each clip was standardized to 5 seconds by trimming longer recordings or zero-padding shorter ones, and a pre-emphasis filter was applied to enhance high-frequency components. Features were then extracted with three established techniques: Fast Fourier Transform (FFT), Mel-Frequency Cepstral Coefficients (MFCC), and Log-Mel spectrograms. The result is a spectrogram-based dataset suitable for training and evaluating machine learning models for gunshot detection.
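Because the two classes are not the same size, it can help to quantify the imbalance before training. The sketch below uses only the class counts stated above; the inverse-frequency weighting is a common heuristic shown here as an assumption, not something the dataset authors describe, and the names `CLASS_COUNTS` and `class_summary` are hypothetical.

```python
# Class counts as stated in the dataset description.
CLASS_COUNTS = {"Gunshot": 5614, "Non-Gunshot": 10348}


def class_summary(counts):
    """Return each class's share of the dataset and a balancing weight.

    The weight total / (n_classes * count) is a common inverse-frequency
    heuristic. It is an assumption made for illustration, not part of the
    dataset itself.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {
        label: {
            "share": count / total,
            "weight": total / (n_classes * count),
        }
        for label, count in counts.items()
    }


if __name__ == "__main__":
    for label, stats in class_summary(CLASS_COUNTS).items():
        print(f"{label}: share={stats['share']:.3f}, weight={stats['weight']:.3f}")
```

Roughly one sample in three is a gunshot, so a model that always predicts Non-Gunshot would already be right about two times out of three; metrics such as per-class recall are therefore more informative than plain accuracy.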
Steps to reproduce
The dataset was constructed by aggregating audio samples from multiple public repositories, including UrbanSound8K, ESC-50, and gunshot-specific datasets such as those by Kabealo et al., Tuncer et al., and Lilien. All audio files were converted to WAV format, standardized to a fixed length of 5 seconds by trimming or zero-padding, and passed through a pre-emphasis filter to enhance high-frequency content. Three feature extraction techniques were then applied using the Librosa library: Fast Fourier Transform (FFT), Mel-Frequency Cepstral Coefficients (MFCC), and Log-Mel spectrograms. The final dataset consists of labeled spectrograms in two classes, Gunshot and Non-Gunshot, suitable for training supervised deep learning models for firearm discharge detection.
Institutions
- Universidade Federal do Sul e Sudeste do Pará