Gunshot Audio Spectrogram Dataset for Binary Classification Using FFT, LogMel, and MFCC Features

Name: Gunshot Audio Spectrogram Dataset for Binary Classification Using FFT, LogMel, and MFCC Features
Creator: Jhon Gonçalves
Published: 2025-08-05T12:49:45.004Z
Keywords: Computer Vision, Computer Forensics

Gonçalves, Jhon; Santos, Adam; Alves, Marcela; Kuribayashi, Hugo; Gomes, Marcos

doi:10.17632/j7m4gb8vmz.1

Published: 5 August 2025 | Version 1 | DOI: 10.17632/j7m4gb8vmz.1

Description

This dataset comprises 15,962 labeled audio samples in two classes: Gunshot (5,614 instances) and Non-Gunshot (10,348 instances). The samples were collected from multiple public repositories, including UrbanSound8K, ESC-50, Gunshot/Gunfire Audio Dataset, Gunshot Audio Dataset, and Gunshot Audio Forensics Dataset. All audio files were pre-processed with the Librosa library: each clip was standardized to a duration of 5 seconds by trimming or zero-padding, and a pre-emphasis filter was applied to enhance high-frequency components. Features were then extracted with three established techniques: Fast Fourier Transform (FFT), Mel-Frequency Cepstral Coefficients (MFCC), and Log-Mel Spectrograms. The result is a spectrogram-based dataset suitable for training and evaluating machine learning models for gunshot detection.
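The two pre-processing steps described above (fixing every clip to 5 seconds and applying pre-emphasis) can be sketched in plain NumPy as follows. This is an illustrative sketch, not the authors' code: the sampling rate of 22,050 Hz and the pre-emphasis coefficient of 0.97 are assumptions, since the description states neither, and the function names `fix_duration` and `pre_emphasis` are hypothetical. Librosa offers comparable helpers (`librosa.util.fix_length` and `librosa.effects.preemphasis`).

```python
import numpy as np

SAMPLE_RATE = 22050   # assumed; the description does not state a sampling rate
CLIP_SECONDS = 5      # fixed clip length given in the dataset description
PREEMPH_COEF = 0.97   # assumed; a common default, not stated in the description


def fix_duration(y, sr=SAMPLE_RATE, seconds=CLIP_SECONDS):
    """Trim or zero-pad a mono signal to exactly `seconds` of audio."""
    target = sr * seconds
    if len(y) >= target:
        return y[:target]                      # trim long clips
    return np.pad(y, (0, target - len(y)))     # zero-pad short clips at the end


def pre_emphasis(y, coef=PREEMPH_COEF):
    """First-order high-pass filter: out[n] = y[n] - coef * y[n-1]."""
    return np.append(y[0], y[1:] - coef * y[:-1])


# Usage: a 2-second dummy signal becomes a 5-second, pre-emphasized clip.
short_clip = np.ones(SAMPLE_RATE * 2, dtype=np.float32)
clip = pre_emphasis(fix_duration(short_clip))
```

Zero-padding at the end of the clip is one possible choice; the description does not say whether padding was applied at the end or symmetrically.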

Steps to reproduce

The dataset was constructed by aggregating audio samples from multiple public repositories, including UrbanSound8K, ESC-50, and gunshot-specific datasets such as those by Kabealo et al., Tuncer et al., and Lilien. All audio files were converted to WAV format, standardized to a fixed length of 5 seconds by trimming or zero-padding, and passed through a pre-emphasis filter to enhance high-frequency content. Three feature extraction techniques were then applied using the Librosa library: Fast Fourier Transform (FFT), Mel-Frequency Cepstral Coefficients (MFCC), and Log-Mel Spectrograms. The final dataset consists of labeled spectrograms in two classes, Gunshot and Non-Gunshot, suitable for training supervised deep learning models for firearm discharge detection.

Institutions

Universidade Federal do Sul e Sudeste do Pará
