Spectral Centroid Images for Multi-class Human Action Analysis: A Benchmark Dataset

Published: 22 May 2023 | Version 1 | DOI: 10.17632/yfvv3crnpy.1


This dataset contains a collection of spectral centroid images that represent various human actions. The spectral centroid is the magnitude-weighted mean frequency (the spectral "center of gravity") of a short frame of an audio signal, so a spectral centroid image shows how this value changes over time. In this dataset, the audio signals correspond to different human actions, such as walking, running, jumping, and dancing. The images were generated from the short-time Fourier transform (STFT) of the audio signals, and each image represents one segment of a signal. The dataset is designed for tasks such as human action recognition, classification, segmentation, and detection, and can be used to train and evaluate machine learning models that analyze human actions from audio. It is intended for researchers and practitioners in signal processing, computer vision, and machine learning who are developing algorithms for audio-based human action analysis. Each spectral centroid image is annotated with a label indicating the type of human action it represents.
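To make the "center of gravity" idea concrete, the following minimal NumPy-only sketch computes one centroid value per STFT frame as sum_k f_k |X_k| / sum_k |X_k|, where f_k is the centre frequency of bin k and |X_k| is its magnitude. The function name, the Hann window, and the frame sizes (n_fft=2048, hop_length=512, which match Librosa's defaults) are illustrative assumptions; the reproduction steps for this dataset use Librosa rather than a hand-rolled version like this one.

```python
import numpy as np


def spectral_centroid(y, sr, n_fft=2048, hop_length=512):
    """Spectral centroid (Hz) of each frame: sum_k f_k |X_k| / sum_k |X_k|.

    Illustrative sketch: frames are not padded (unlike librosa, which
    centre-pads by default), so only full frames are analysed.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)  # bin centre frequencies in Hz
    centroids = np.empty(n_frames)
    for i in range(n_frames):
        # Window one frame and take the magnitude of its one-sided spectrum.
        frame = y[i * hop_length : i * hop_length + n_fft] * window
        mag = np.abs(np.fft.rfft(frame))
        total = mag.sum()
        # Magnitude-weighted mean frequency; silent frames get 0 Hz.
        centroids[i] = (freqs * mag).sum() / total if total > 0 else 0.0
    return centroids
```

For a pure 1000 Hz tone every frame's centroid sits close to 1000 Hz, while an equal-amplitude mix of 500 Hz and 4000 Hz tones gives a centroid of roughly 2250 Hz, even though no energy is present there. This is why the centroid is usually read as a measure of overall spectral "brightness" rather than of any single frequency component.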


Steps to reproduce

1. Install the necessary libraries: You will need NumPy and Librosa, which are used for numerical computing and audio signal processing, respectively, as well as OpenCV for reading the video files. Install them with pip by running the following command in a terminal:
   pip install numpy librosa opencv-python
2. Load the video: Open the video file with OpenCV's cv2.VideoCapture(); each frame returned by its read() method is a NumPy array.
3. Extract the audio signal: Extract the audio track from the video and convert it to a NumPy array, for example with numpy.frombuffer() on the raw PCM bytes or librosa.load() on an extracted audio file. OpenCV does not provide a cv2.audioCapture() method, so the audio track is typically extracted with a separate tool such as FFmpeg.
4. Compute the short-time Fourier transform (STFT): Use Librosa's librosa.stft() to compute the STFT of the audio signal. The STFT represents the frequency content of the audio signal over time.
5. Compute the spectral centroid: Pass the STFT magnitude (np.abs() of the librosa.stft() output) as the S argument of librosa.feature.spectral_centroid(). The spectral centroid measures the "center of gravity" of the frequency content: the magnitude-weighted mean frequency of each STFT frame.
6. Create a sequence of spectral centroid images: Divide the audio signal into short, overlapping segments and compute the spectral centroid for each segment. Plot each segment's centroid values as an image, with time on the x-axis and frequency on the y-axis.
7. Save the spectral centroid images as PNG files.


Edith Cowan University, University of Western Australia


Computer Vision Representation, Benchmarking, Image Analysis, Action Recognition


Higher Education Commission, Pakistan


Office of National Intelligence

NIPG-2021–001