A Smartphone Camera Based RGB Video Dataset of Natural Hand Gestures

Published: 13 August 2025 | Version 2 | DOI: 10.17632/n66hhk695h.2
Contributors:
Tanzeem Rahat, Shahnaj Parvin, Kamruddin Nur

Description

This dataset contains 1,352 high-quality RGB video recordings of 13 everyday hand gestures, captured entirely with a Samsung Galaxy S23 smartphone in a variety of real-world lighting conditions. It was created to support gesture recognition research on consumer-grade devices without relying on depth or infrared sensors, emphasizing practical and cost-effective solutions.

The recordings feature 26 participants. All participants performed the gestures in a full-body view, and two participants additionally performed them in a hand-only view. For each gesture, there are 92 full-body videos and 12 hand-only videos, giving 104 videos per gesture (13 × 104 = 1,352 in total). The dataset captures natural variation in appearance and environment, with gestures performed under a wide range of lighting conditions, including outdoor daylight, dim indoor light, green and red LED lights, backlit scenes, natural white light, and warm light.

Each video has been standardized to a resolution of 640×640 pixels at 30 frames per second, encoded in H.264 format, and contains no audio. Unlike some gesture datasets, this collection does not include extracted frames. All data is provided as complete video clips to allow maximum flexibility in preprocessing and model design.

The directory structure is organized by gesture class and capture mode, with a dedicated folder for each modality (“full_body” or “hand_only”). A comprehensive metadata file, metadata.csv, is provided at the root of the dataset. For each video, this file contains the filename, gesture label, participant identifier, take number, capture mode, relative video path, duration in seconds, frame rate, and video resolution. This enables straightforward filtering, indexing, and integration into machine learning workflows.

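
The per-gesture counts stated above can be checked against the metadata once it is loaded. The following is a minimal sketch, not part of the dataset: the column names `gesture_label` and `capture_mode` are assumptions (the description lists the fields but not their exact headers), and the sample rows are made up for illustration.

```python
from collections import Counter

# Figures taken from the dataset description.
NUM_GESTURES = 13
FULL_BODY_PER_GESTURE = 92
HAND_ONLY_PER_GESTURE = 12


def expected_total():
    """Total number of videos implied by the per-gesture counts."""
    return NUM_GESTURES * (FULL_BODY_PER_GESTURE + HAND_ONLY_PER_GESTURE)


def count_by_gesture_and_mode(rows, gesture_col="gesture_label",
                              mode_col="capture_mode"):
    """Count metadata rows per (gesture, capture mode) pair.

    The column names are hypothetical; adjust them to the actual
    headers found in metadata.csv.
    """
    return Counter((row[gesture_col], row[mode_col]) for row in rows)


# Made-up sample rows standing in for rows read from metadata.csv.
sample_rows = [
    {"gesture_label": "fist", "capture_mode": "full_body"},
    {"gesture_label": "fist", "capture_mode": "hand_only"},
    {"gesture_label": "come_here", "capture_mode": "full_body"},
    {"gesture_label": "fist", "capture_mode": "full_body"},
]

counts = count_by_gesture_and_mode(sample_rows)
print(expected_total())               # 1352
print(counts[("fist", "full_body")])  # 2
```

On the real metadata, every gesture would be expected to show 92 rows for full_body and 12 for hand_only if the description holds.
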
With its combination of diverse participants, varied lighting, and multiple capture perspectives, this dataset provides a realistic and challenging benchmark for developing and evaluating gesture recognition systems. It is particularly well-suited for research that aims to achieve robustness against lighting changes, performer variability, and environmental diversity, as well as for projects exploring mobile-friendly, real-world computer vision solutions.
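
One common way to evaluate robustness to performer variability is to keep all videos of a given participant on the same side of a train/test split. The sketch below illustrates that idea under stated assumptions: the column name `participant_id` is hypothetical, and the 20% test fraction and seed are arbitrary choices, not recommendations from the dataset authors.

```python
import random


def split_by_participant(rows, test_fraction=0.2, seed=0,
                         participant_col="participant_id"):
    """Split metadata rows so that no participant appears in both sets.

    `participant_col` is an assumed column name; adjust it to the
    actual header in metadata.csv.
    """
    # Sort first so the shuffle is reproducible for a given seed.
    participants = sorted({row[participant_col] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(participants)

    # Hold out at least one participant for testing.
    n_test = max(1, round(len(participants) * test_fraction))
    test_ids = set(participants[:n_test])

    train = [row for row in rows if row[participant_col] not in test_ids]
    test = [row for row in rows if row[participant_col] in test_ids]
    return train, test
```

With 26 participants and the default fraction, this holds out 5 participants; every clip from those participants goes to the test set.
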

Steps to reproduce

1. Download the Root.zip archive from the dataset repository.
2. Unzip Root.zip to your preferred location. After extraction, you will have a single root directory containing:
   - One folder for each gesture class (e.g., come_here/, fist/).
   - The metadata.csv file at the root.
3. Inside each gesture class folder, there are up to two subfolders:
   - hand_only/ (present only for gestures where hand-only recordings exist)
   - full_body/
   Each of these contains a videos/ subfolder with .mp4 clips.
4. The metadata.csv file provides complete metadata for each video.
5. No additional processing is required. Users can read metadata.csv directly to programmatically locate and filter videos based on gesture type, participant, capture mode, or other properties.
6. No special software is needed beyond a standard unzipping tool to access the dataset. For programmatic interaction, Python users may find pandas and the standard-library os module useful for reading and working with the metadata and file paths.
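
The steps above can be sketched in Python as follows. This is a hedged illustration that uses only the standard library (`csv` and `pathlib`) rather than pandas. The function name `find_videos` and the column names `gesture_label`, `capture_mode`, and `video_path` are assumptions, since the exact headers of metadata.csv are not listed here; check the file and adjust them.

```python
import csv
from pathlib import Path


def find_videos(root, gesture=None, capture_mode=None,
                gesture_col="gesture_label", mode_col="capture_mode",
                path_col="video_path"):
    """Return paths of videos matching the given gesture and capture mode.

    `root` is the directory obtained by unzipping Root.zip, i.e. the one
    that contains metadata.csv. All column names are assumptions.
    """
    root = Path(root)
    matches = []
    with open(root / "metadata.csv", newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            if gesture is not None and row[gesture_col] != gesture:
                continue
            if capture_mode is not None and row[mode_col] != capture_mode:
                continue
            # The metadata stores paths relative to the dataset root.
            matches.append(root / row[path_col])
    return matches
```

For example, `find_videos("path/to/extracted/root", gesture="fist", capture_mode="full_body")` would list the full-body clips for the fist gesture, provided the assumed column names match. The same filter can be written with pandas by loading the file with `pandas.read_csv` and applying boolean masks.
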

Institutions

  • American International University Bangladesh

Categories

Computer Vision, Human-Computer Interaction, Gesture Recognition

Licence