BISINDO Video Dataset
Description
This video dataset documents Indonesian Sign Language (BISINDO), the primary communication system used by the Deaf community in Indonesia. It contains video clips of BISINDO hand gestures performed by native signers, covering vocabulary commonly used in everyday interactions, and is designed to support the development of vision-based automatic sign language recognition.

Data were collected with a smartphone camera in a controlled environment, with consistent lighting and a neutral background to keep visual quality uniform. Native BISINDO signers performed a predetermined list of sign words, repeating each gesture several times to capture natural variation in hand shape, direction, and movement speed. Recordings were made from a frontal viewpoint so that hand movements, facial expressions, and accompanying visual cues are fully visible.

The dataset consists of six videos:
- Alphabets Video Dataset (2 min 27 s): gestures for the letters A to Z.
- Numbers Video Dataset (42 s): gestures for the numbers 1 to 10.
- Name-of-the-Day Video Dataset (40 s): gestures for the days of the week, Monday to Sunday.
- Introductory Video Dataset (1 min 55 s): everyday introductory phrases.
- Family Video Dataset (3 min 48 s): gestures for family-related terms.
- Short Story Video Dataset (4 min 9 s): gestures for short everyday narratives.
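As a quick sanity check on the durations stated above, the following minimal Python sketch totals the footage and estimates how many frames each video would yield at a given sampling rate. The dictionary keys are shortened labels chosen here for illustration, and the sampling rate passed to `frames_at` is an assumption; the description does not state the frame rate of the recordings.

```python
# Durations as stated in the description, as (minutes, seconds).
# The keys are shortened labels used only in this sketch.
DURATIONS = {
    "Alphabets": (2, 27),
    "Numbers": (0, 42),
    "Name-of-the-Day": (0, 40),
    "Introductory": (1, 55),
    "Family": (3, 48),
    "Short Story": (4, 9),
}


def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) pair to a number of seconds."""
    return minutes * 60 + seconds


def total_seconds(durations):
    """Sum the length of all videos, in seconds."""
    return sum(to_seconds(m, s) for m, s in durations.values())


def frames_at(durations, fps):
    """Estimate frames per video at an ASSUMED sampling rate (fps)."""
    return {name: to_seconds(m, s) * fps for name, (m, s) in durations.items()}


total = total_seconds(DURATIONS)
print(f"{total} s = {total // 60} min {total % 60} s")  # 821 s = 13 min 41 s
```

In total the six videos amount to 13 minutes 41 seconds of footage, which gives a rough upper bound on how many frames are available for training once a sampling rate is chosen.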
Steps to reproduce
The data were collected using a Vivo V25E smartphone (64 MP camera), recording participants from a distance of 100 cm in a well-lit setting with a neutral background. Participants were native BISINDO users performing predefined hand gestures. The videos were segmented and converted into image sequences (JPG format) for analysis. No additional normalization was applied. Recordings were included only if the hands and facial expressions were clearly visible; incomplete or obscured recordings were excluded.
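The conversion of a video into a JPG image sequence could be reproduced along the following lines. This is a minimal sketch that assumes the `ffmpeg` command-line tool is installed; the source does not say which tool the authors used, and the file names, output pattern, and sampling rate of 5 frames per second are hypothetical choices made for illustration.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, out_dir, fps=5):
    """Build an ffmpeg command that samples `fps` frames per second as JPGs.

    The fps value and the frame_%05d.jpg naming pattern are assumptions,
    not settings reported for this dataset.
    """
    pattern = str(Path(out_dir) / "frame_%05d.jpg")
    return [
        "ffmpeg",
        "-i", str(video_path),   # input video
        "-vf", f"fps={fps}",     # resample to a fixed frame rate
        "-q:v", "2",             # high JPG quality (lower value = better)
        pattern,                 # numbered output files
    ]


def extract_frames(video_path, out_dir, fps=5):
    """Create the output folder and run ffmpeg; raises if ffmpeg fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(video_path, out_dir, fps), check=True)


if __name__ == "__main__":
    # Hypothetical file name; replace with the actual video file.
    print(" ".join(build_ffmpeg_command("alphabets.mp4", "frames/alphabets")))
```

Calling `extract_frames("alphabets.mp4", "frames/alphabets")` would write the numbered JPG files to the given folder. Because no additional normalization was applied in the original pipeline, the sketch performs no resizing or color adjustment.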
Institutions
- Universitas Kuningan
- Telkom University