BISINDO Video Dataset

Published: 3 June 2025 | Version 2 | DOI: 10.17632/f33k9w86wr.2
Contributor: Tito Sugiharto

Description

The BISINDO Video Dataset is a collection of video recordings of Indonesian Sign Language (Bahasa Isyarat Indonesia, or BISINDO), the primary sign language used by the Deaf community in Indonesia. It is intended to support research and development in sign language recognition, natural language processing, and accessibility technologies for BISINDO users.

The videos show native BISINDO signers performing signs in six categories: letters of the alphabet (A–Z), numbers (1–10), days of the week, introductory phrases used in daily communication, family-related terms, and short storytelling gestures. Each category is captured in separate video files, recorded in a controlled environment with consistent lighting, a neutral background, and a frontal camera angle to ensure clarity and uniformity. Videos were recorded using smartphone cameras to balance accessibility and video quality. Each sign was repeated multiple times to capture natural variation in movement, speed, hand shape, and facial expression, all of which are essential for accurate recognition and understanding of the language.

The videos were also converted into image frames to facilitate frame-level visual analysis. The data are provided raw, without filtering or augmentation, so that researchers can apply their own preprocessing and augmentation methods according to their experimental requirements. The dataset is intended as a foundational resource for training machine learning models and for developing tools that promote digital inclusivity for the Indonesian Deaf community.

Steps to reproduce

The data were collected using a Vivo V25E smartphone (64 MP camera), recording participants from a distance of 100 cm in a well-lit setting with a neutral background. Participants were native BISINDO users performing predefined hand gestures. Videos were segmented and converted into image sequences (JPG format) for analysis. No additional normalization was applied. Inclusion criteria required clear visibility of hands and facial expressions; incomplete or obscured recordings were excluded.
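
The video-to-JPG conversion step could be reproduced along the following lines. This is a minimal sketch, not the contributors' actual pipeline: the record does not say which tool or frame rate was used, so the use of the ffmpeg command-line tool, the function names, the `frame_%05d.jpg` naming pattern, and the example paths are all assumptions made for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, out_dir, fps=None, quality=2):
    """Build an ffmpeg argument list that writes a video out as numbered JPGs.

    fps=None leaves frame-rate handling to ffmpeg's defaults; a number
    resamples the output to that many frames per second.
    quality is passed to -q:v (lower means higher JPG quality).
    """
    # Hypothetical naming pattern: frame_00001.jpg, frame_00002.jpg, ...
    out_pattern = str(Path(out_dir) / "frame_%05d.jpg")
    cmd = ["ffmpeg", "-hide_banner", "-i", str(video_path)]
    if fps is not None:
        cmd += ["-vf", f"fps={fps}"]
    cmd += ["-q:v", str(quality), out_pattern]
    return cmd


def extract_frames(video_path, out_dir, fps=None):
    """Run ffmpeg on one video and return the sorted list of written frames."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(video_path, out_dir, fps), check=True)
    return sorted(Path(out_dir).glob("frame_*.jpg"))
```

For example, `extract_frames("clip.mp4", "frames/clip", fps=5)` would write five frames per second of a hypothetical `clip.mp4` into `frames/clip`. Since no normalization was applied to the published frames, the sketch performs no resizing or color adjustment either.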

Institutions

  • Universitas Kuningan
  • Telkom University

Categories

Computer Vision, Image Processing, Sign Language
