BSL-Static-48: A Dataset of Anonymized Images and MediaPipe Hand Landmarks for BSL Recognition

Name: BSL-Static-48: A Dataset of Anonymized Images and MediaPipe Hand Landmarks for BSL Recognition
Creator: Nahid Khan
Published: 2025-10-30T16:28:00.960Z
Keywords: Linguistics, Computer Science, Computer Vision, Machine Learning, Pattern Recognition, Human-Computer Interaction

Khan, Nahid; 221-15-5641, Sadman Hossain Roudro; Hossain, Shahadat

doi:10.17632/ms5phkw8sr.1

BSL-Static-48: A Dataset of Anonymized Images and MediaPipe Hand Landmarks for BSL Recognition

Published: 30 October 2025| Version 1 | DOI: 10.17632/ms5phkw8sr.1

Contributors:

,

Description

This dataset provides a collection of images and extracted landmark features for 48 fundamental static signs in Bangla Sign Language (BSL), including 38 alphabets and 10 digits (0-9). It was created to support research in isolated sign language recognition (SLR) for BSL and provide a benchmark resource for the research community. In total, the dataset comprises 14,566 raw images, 14,566 mirrored images, and 29,132 processed feature samples. Data Contents: The dataset is organized into two main folders: 01_Images: Contains 29,132 images in .jpg format (14,566 raw + 14,566 mirrored). • Raw_Images: Contains 14,566 original images collected from participants. • Mirrored_Images: Contains 14,566 horizontally flipped versions of the raw images for data augmentation purposes. • Privacy Note: Facial regions in all images within this folder have been anonymized (blurred) to protect participant privacy, as formal informed consent for sharing identifiable images was not obtained prior to collection. 02_Processed_Features_NPY: Contains 29,132 126-dimensional hand landmark features saved as NumPy arrays in .npy format. Features were extracted using MediaPipe Holistic (capturing 21 landmarks each for the left and right hands, resulting in 63 + 63 = 126 features per image). These feature files are pre-split into train (23,293 samples), val (2,911 samples), and test (2,928 samples) subdirectories (approximately 80%/10%/10%) for standardized model evaluation and benchmarking . Data Collection: Images were collected from 5 volunteers using a Macbook Air M3 camera. Data collection took place indoors under room lighting conditions against a white background. Images were captured manually using a Python script to ensure clarity. Potential Use: Researchers can utilize the anonymized raw and mirrored images (01_Images) to develop or test novel feature extraction techniques or multimodal recognition systems. Alternatively, the pre-processed and split .npy feature files (02_Processed_Features_NPY) can be directly used to efficiently train and evaluate machine learning models for static BSL recognition, facilitating reproducible research and benchmarking. Further Details: Please refer to the README.md file included within the dataset for detailed class mapping (e.g., L1='অ', D0='০'), comprehensive file statistics per class , specifics on the data processing pipeline, and citation guidelines.

Files

Institutions

Daffodil International University

BSL-Static-48: A Dataset of Anonymized Images and MediaPipe Hand Landmarks for BSL Recognition

Description

Files

Institutions

Categories

Licence