KU-BdSL: Khulna University Bengali Sign Language dataset

Published: 28 July 2023 | Version 4 | DOI: 10.17632/scpvm2nbkm.4


KU-BdSL is a Bengali sign language (BdSL) dataset that comes in three variants: (i) Uni-scale Sign Language Dataset (USLD), (ii) Multi-scale Sign Language Dataset (MSLD), and (iii) Annotated Multi-scale Sign Language Dataset (AMSLD). The dataset consists of images of single-hand gestures for the BdSL alphabet. The images were captured with several smartphones from 39 participants (30 males and 9 females), none of whom were offered any financial benefit. Each variant includes 30 classes that together represent the 38 consonants ('byanjonborno') of the Bengali alphabet, and each variant contains a total of 1,500 images in jpg format. The images were captured on flat surfaces at different times of the day to vary the brightness and contrast.

For USLD and MSLD, the class (folder) names are the decimal Unicode code points of the corresponding Bengali letters. A folder that covers several letters joins their code points with hyphens.

Folder Names:
2433 -> ‘Chandra Bindu’
2434 -> ‘Anusshar’
2435 -> ‘Bisharga’
2453 -> ‘Ka’
2454 -> ‘Kha’
2455 -> ‘Ga’
2456 -> ‘Gha’
2457 -> ‘Uo’
2458 -> ‘Ca’
2459 -> ‘Cha’
2460-2479 -> ‘Borgio Ja/Anta Ja’
2461 -> ‘Jha’
2462 -> ‘Yo’
2463 -> ‘Ta’
2464 -> ‘Tha’
2465 -> ‘Da’
2466 -> ‘Dha’
2467-2472 -> ‘Murdha Na/Donto Na’
2468-2510 -> ‘ta/Khanda ta’
2469 -> ‘tha’
2470 -> ‘da’
2471 -> ‘dha’
2474 -> ‘pa’
2475 -> ‘fa’
2476-2477 -> ‘Ba/Bha’
2478 -> ‘Ma’
2480-2524-2525 -> ‘Ba-y Ra/Da-y Ra/Dha-y Ra’
2482 -> ‘La’
2486-2488-2487 -> ‘Talobbo sha/Danta sa/Murdha Sha’
2489 -> ‘Ha’

USLD: All images in USLD have a uniform size of 512*512 pixels. In the majority of images, the hand is positioned in the middle of the frame.

MSLD: MSLD stores the raw images so that researchers can make their own changes to the dataset. Because various smartphones were used, the image sizes vary widely.

AMSLD: AMSLD contains multi-scale annotated data, which is suitable for tasks such as localization and classification.
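Because the USLD and MSLD folder names are decimal Unicode code points, they can be converted back to Bengali characters programmatically. The following is a minimal Python sketch, not part of the dataset itself; the function name decode_folder_name is hypothetical, and it assumes that hyphen-separated names (such as 2476-2477) list one code point per letter.

```python
from typing import List


def decode_folder_name(folder_name: str) -> List[str]:
    """Return the Bengali characters encoded in a KU-BdSL folder name.

    Assumption: each hyphen-separated part is one decimal Unicode
    code point, e.g. "2476-2477" covers two letters.
    """
    return [chr(int(code_point)) for code_point in folder_name.split("-")]
```

For example, decode_folder_name("2453") returns a one-element list containing the letter ক (Ka, U+0995), and decode_folder_name("2476-2477") returns the two letters grouped under ‘Ba/Bha’.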
For AMSLD, the YOLO DarkNet annotation format was selected from the many available formats. Each image has an annotation text file containing five numbers separated by white space. The first number is an integer, and the rest are floating-point numbers. The first number is the class ID corresponding to the label of that image; class IDs are mapped to class names in a separate text file named 'obj.names'. The second and third values are the normalized x and y coordinates of the bounding box center, following the YOLO DarkNet convention, while the fourth and fifth are the bounding box's normalized width and height.

This dataset is supported by the Research and Innovation Center, Khulna University, Khulna-9208, Bangladesh, and all the data in this dataset is free to download, modify, and use. The first version (Version 1) of this dataset was collected with the oral permission of the volunteers, while the later versions have the written consent of the participants. Therefore, we encourage researchers to use the later versions (Version 2, Version 3, or Version 4) for research purposes.
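The AMSLD annotation lines described above can be read with a few lines of Python. This is a minimal sketch rather than an official loader: the names YoloBox, parse_yolo_line, and to_pixel_corners are hypothetical, and it assumes the standard YOLO DarkNet layout in which the second and third values are the normalized box center.

```python
from typing import NamedTuple, Tuple


class YoloBox(NamedTuple):
    class_id: int    # index into the class list in obj.names
    x_center: float  # normalized to [0, 1] by image width (assumed center)
    y_center: float  # normalized to [0, 1] by image height (assumed center)
    width: float     # normalized box width
    height: float    # normalized box height


def parse_yolo_line(line: str) -> YoloBox:
    """Parse one annotation line of five whitespace-separated numbers."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 values, got {len(parts)}: {line!r}")
    return YoloBox(int(parts[0]), *(float(value) for value in parts[1:]))


def to_pixel_corners(
    box: YoloBox, image_width: int, image_height: int
) -> Tuple[float, float, float, float]:
    """Convert a normalized box to (x_min, y_min, x_max, y_max) in pixels."""
    x_min = (box.x_center - box.width / 2) * image_width
    y_min = (box.y_center - box.height / 2) * image_height
    x_max = (box.x_center + box.width / 2) * image_width
    y_max = (box.y_center + box.height / 2) * image_height
    return x_min, y_min, x_max, y_max
```

Because the images are multi-scale, the pixel conversion needs the width and height of each individual image, not a fixed size.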


Steps to reproduce

This Bengali sign language (BdSL) dataset can be reproduced by imitating the hand gestures shown in the dataset and photographing them.


Khulna University


Sign Language


Khulna University Research and Innovation Center (KURIC), Khulna University, Khulna - 9208, Bangladesh.