Real-Time Recognition and Translation of Kinyarwanda Sign Language into Kinyarwanda Text (2023) Mediapipe NumPy array Hands and Pose extracted key points for 22 Sign Language

Published: 9 July 2024| Version 1 | DOI: 10.17632/p6zc5g9bdy.1
Erick Semindu


This research addresses real-time translation of sign language into text, focusing on twenty-two common gestures in Kinyarwanda Sign Language. Through exploration and evaluation of various machine learning algorithms, the study identifies the most effective approach for recognizing and translating these gestures. To validate the developed system, real-world Kinyarwanda Sign Language video data is used for training and testing. The dataset contains hand and pose key points extracted with MediaPipe for the 22 signs and one additional class ("---", which stands for not signing), saved as NumPy arrays. It can be used to train an LSTM model to classify the 22 signs.
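As a rough illustration of how the saved key points could be arranged for an LSTM classifier, the sketch below stacks per-video sequences into one input array and one-hot encodes the labels. The function name `build_training_arrays`, the integer label ids, and the zero-filled placeholder sequences are assumptions made for this example; the actual file layout and label order are defined by the dataset and its metadata file. The sequence shape (30 frames by 1518 features) and the 23 classes come from the description on this page.

```python
import numpy as np

NUM_CLASSES = 23   # 22 signs plus the "---" (not signing) class
FRAMES = 30        # frames per video sequence
FEATURES = 1518    # features per frame

def build_training_arrays(sequences, label_ids):
    """Stack per-video arrays into X and one-hot encode integer labels into y.

    Hypothetical helper: `sequences` is a list of (FRAMES, FEATURES) arrays,
    `label_ids` is a list of integers in [0, NUM_CLASSES).
    """
    X = np.stack(sequences).astype(np.float32)
    if X.shape[1:] != (FRAMES, FEATURES):
        raise ValueError(f"unexpected sequence shape {X.shape[1:]}")
    y = np.eye(NUM_CLASSES, dtype=np.float32)[np.asarray(label_ids)]
    return X, y

# Placeholder data standing in for arrays loaded from the dataset files.
sequences = [np.zeros((FRAMES, FEATURES)) for _ in range(4)]
label_ids = [0, 5, 22, 5]  # assumed integer ids, not the real label mapping

X, y = build_training_arrays(sequences, label_ids)
print(X.shape, y.shape)
```

The resulting `X` has shape (videos, frames, features), which is the sequence layout recurrent layers typically expect.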


Steps to reproduce

Step One: Data Collection

1. Gathering the Dataset:
- Participants: Four skilled Kinyarwanda sign language users (signers).
- Number of Signs: 22 signs in Kinyarwanda.
- Video Clips: 40 video clips for each of the 22 signs.
- Setup: The same laptop and webcam were used for all recordings.
- Environment: Videos were shot outside in bright light with a signer against a white background.
- Frames: Each video consists of 30 frames capturing key points for each sign.
- Labels: The labels for the signs can be found in the metadata file.

Step Two: Feature Extraction

1. Extracting Features with MediaPipe:
- Process:
  - For each frame in the videos, hand and pose landmarks are extracted using MediaPipe.
  - To emphasize the hand key points, the hand features from each frame are replicated 11 times in the concatenated array of extracted features.
- Feature Calculation:
  - Total features per frame:
    - Pose landmarks: \( 33 \times (3 + 1) \)
    - Hand landmarks: \( 21 \times 2 \times 3 \times 11 \)
    - Formula: \( (33 \times (3 + 1)) + (21 \times 2 \times 3 \times 11) = 1518 \)
- Saving Data: The extracted key points are saved as a NumPy array for each video sequence.

Summary
- Step One: Collect 50 video clips for each of the 22 Kinyarwanda sign labels using four signers in a consistent environment.
- Step Two: Extract hand and pose landmarks from each frame using MediaPipe, and save the features as a NumPy array, emphasizing hand key points.
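The per-frame feature layout described in Step Two can be sketched with NumPy alone. This is a minimal sketch under stated assumptions, not the authors' extraction code: the function name `frame_features`, the concatenation order (pose first, then left hand, then right hand), the zero-filling of an undetected hand, and replicating the combined hand block with `np.tile` are all choices made for illustration. It takes plain arrays as input, so the MediaPipe detection step itself is left out.

```python
import numpy as np

POSE_LANDMARKS = 33    # each with 3 coordinates + 1 extra value, per the formula
HAND_LANDMARKS = 21    # per hand, each with 3 coordinates
HAND_REPEATS = 11      # hand features are replicated 11 times
FRAMES_PER_VIDEO = 30

FEATURES_PER_FRAME = (POSE_LANDMARKS * (3 + 1)
                      + HAND_LANDMARKS * 2 * 3 * HAND_REPEATS)

def frame_features(pose, left_hand=None, right_hand=None):
    """Build one frame's feature vector from landmark arrays.

    pose: array of shape (33, 4); each hand: array of shape (21, 3) or None.
    Assumption: a hand that was not detected is filled with zeros.
    """
    pose_vec = np.asarray(pose, dtype=np.float32).reshape(POSE_LANDMARKS * 4)
    hand_vecs = []
    for hand in (left_hand, right_hand):
        if hand is None:
            hand_vecs.append(np.zeros(HAND_LANDMARKS * 3, dtype=np.float32))
        else:
            hand_vecs.append(
                np.asarray(hand, dtype=np.float32).reshape(HAND_LANDMARKS * 3))
    hand_block = np.concatenate(hand_vecs)         # 126 values for both hands
    repeated = np.tile(hand_block, HAND_REPEATS)   # 1386 values
    return np.concatenate([pose_vec, repeated])    # 132 + 1386 values

# Random placeholder landmarks standing in for MediaPipe output.
rng = np.random.default_rng(0)
sequence = np.stack([
    frame_features(rng.random((33, 4)), rng.random((21, 3)), None)
    for _ in range(FRAMES_PER_VIDEO)
])
print(FEATURES_PER_FRAME, sequence.shape)
```

Checking the arithmetic: \( 33 \times 4 = 132 \) pose values and \( 21 \times 2 \times 3 \times 11 = 1386 \) hand values give 1518 features per frame, so one video sequence is a 30 by 1518 array.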


Carnegie Mellon University In Rwanda


Sign Language