Annotated Arabic Sign Language News Dataset

Published: 2 October 2024 | Version 1 | DOI: 10.17632/y2d2kz3btb.1
Contributors:
,
, Aisha Hassan Khalfani, Mohamed Nour Hayek

Description

The Annotated Arabic Sign Language News Dataset comprises 14 JSON files, each containing the annotations for one of 14 publicly available news videos from the Al Jazeera Channel on YouTube. The annotations were produced by a native deaf specialist under the guidance of the Mada Qatar Assistive Technology Center, ensuring high-quality labeling of the signs performed by a professional sign language interpreter. Each JSON file contains time-aligned sign labels, supporting sign language processing tasks such as recognition, translation, and synthesis. Signs that the annotator could not identify are marked with a special label. The dataset is intended as a resource for researchers and developers working on Arabic sign language technologies.

Within each JSON file, the annotations are stored sequentially: each sign is given an Arabic label for the interpreted sign, together with its start and end times in seconds. For example, one of the files contains:
- The sign "كلهم" ("all of them") between 45.106 and 45.436 seconds.
- The sign "اخبار" ("news") between 45.436 and 45.867 seconds.
- The sign "تميم" ("Tamim") between 48.971 and 49.832 seconds.

The special label for unrecognized signs is "لا اعرف الإشارة" ("I don’t know the sign"). In the same file, for instance, it covers 46.736 to 48.191 seconds, indicating that the annotator could not identify the sign performed there.
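The description does not spell out the JSON schema, so the following Python sketch assumes each file holds a list of objects with the hypothetical keys `label`, `start`, and `end` (in seconds); the key names should be adjusted to match the actual files. It parses one file, keeps the segments in time order, and filters out the unrecognized-sign label, using the example values quoted in the description.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

# Label the dataset uses for signs the annotator could not identify.
UNKNOWN_LABEL = "لا اعرف الإشارة"


@dataclass
class SignSegment:
    label: str
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video

    @property
    def duration(self) -> float:
        return self.end - self.start


def parse_annotations(text: str) -> list[SignSegment]:
    # ASSUMED schema (hypothetical keys): [{"label": ..., "start": ..., "end": ...}, ...]
    records = json.loads(text)
    segments = [SignSegment(r["label"], float(r["start"]), float(r["end"])) for r in records]
    segments.sort(key=lambda s: s.start)  # keep temporal order
    return segments


def known_signs(segments: list[SignSegment]) -> list[SignSegment]:
    # Drop segments carrying the unrecognized-sign label.
    return [s for s in segments if s.label != UNKNOWN_LABEL]


# Example values taken from the dataset description.
sample = json.dumps(
    [
        {"label": "كلهم", "start": 45.106, "end": 45.436},
        {"label": "اخبار", "start": 45.436, "end": 45.867},
        {"label": UNKNOWN_LABEL, "start": 46.736, "end": 48.191},
        {"label": "تميم", "start": 48.971, "end": 49.832},
    ],
    ensure_ascii=False,
)

for seg in known_signs(parse_annotations(sample)):
    print(seg.label, f"{seg.duration:.3f} s")
```

Keeping the unrecognized segments in the parsed list (and filtering only on demand) preserves the full timeline, which matters if gaps between signs are later used as boundaries.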


Steps to reproduce

[01] Download the Dataset: Download the 14 JSON files from this dataset's page on Mendeley Data.
[02] Obtain the News Videos: Retrieve the 14 public news videos from the Al Jazeera Channel on YouTube, making sure each video matches the URL or title associated with its JSON file.
[03] Load the JSON Files: For each video, load the corresponding JSON file, which contains the time-aligned sign language annotations.
[04] View the Annotations: The annotations can be viewed with the JUMLA Annotation tool, available on GitHub at JUMLA-Sign-Language-Annotation-Tool. The tool lets you visualize and interact with the sign language annotations.
[05] Use a Sign Language Processing Tool: Alternatively, parse the JSON data with any preferred sign language processing tool or custom script. Each file contains the annotated sign labels together with the time segments in which the interpreter performs each sign.
[06] Address Unrecognized Signs: Signs the annotator could not identify carry the special label "لا اعرف الإشارة" ("I don’t know the sign"). These segments can be processed separately or flagged for later manual annotation.
[07] Perform Analysis: Once the JSON data is loaded, it can be used for sign language recognition, translation, or other linguistic and computational analyses.
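Steps 03, 06, and 07 can be combined in a short Python sketch. It assumes the same schema as before (hypothetical keys `label`, `start`, `end`, times in seconds), sets unrecognized-sign segments aside for manual review, counts the vocabulary of recognized signs, and converts each segment to video frame indices for cutting training clips. The 25 fps default is an assumption, not a property of the dataset; the actual frame rate should be read from each downloaded video.

```python
from __future__ import annotations

import json
from collections import Counter
from pathlib import Path

# Label the dataset uses for signs the annotator could not identify.
UNKNOWN_LABEL = "لا اعرف الإشارة"


def load_folder(folder: Path) -> dict[str, list[dict]]:
    # Step 03: read every annotation file; ASSUMED schema is a list of
    # {"label", "start", "end"} objects (hypothetical key names).
    return {
        p.name: json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(folder.glob("*.json"))
    }


def to_frames(start: float, end: float, fps: float) -> tuple[int, int]:
    # Map a segment in seconds to a [first_frame, end_frame) range.
    return round(start * fps), round(end * fps)


def summarize(annotations: dict[str, list[dict]], fps: float = 25.0) -> dict:
    # fps=25.0 is an ASSUMPTION; read the real frame rate from each video.
    vocab: Counter[str] = Counter()
    unknown = []  # step 06: (file, start, end) kept aside for manual review
    clips = []    # step 07: (file, label, first_frame, end_frame) for recognition
    for name, segments in annotations.items():
        for seg in segments:
            start, end = float(seg["start"]), float(seg["end"])
            if seg["label"] == UNKNOWN_LABEL:
                unknown.append((name, start, end))
                continue
            vocab[seg["label"]] += 1
            clips.append((name, seg["label"], *to_frames(start, end, fps)))
    return {"vocab": vocab, "unknown": unknown, "clips": clips}
```

For example, `summarize(load_folder(Path("annotations")), fps=30.0)` would process a hypothetical local folder named `annotations` holding the 14 files. Rounding to the nearest frame is a simple choice; a recognition pipeline may prefer to floor the start and ceil the end so that no part of a sign is cut off.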

Categories

Sign Language, Video, Text Processing

Licence