LUMINA (Linguistic Unified Multimodal Indonesian Natural Audio-Visual)

Name: LUMINA (Linguistic Unified Multimodal Indonesian Natural Audio-Visual)
Creator: Eka Rahayu Setyaningsih
Published: 2024-02-05T06:28:00.775Z
Keywords: Audio Recording, Audio Synthesis, Video Recording

Setyaningsih, Eka Rahayu; Chen, Christian; Kristian, Yosi; Handayani, Anik Nur; Irianto, Wahyu Sakti

doi:10.17632/8fw93k4rny.4

Published: 5 February 2024 | Version 4 | DOI: 10.17632/8fw93k4rny.4

Description

LUMINA (Linguistic Unified Multimodal Indonesian Natural Audio-Visual) is a curated, constrained dataset designed to support research on speech perception. Recorded exclusively in Indonesian, it contains high-quality audio-visual recordings of 14 native speakers (9 male and 5 female). Each speaker contributes approximately 1,000 sentences. The videos are facial recordings that capture the visual cues and expressions accompanying speech. The dataset is a resource for studying how humans perceive and process spoken language, and for advancing speech recognition and synthesis technologies. It falls into the class known in the relevant literature as a 'Constrained Audio-Visual Dataset,' which is widely applied in lip reading and speech synthesis.

The dataset is stored in two folders by speaker gender, male and female. Each folder contains audio files (.wav), resampled and trimmed to a consistent sampling rate of 16,000 Hz, and video files (.mp4), compressed using CRF 28 and cropped to 250 pixels wide by 150 pixels high, with the crop centered on the mouth. Each audio and video file is named using the pattern P<speaker number>_S<sentence number>. Also included is an Excel (.xlsx) file listing the word combinations, out of 2,500, used during compilation of the LUMINA dataset.
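
The naming format described above lends itself to pairing each audio file with its matching video. The following is a minimal sketch in Python, not part of the dataset itself. It assumes the speaker and sentence numbers are plain decimal digits (zero-padding is not documented, so any digit count is accepted), and the helper names `parse_name` and `pair_recordings` are hypothetical.

```python
import re
from pathlib import Path

# Assumed pattern: "P<speaker number>_S<sentence number>", e.g. "P3_S120".
# The exact digit count is not documented, so any number of digits is accepted.
NAME_RE = re.compile(r"P(\d+)_S(\d+)")


def parse_name(stem):
    """Return (speaker, sentence) as ints, or None if the stem does not match."""
    match = NAME_RE.fullmatch(stem)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def pair_recordings(root):
    """Map (speaker, sentence) to {"audio": Path, "video": Path} for files under root.

    Walks every subfolder, so it does not depend on the exact folder names.
    """
    pairs = {}
    for path in sorted(Path(root).rglob("*")):
        suffix = path.suffix.lower()
        if suffix not in (".wav", ".mp4"):
            continue  # skip the .xlsx word list and anything else
        key = parse_name(path.stem)
        if key is None:
            continue  # skip files that do not follow the naming format
        kind = "audio" if suffix == ".wav" else "video"
        pairs.setdefault(key, {})[kind] = path
    return pairs
```

Calling `pair_recordings` on the extracted dataset directory would yield one entry per speaker and sentence; entries missing either the `"audio"` or `"video"` key point to an unpaired recording.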

Institutions

Institut Sains Terapan dan Teknologi Surabaya
Universitas Negeri Malang
