Sermon_audio_and_text_dataset (SAT)

Name: Sermon_audio_and_text_dataset (SAT)
Creator: Samah Abbas
Published: 2023-10-03T07:31:34.857Z
Keywords: Transcription, Machine Translation, Speech Recognition, Audio Signal Processing, Audio Recognition

doi:10.17632/fnz5bt24st.1

Published: 3 October 2023 | Version 1 | DOI: 10.17632/fnz5bt24st.1

Contributor:

Samah Abbas

Description

The Sermon Audio and Text (SAT) dataset addresses a significant gap in research on Islamic Friday Sermons (IFS). It offers a rich collection tailored for studies in theology, culture, and linguistics, and is especially pertinent to Arab and Muslim communities.

Contents

Volume: 21,253 synchronized entries of audio and transcription.
Coverage: The dataset captures the nuances of Friday sermons, which are pivotal to understanding the religious, cultural, and linguistic fabric of the Muslim world.

Description of the data and file structure

The SAT dataset is divided into two folders, audio and text. The audio folder contains a separate subfolder for each sermon (sermon_1, sermon_2, and so on). Each sermon subfolder contains the wav files for that sermon, named 0_sermon_1, 1_sermon_1, 2_sermon_1, and so on.

The text folder holds the transcript for each corresponding audio sermon, again split into one subfolder per sermon (sermon_1, sermon_2, and so on). Each sermon subfolder contains the txt files for that sermon, named 0_sermon_1, 1_sermon_1, 2_sermon_1, and so on.

High-level metadata is provided in an Excel file and gives general information about each sermon (location, time, date, preacher name, and so on). Low-level metadata gives more detailed information for each sermon, including the length of each audio chunk (the sermon wav files), the total number of chunks, the silence threshold used for segmentation, and so on.
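The folder layout described above suggests a simple way to pair each audio chunk with its transcript. The following is a minimal sketch, not part of the dataset's own tooling. It assumes the top-level folders are named exactly "audio" and "text", that the files carry the extensions .wav and .txt, and that the leading number in each file name is the chunk index. The function name pair_chunks is hypothetical, and the names and letter case should be adjusted to match the downloaded archive.

```python
from pathlib import Path


def pair_chunks(root):
    """Return (wav_path, txt_path) pairs ordered by sermon number, then chunk index.

    Assumed layout (folder names and extensions are assumptions):
        root/audio/sermon_1/0_sermon_1.wav
        root/text/sermon_1/0_sermon_1.txt
    """
    root = Path(root)
    audio_dir = root / "audio"  # assumed folder name
    text_dir = root / "text"    # assumed folder name

    pairs = []
    for wav in audio_dir.glob("sermon_*/*.wav"):
        # The transcript is expected under the same sermon folder and file stem.
        txt = text_dir / wav.parent.name / (wav.stem + ".txt")
        if txt.exists():
            pairs.append((wav, txt))

    def sort_key(pair):
        wav = pair[0]
        sermon_no = int(wav.parent.name.split("_")[-1])  # "sermon_12" -> 12
        chunk_no = int(wav.stem.split("_")[0])           # "3_sermon_12" -> 3
        return (sermon_no, chunk_no)

    # Numeric sorting keeps chunk 10 after chunk 2, unlike plain string sorting.
    return sorted(pairs, key=sort_key)
```

Chunks without a matching transcript are skipped rather than reported, which keeps the sketch short; a stricter loader might raise an error instead.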
