Ar-DAD: Arabic Diversified Audio Dataset

Published: 10 April 2020 | Version 3 | DOI: 10.17632/3kndp5vs6b.3

Description

This is the only available audio library that covers this many reciters and verses in one harmonized structure, which researchers can use in a range of directions. The audio files are organized into two main subsets. In the first subset, the reciters' recordings are placed in 37 folders, one per chapter (78-114); within each chapter folder, subfolders are created per verse number; and within each verse folder, the audio clips are enumerated by reciter, covering 30 different reciters. The second subset consists of a single folder of audio clips from imitators, categorized by anonymous ID. The audio clips are shared in WAV format at the maximum quality as recorded and disseminated over the internet; no enhancement of any kind was applied after scraping. In this version of the dataset (V3), a third data folder is added containing the text of all verses as plain text files, with and without vocalization/vowelization (تشكيل, Tashkeel).
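The chapter/verse/clip nesting described above can be traversed with a short script. The following is a minimal Python sketch, not part of the dataset: the function name index_clips is hypothetical, and it assumes that chapter and verse folders are named with plain numbers and that clips carry a .wav extension. Check the extracted folder names before relying on it.

```python
from pathlib import Path


def index_clips(reciters_root):
    """Return sorted (chapter, verse, clip_path) tuples.

    Assumed layout (not confirmed by the dataset page):
    <reciters_root>/<chapter number>/<verse number>/<clip>.wav
    """
    clips = []
    for path in Path(reciters_root).glob("*/*/*"):
        # Keep only WAV files; the extension check is case-insensitive.
        if not path.is_file() or path.suffix.lower() != ".wav":
            continue
        verse_dir = path.parent
        chapter_dir = verse_dir.parent
        # Assumption: folder names are plain numbers; skip anything else.
        if not (chapter_dir.name.isdigit() and verse_dir.name.isdigit()):
            continue
        clips.append((int(chapter_dir.name), int(verse_dir.name), path))
    return sorted(clips)
```

To use it, pass the extracted folder that directly holds the chapter folders. Grouping the returned tuples by chapter and verse then gives, for each verse, the list of reciter clips.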

Steps to reproduce

Download all segmented archives into one folder. Extract by running the file Audio_Dataset.zip. All files have been verified and tested (downloaded) to extract properly. If an error occurs while extracting the dataset, download the part reported as corrupted again individually; it should then work fine. The dataset contains 16,209 files in 568 folders. Size on disk after successful extraction: 10.6 GB (11,472,822,272 bytes).
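After extraction, the file and folder counts stated above can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the function name summarize_tree is hypothetical, the folder count excludes the root folder itself (the page does not say whether the figure of 568 includes it), and the byte total is the sum of file sizes, which can differ from the "size on disk" reported by a file manager.

```python
import os

# Figures quoted on the dataset page.
EXPECTED_FILES = 16_209
EXPECTED_FOLDERS = 568


def summarize_tree(root):
    """Return (file_count, folder_count, total_bytes) for everything under root."""
    n_files = n_folders = total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        n_folders += len(dirnames)  # subfolders only; root itself is not counted
        n_files += len(filenames)
        for name in filenames:
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return n_files, n_folders, total_bytes
```

Comparing the first two returned values with EXPECTED_FILES and EXPECTED_FOLDERS gives a quick indication of whether any archive part failed to extract. A difference of one in the folder count may only reflect whether the root folder is counted.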

Institutions

University of Sharjah

Categories

Machine Learning Algorithm, Speaker Characterization, Speaker Recognition, Arabic Language, Audio Analysis, Audio Recognition, Speaker Verification, Convolutional Neural Network, Deep Learning

Licence