Afaan Oromoo Text-to-Speech Dataset
Afaan Oromo is one of the most widely spoken languages in the Horn of Africa. Like other Ethiopian languages, it is also under-resourced. The sole purpose of preparing this dataset was to add Afaan Oromo text-to-speech synthesis to our final-year humanoid robot project, so that the robot can speak the Oromo language in addition to using its vision capabilities to detect human faces, emotion, and gender. Natural language processing applications for this language are currently in high demand. Furthermore, linguists and researchers working on the language are contributing a lot of data to its growth, with the aim of making it an international language that machines can process.

The dataset was started by two Electrical Engineering students from Madda Walabu University during their four-month industry internship at iCog-Labs, while they were working on machine learning tasks related to natural language processing. In the first phase, the two students recorded the audio at Addis Ababa University and also did the preprocessing. The second phase of recording took place at Madda Walabu University after the internship was completed. For this phase, they selected female students from the Electrical Engineering and Afaan Oromo departments to record additional audio, enlarging the corpus available for training the machine learning model. The models used were Tacotron 1 and Tacotron 2 with the WaveGlow vocoder, which was developed by NVIDIA, together with the TensorFlow framework.

Corpus statistics
Total clips: 1,224
Total words: 17,559
Total characters: 116,439
Total duration: 03:11:13
Min clip length: 1 sec
Max clip length: 59 sec
Unique words: 5,040

Credits
The volunteers who participated in the audio recording deserve appreciation for their concern for their language. They are Obsinet Asmare Motuma, Milko Wariyo Gobana, Roza Hailu Isho, and Demitu Baye Boyosa.
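The corpus statistics reported above (clips, words, characters, unique words, durations) can be recomputed from the transcripts and clip lengths. The following is a minimal sketch, not the authors' script: the function names `format_hms` and `corpus_stats` are hypothetical, and it assumes that words are separated by whitespace, that unique words are counted case-insensitively, and that characters include spaces. The original counting rules are not stated, so the results may differ slightly from the published figures.

```python
def format_hms(total_seconds):
    """Format a duration in seconds as HH:MM:SS."""
    total_seconds = int(round(total_seconds))
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def corpus_stats(clips):
    """Summarize a non-empty list of (transcript, duration_seconds) pairs.

    Assumptions (not confirmed by the dataset description):
    - words are split on whitespace,
    - unique words are compared case-insensitively,
    - character counts include spaces and punctuation.
    """
    words = [w.lower() for text, _ in clips for w in text.split()]
    durations = [duration for _, duration in clips]
    return {
        "total_clips": len(clips),
        "total_words": len(words),
        "total_characters": sum(len(text) for text, _ in clips),
        "unique_words": len(set(words)),
        "total_duration": format_hms(sum(durations)),
        "min_clip_sec": min(durations),
        "max_clip_sec": max(durations),
    }
```

As a consistency check on the reported figures, `format_hms(11473)` returns `03:11:13`, so the total duration corresponds to 11,473 seconds, or an average of roughly 9.4 seconds per clip across 1,224 clips.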
Steps to reproduce
The audio was recorded at both Addis Ababa University and Madda Walabu University, Ethiopia, in an uncontrolled environment with some background noise. To limit the loss of quality, recording was done at night. The recordings were then preprocessed and denoised with the audio processing software Audacity and Adobe Audition. The recording devices were a Samsung Galaxy S6 edge, an iPhone 4S, and an audio recording microphone. The corpus can help researchers in natural language processing build text-to-speech synthesis or speech-to-text (speech recognition) systems, for example as components of a conversational system.
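When reproducing the corpus, it can be useful to confirm that every clip falls within the reported length range of 1 to 59 seconds. The sketch below uses only Python's standard `wave` module and assumes the clips are stored as uncompressed WAV files in a single folder; the file format and layout are not stated in this description, and the function names are hypothetical.

```python
import wave
from pathlib import Path


def clip_duration_seconds(path):
    """Return the duration of a WAV file in seconds (frames / sample rate)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def out_of_range_clips(folder, min_sec=1.0, max_sec=59.0):
    """List (file name, duration) for clips outside [min_sec, max_sec].

    The default bounds are the min and max clip lengths reported for
    this corpus; the folder layout (*.wav in one directory) is assumed.
    """
    flagged = []
    for path in sorted(Path(folder).glob("*.wav")):
        duration = clip_duration_seconds(path)
        if not (min_sec <= duration <= max_sec):
            flagged.append((path.name, duration))
    return flagged
```

Calling `out_of_range_clips("path/to/clips")` returns an empty list when all clips are within range. Because the reported lengths are given in whole seconds, clips slightly shorter than 1 second or longer than 59 seconds may be flagged even in a faithful copy of the corpus.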