SoraniTTS dataset : Central Kurdish (CK) Speech Corpus for Text-to-Speech
Description
This dataset represents a comprehensive resource for advancing Kurdish TTS systems. Converting text to speech is one of the important topics in the design and construction of multimedia systems, human-machine communication, and information and communication technology, and its purpose, along with speech recognition, is to establish communication between humans and machines in its most basic and natural form, that is, spoken language. For our text corpus, we collected 6,565 sentences from a set of texts in various categories, including news, sport, health, question and exclamation sentences, science, general information, politics, education and literature, story, miscellaneous, and tourism, to create the train sentences. We thoroughly reviewed the texts and normalized them, then they were recorded by a male speaker. We recorded audios in a voice recording studio at 44,100Hz, and all audio files are down sampled to 22,050 Hz in our modeling process. The audio ranges from 3 to 36 seconds in length. We generate the speech corpus in this method, and the last speech has about 6,565 texts and audio pairings, which takes around 19 hours. Altogether, audio files are saved in wave format, and the texts are saved in text files in the corresponding sub-folders. Furthermore, for model training, all of the audio files are gathered in a single folder. Each line in the transcript files is formatted as WAVS | audio file’s name.wav| transcript. The audio file’s name includes the extensions, and the transcript was the speech's text. The audio recording and editing process lasted for 90 days. It involved capturing over 6,565 WAV files and over 19 h of recorded speech. The data set helps researchers improve Kurdish TTS early, thereby reducing the time consumed for this process. Acknowledgments: We would like to express our sincere gratitude to Ayoub Mohammadzadeh for his invaluable support in recording the corpus.