KTTS Single Speaker Dataset

Published: 29 August 2024 | Version 2 | DOI: 10.17632/5c4dcvxdmb.2

Description

This dataset was developed primarily to support the creation of text-to-speech systems for Kashmiri, a digitally underrepresented language spoken predominantly in the Jammu and Kashmir region of India. It comprises 2,984 audio recordings in WAV format, with the corresponding text stored in a separate file named ‘textcorpus.csv’. The ‘id’ column in the CSV file is a unique identifier for each sentence, and each WAV file is named after the ‘id’ of the sentence it contains, so users can locate the recording for any sentence directly. All recordings feature a single male voice sampled at 48,000 Hz, which makes them suitable for detailed phonetic analysis and machine learning applications. The uniform speaker and sample rate across the dataset provide a consistent foundation for training and testing text-to-speech models. The dataset can also serve as a resource for future research and development aimed at improving digital accessibility for the Kashmiri-speaking population.
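
As a quick sanity check on the stated 48,000 Hz sample rate, the header of any recording can be inspected with Python's standard `wave` module. This is only a sketch: the helper name `wav_info` is hypothetical, and it assumes the files are uncompressed PCM WAV, which the dataset description does not state explicitly.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    # Assumption: the file is uncompressed PCM; the wave module rejects
    # compressed WAV variants with wave.Error.
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        frames = wf.getnframes()
        return rate, wf.getnchannels(), frames / rate
```

Calling `wav_info` on a file from the dataset should report a rate of 48000 if the recording matches the description; a different value would indicate the file has been resampled.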

Steps to reproduce

The text was sourced from various publicly accessible sources and from individual contributors, including scholars and students. It was filtered using an enrichment algorithm to ensure quality and relevance. A young male speaker, approximately 25 years of age, was selected to record the speech. A web application was developed to streamline the recording, reviewing, saving, and deleting steps, and to flag inaccurate text entries. The dataset comprises two folders: 'Recordings' and 'Text Files'. The 'Recordings' folder contains 2,984 audio recordings, each saved as a separate WAV file. The 'Text Files' folder contains a single file, 'textcorpus.csv'. The CSV file has two columns: 'id' and 'sentence'. The 'id' column links each entry in the 'sentence' column to its corresponding WAV file in the 'Recordings' folder, and each WAV file is named according to the 'id' of its sentence, giving a systematic and consistent file organization.
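
The folder layout described above can be turned into (audio, text) pairs with a few lines of Python. The sketch below is illustrative rather than part of the dataset: the function name `load_pairs` is hypothetical, and it assumes the CSV has a header row with the column names 'id' and 'sentence', is UTF-8 encoded, and that each recording is named exactly `<id>.wav`.

```python
import csv
from pathlib import Path


def load_pairs(root):
    """Pair each sentence in textcorpus.csv with its WAV file.

    Returns (paired, missing): paired is a list of (wav_path, sentence)
    tuples, and missing lists the ids that have no matching recording.
    """
    root = Path(root)
    csv_path = root / "Text Files" / "textcorpus.csv"
    rec_dir = root / "Recordings"
    paired, missing = [], []
    # utf-8-sig also tolerates a byte-order mark (encoding is an assumption).
    with open(csv_path, newline="", encoding="utf-8-sig") as fh:
        for row in csv.DictReader(fh):
            sent_id = row["id"].strip()
            wav_path = rec_dir / f"{sent_id}.wav"  # assumed naming scheme
            if wav_path.is_file():
                paired.append((wav_path, row["sentence"]))
            else:
                missing.append(sent_id)
    return paired, missing
```

Given the dataset's root folder, `load_pairs` should return 2,984 pairs and an empty `missing` list if the naming assumption holds; a non-empty `missing` list would suggest the file names follow a different pattern.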

Institutions

University of Kashmir

Categories

Artificial Intelligence, Natural Language Processing, Speech Recognition, Text-to-Speech
