Punjabi Speech: A Labeled Speech Corpus

Published: 16 August 2023 | Version 2 | DOI: 10.17632/sdbc8f5b77.2
Satwinder Singh, Ruili Wang, Feng Hou


The Punjabi Speech corpus is designed for automatic speech recognition and speech synthesis. It comprises speech samples recorded in both studio and open-environment settings, stored as WAV files with a sampling rate of 44.1 kHz. Each recording is limited to 15 seconds to avoid running out of memory when training on GPUs. The dataset currently contains 2429 spoken utterances from two male speakers, totaling approximately 4 hours of audio, and is pre-divided into training (80%), validation (10%), and test (10%) sets.

The directory layout is simple. All speech files are located in the "clips" directory, and the transcript files (train, dev, and test) are TSV files located in the parent directory. Each line in a transcript file labels a single speech sample in the clips directory: the first column contains the path or file name of the corresponding WAV file, and the second column, separated by a tab, contains the transcript text.
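The layout described above can be read with a few lines of standard-library Python. The sketch below is a minimal example, not the authors' tooling. It assumes that the transcript files are named `train.tsv`, `dev.tsv` and `test.tsv`, that they have no header row unless `has_header=True` is passed, and that the clips are PCM WAV files readable by Python's `wave` module. The names `load_split`, `check_wav` and `summarize` are hypothetical. Besides parsing the two tab-separated columns, it checks each clip against the stated 44.1 kHz sampling rate and 15-second limit.

```python
from dataclasses import dataclass
from pathlib import Path
import wave

# Values taken from the corpus description.
EXPECTED_SAMPLE_RATE = 44_100  # 44.1 kHz
MAX_DURATION_S = 15.0          # each recording is at most 15 seconds


@dataclass
class Utterance:
    wav_path: Path
    transcript: str


def load_split(root: Path, split: str, has_header: bool = False) -> list[Utterance]:
    """Parse one transcript file. The file name root/f"{split}.tsv" is an assumption."""
    tsv_path = root / f"{split}.tsv"
    utterances = []
    with tsv_path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            if not line or (has_header and line_no == 1):
                continue  # skip blank lines and an optional header row
            name, sep, transcript = line.partition("\t")
            if not sep:
                raise ValueError(f"{tsv_path}:{line_no}: expected two tab-separated columns")
            path = Path(name)
            # Column 1 may be a bare file name or a path relative to the
            # parent directory; bare names are resolved inside clips/.
            if path.parent == Path("."):
                path = Path("clips") / path
            utterances.append(Utterance(root / path, transcript))
    return utterances


def check_wav(path: Path) -> float:
    """Return the clip duration in seconds; raise if it breaks the stated spec."""
    # wave.open needs a str (or file object), not a Path; assumes PCM WAV.
    with wave.open(str(path), "rb") as w:
        rate = w.getframerate()
        duration = w.getnframes() / rate
    if rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(f"{path}: sample rate {rate} Hz, expected {EXPECTED_SAMPLE_RATE} Hz")
    if duration > MAX_DURATION_S:
        raise ValueError(f"{path}: {duration:.2f} s exceeds {MAX_DURATION_S} s")
    return duration


def summarize(root: Path, splits=("train", "dev", "test")) -> dict[str, tuple[int, float]]:
    """Return {split: (number of utterances, total hours)} for each transcript file."""
    summary = {}
    for split in splits:
        utts = load_split(root, split)
        seconds = sum(check_wav(u.wav_path) for u in utts)
        summary[split] = (len(utts), seconds / 3600)
    return summary
```

For example, `summarize(Path("punjabi_speech"))` (the directory name is hypothetical) returns per-split utterance counts and hours. These can be compared against the stated totals of 2429 utterances and roughly 4 hours, and against the 80/10/10 split. Resolving bare file names inside `clips/` lets the same loader handle either form of the first column.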



Massey University


Speech Recognition, Audio Recording, Speaker Recognition, Text-to-Speech, Speech Synthesis
