English pronunciation evaluation dataset

Published: 2 March 2021 | Version 1 | DOI: 10.17632/tg854n8th8.1
Ramon Brena


Dataset of features extracted from the voices of people learning to speak English. It covers 33 speakers, with 840 rows for each one. The rows come from 5-second segments of the original audio and from augmented copies made by speeding up or slowing down that audio. The columns are:
- The speaker identification number.
- The tag: 0 for low level, 1 for intermediate, 2 for high.
- The remaining columns are the features considered in this dataset, such as zero-crossing rate, energy, energy entropy, and the Mel Frequency Cepstral Coefficients (MFCC), as well as many others (see the first row of the CSV file). All of them are standard audio features intended for classification.
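As a quick sanity check of the layout described above, the following minimal sketch counts rows per speaker and per level. It assumes, without confirmation from the dataset page, that the file has one header row, that the first column is the speaker ID, and that the second column is the tag. The function name `summarize`, the column names in the sample, and the sample values are all hypothetical and synthetic, not taken from the real file.

```python
import csv
import io
from collections import Counter

# Meaning of the tag values, as given in the dataset description.
LEVELS = {0: "low", 1: "intermediate", 2: "high"}


def summarize(csv_file):
    """Count rows per speaker and per proficiency level.

    Assumption (not confirmed by the source): a header row comes first,
    column 0 is the speaker ID and column 1 is the tag.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    rows_per_speaker = Counter()
    rows_per_level = Counter()
    for row in reader:
        if not row:  # skip blank lines
            continue
        rows_per_speaker[row[0]] += 1
        rows_per_level[LEVELS[int(float(row[1]))]] += 1
    # Everything after the first two columns is treated as a feature.
    return rows_per_speaker, rows_per_level, header[2:]


# Tiny synthetic sample (NOT real data), only to exercise the function.
sample = io.StringIO(
    "speaker,tag,zcr,energy\n"
    "1,0,0.10,0.02\n"
    "1,0,0.12,0.03\n"
    "2,2,0.08,0.05\n"
)
per_speaker, per_level, feature_names = summarize(sample)
print(dict(per_speaker))
print(dict(per_level))
print(feature_names)
```

With the real file, pass `open(path, newline="")` instead of the in-memory sample. If the description holds, the result should show 33 speakers with 840 rows each, that is 33 x 840 = 27,720 data rows in total.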


Steps to reproduce

1.- Capture spontaneous speech in English from people learning English (in this case, 33 Latin American students).
2.- Slice the audio into 5-second segments.
3.- For data augmentation, speed up and slow down the original audio, and slice it as in the previous step.
4.- Obtain features for each audio segment using the librosa Python library.
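Steps 2 to 4 can be illustrated with a dependency-free sketch. This is not the authors' code: they used librosa, while the version below uses plain Python so that it runs anywhere. The speed factors (1.25 and 0.75), the choice to drop a trailing partial segment, the naive resampling used for the speed change, and all function names are assumptions made for illustration. Only two of the listed features (zero-crossing rate and energy) are computed, using simple whole-segment definitions, and the input is a synthetic tone rather than speech.

```python
import math


def slice_segments(samples, sample_rate, seconds=5):
    """Split a signal into non-overlapping fixed-length segments.

    A trailing partial segment is dropped (an assumption).
    """
    size = int(sample_rate * seconds)
    return [samples[i:i + size]
            for i in range(0, len(samples) - size + 1, size)]


def change_speed(samples, factor):
    """Naive speed change by linear-interpolation resampling.

    factor > 1 speeds up (shorter output), factor < 1 slows down.
    Unlike a proper time stretch, this also shifts the pitch.
    """
    last = len(samples) - 1
    out = []
    for k in range(int(len(samples) / factor)):
        pos = k * factor
        i = min(int(pos), last)
        frac = pos - i
        out.append(samples[i] * (1 - frac) + samples[min(i + 1, last)] * frac)
    return out


def zero_crossing_rate(segment):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(segment, segment[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(segment) - 1)


def energy(segment):
    """Mean squared amplitude of the segment."""
    return sum(x * x for x in segment) / len(segment)


# Synthetic 13-second, 50 Hz tone at 1 kHz standing in for a recording.
SAMPLE_RATE = 1000
audio = [math.sin(2 * math.pi * 50 * n / SAMPLE_RATE)
         for n in range(13 * SAMPLE_RATE)]

rows = []
for factor in (1.0, 1.25, 0.75):  # original, sped up, slowed down
    variant = audio if factor == 1.0 else change_speed(audio, factor)
    for segment in slice_segments(variant, SAMPLE_RATE):
        rows.append((factor, zero_crossing_rate(segment), energy(segment)))

for row in rows:
    print(row)
```

In librosa, `librosa.feature.zero_crossing_rate` and `librosa.feature.mfcc` compute frame-wise versions of such features, and `librosa.effects.time_stretch` changes speed without shifting pitch. Whether the authors used these particular calls is not stated on the dataset page.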


Instituto Tecnologico y de Estudios Superiores de Monterrey, Escuela de Ingenieria y Ciencias


Human Voice