ADAPTIVE: A Novel Dataset For Acoustic DysArthria deTection through temPoral Inference and Voice Engineering
Description
Dysarthria is a prevalent speech disorder affecting approximately 53% of individuals with speech-related challenges, often arising from neurological conditions such as stroke, cerebral palsy, or Parkinson's disease. It disrupts the coordination and strength of the muscles used for speech, complicating clear communication, especially with unfamiliar listeners. The impact of dysarthria extends beyond communication difficulties: it significantly affects social interactions, job prospects, and educational experiences, and can diminish the overall quality of life of those affected, making it imperative to address the disorder effectively.

The research questions guiding this study on pre-screening for dysarthria using machine learning techniques are:

RQ1: What specific acoustic features contribute most to detecting speech dysarthria?
RQ2: How can machine learning algorithms be optimized to enhance the accuracy of dysarthria detection compared to traditional assessment methods?
RQ3: Can a minimal number of MFCC features, combined with voice-engineered features, perform as well as or better than a large number of MFCC features alone?

Hence we introduce ADAPTIVE: A Novel Dataset For Acoustic DysArthria deTection through temPoral Inference and Voice Engineering. The corresponding paper is in the pipeline and yet to be published. Please cite this dataset if it helps in your studies, if you build your own dataset using the acoustic-temporal feature engineering scripts or ML models, or if you use findings from our research in your papers.
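To make RQ3 concrete, the sketch below shows the kind of compact feature vector it contrasts with a large MFCC-only set: a small number of MFCC means plus a few voice-engineered descriptors. It uses librosa; the feature names, counts, and sample rate are illustrative assumptions, not the exact set used in ADAPTIVE.

```python
import librosa
import numpy as np

def compact_features(path, n_mfcc=13, sr=16000):
    """Illustrative only: a small MFCC set plus a few voice-engineered
    features. The exact ADAPTIVE feature set may differ; this shows the
    idea behind RQ3 (few MFCCs + engineered features vs. many MFCCs)."""
    y, sr = librosa.load(path, sr=sr)
    # Per-coefficient means over time frames of a small MFCC set.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    feats = {f"mfcc{i}_mean": float(m) for i, m in enumerate(mfcc.mean(axis=1))}
    # A few hand-engineered voice descriptors (assumed examples):
    feats["zcr_mean"] = float(librosa.feature.zero_crossing_rate(y).mean())
    feats["rms_mean"] = float(librosa.feature.rms(y=y).mean())
    feats["duration_s"] = float(len(y) / sr)
    return feats
```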
Files
Steps to reproduce
This is a novel feature-engineered dataset. We make use of the raw audio files of two widely popular speech dysarthria datasets, TORGO and UASPEECH. The steps to reproduce are as follows:

1. Procure the raw TORGO and UASPEECH datasets.
2. Clean and resample the audio. For UASPEECH we additionally reduced noise while cleaning the files (a minimal cleaning sketch appears after this list). The cleaned UASPEECH dataset, for anybody interested, is made available on Kaggle by the first author: https://www.kaggle.com/datasets/aryashah2k/noise-reduced-uaspeech-dysarthria-dataset
3. We frame this as a binary classification problem, so we restructured the dataset directory containing the raw audio files into two folders named 1 and 0, where 0 means no_dysarthria (the control group) and 1 means is_dysarthria (audio files with dysarthric speech present in them).
4. We then run a custom script to extract acoustic and temporal features and assemble a pandas DataFrame with all of these features (an illustrative extraction sketch appears after this list). The script will be available on GitHub once the dataset and the corresponding paper are published: https://github.com/aryashah2k/Acoustic-DysArthria-deTection-through-temPoral-Inference-and-Voice-Engineering
5. The generated pandas DataFrame is then exported as the .csv file presented here.
6. We proceed to clean and preprocess the dataset before employing an end-to-end machine learning pipeline to visualise, model, and evaluate it (a baseline pipeline sketch appears after this list).
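One plausible implementation of step 2, assuming librosa for loading and resampling, the noisereduce package for denoising, and soundfile for writing; the exact parameters used for the cleaned UASPEECH release may differ:

```python
import librosa
import noisereduce as nr
import soundfile as sf

TARGET_SR = 16000  # assumed target rate; the released dataset may use another

def clean_and_resample(in_path, out_path, sr=TARGET_SR):
    """Load an audio file, resample it, reduce noise, and save the result."""
    y, _ = librosa.load(in_path, sr=sr)   # librosa resamples on load
    y = nr.reduce_noise(y=y, sr=sr)       # spectral-gating noise reduction
    sf.write(out_path, y, sr)
```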
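Steps 3 to 5 amount to walking the two class folders, extracting features per file, and exporting a CSV. A minimal sketch, reusing a feature-extraction function such as the compact_features example shown under Description (the folder layout and file names here are assumptions):

```python
import os
import pandas as pd

def build_dataset(root, extract_features):
    """Walk root/0 and root/1, extract features per .wav, return a DataFrame."""
    rows = []
    for label in ("0", "1"):  # 0 = no_dysarthria, 1 = is_dysarthria
        folder = os.path.join(root, label)
        for name in os.listdir(folder):
            if not name.endswith(".wav"):
                continue
            feats = extract_features(os.path.join(folder, name))
            feats["label"] = int(label)
            rows.append(feats)
    return pd.DataFrame(rows)

# Example usage (hypothetical paths):
# df = build_dataset("restructured_audio", compact_features)
# df.to_csv("adaptive_features.csv", index=False)
```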
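Step 6, the end-to-end modelling pipeline, could look like the following scikit-learn sketch; the classifier choice, split, and CSV name are illustrative assumptions, not the models evaluated in the paper:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("adaptive_features.csv")  # the CSV exported in step 5
X, y = df.drop(columns=["label"]), df["label"]

# Stratified hold-out split so both classes are represented in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale features, then fit a baseline classifier (illustrative choice).
model = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=42))
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```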