Synthetic Dysarthric Speech Dataset for Robust ASR Research

Published: 8 May 2026 | Version 1 | DOI: 10.17632/yrfzc6wfr9.1
Contributors:
Unmesh Raj

Description

This dataset contains synthetically generated dysarthric speech samples created for research in Automatic Speech Recognition (ASR), speech augmentation, and assistive AI systems. It was developed to address the scarcity of publicly available dysarthric speech data and to support the training and evaluation of robust ASR models.

The speech was generated using Text-to-Speech (TTS) systems and then modified with augmentation techniques to simulate dysarthric speech characteristics such as articulation irregularities, pitch variation, temporal distortion, and environmental noise. The dataset contains both male and female synthetic speakers.

Contents:
- 4 male speaker datasets
- 4 female speaker datasets
- Metadata files containing transcript mappings

Applications:
- Dysarthric speech recognition
- Whisper ASR fine-tuning
- Speech augmentation research
- Robust ASR under noisy conditions
- Assistive communication systems
- Semantic correction and speech restoration pipelines

The dataset is intended for academic and research purposes in speech processing, machine learning, accessibility, and healthcare-oriented AI systems.
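The description names temporal distortion and environmental noise among the augmentations but does not say how they were implemented. The Python sketch below is only an illustration of what two such operations could look like on a raw waveform; it is not the contributors' pipeline. The function names `resample_linear` and `add_noise`, the 16 kHz sample rate, the 0.8 speed factor, and the 10 dB SNR are all assumptions chosen for the example. Note that this naive resampling changes pitch together with duration, which a production time-stretch would normally avoid.

```python
import math
import random


def resample_linear(samples, rate):
    """Naive speed change by linear interpolation (hypothetical helper).

    rate < 1 lengthens the signal (slower speech), rate > 1 shortens it.
    Pitch shifts along with duration. Assumes a non-empty input list.
    """
    n_out = max(1, int(round(len(samples) / rate)))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * rate                  # fractional read position in the input
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise at roughly the requested SNR in dB
    (hypothetical helper). Assumes a non-empty, non-silent input list."""
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


# Stand-in for one second of TTS output: a 220 Hz tone at an assumed 16 kHz.
SAMPLE_RATE = 16000
tone = [math.sin(2 * math.pi * 220 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

slowed = resample_linear(tone, 0.8)                 # slower, lengthened signal
noisy = add_noise(slowed, 10, random.Random(0))     # seeded for reproducibility
```

Passing a seeded `random.Random` instance keeps the noise reproducible, which helps when the same augmented clip needs to be regenerated for an evaluation run.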

Categories

Assistive Technology, Natural Language Processing, Speech Recognition, Data Processing, Deep Learning
