UGAkan-ImpairedSpeechData: A Dataset of Impaired Speech in the Akan Language
Description
The UGAkan-ImpairedSpeechData is a speech dataset from indigenous speakers of Akan with different forms of speech impairment. It contains audio descriptions of culturally relevant images, comprising 14,312 audio files with corresponding transcriptions, equivalent to 50.01 hours of speech. Recordings were made in different environments: Outdoor (7,706 audio files), Other (3,075), Indoor (2,254), Studio (982), and Car (295).

The dataset is also categorized by aetiology and gender. Male speakers contributed 6,754 files (19.02 hours), with the highest representation from individuals with Cerebral Palsy (2,881 files, 8.84 hours), followed by Stammering, Cleft, and Stroke. Female speakers contributed 7,558 files (30.99 hours), with most recordings coming from individuals with Cerebral Palsy (4,835 files, 15.66 hours) and Stammering (2,574 files, 13.88 hours). Stroke data were recorded only from male speakers, while Cleft speech samples were collected from both genders, with a higher volume from males.

Audio durations vary: the average length is 12.46 seconds, with a standard deviation of 7.71 seconds, indicating moderate variability. Most audio files fall between 6.59 s and 16.00 s, suggesting a right-skewed distribution. The maximum duration of 60.08 s exceeds the upper bound of the interquartile range and is likely an outlier.
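The duration summary above (file count, total hours, mean, standard deviation, quartiles) can be recomputed directly from the audio files. The following is a minimal Python sketch, assuming the recordings are WAV files stored under a hypothetical audio/ directory and that the soundfile and numpy packages are available; the path and format are illustrative, not part of the published dataset layout.

```python
# Sketch: recompute the duration summary statistics of the dataset.
# Assumes WAV recordings live under "audio/"; the path is illustrative.
from pathlib import Path

import numpy as np
import soundfile as sf

durations = np.array([sf.info(str(p)).duration for p in Path("audio").glob("*.wav")])

print(f"files: {durations.size}")
print(f"total hours: {durations.sum() / 3600:.2f}")
print(f"mean: {durations.mean():.2f} s, std: {durations.std():.2f} s")
q1, q3 = np.percentile(durations, [25, 75])
print(f"middle 50% of durations: {q1:.2f} s to {q3:.2f} s")
print(f"max: {durations.max():.2f} s")
```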
Steps to reproduce
1. Develop a speech data collection app: for this project, audio descriptions of images were collected using a custom-built Android mobile application. The app was configured so that the record button activated only when no background noise was detected; if noise was detected during a recording, the app notified the user and deactivated the save button (a sketch of this noise-gating idea is given after this list).
2. Integrate up to 1,000 culturally relevant images into the app.
3. Recruit volunteers through convenience sampling or controlled crowdsourcing from urban and rural Akan-speaking communities.
4. Inform volunteers of the project's goal, obtain their informed consent, and train them on how to use the app.
5. Explain the predefined recording guidelines to volunteers. For this project, the rules included recording in quiet environments, no vulgar or profane language, strict use of Akan when describing the assigned images, and no English words except loanwords.
6. Do a test run with each volunteer on at least 10 images to confirm that they comply with the recording guidelines.
7. Design a transcription app (preferably a desktop app).
8. Recruit Akan linguists to transcribe the collected audio recordings, and install the transcription app and the Akan keyboard on their laptops.
9. Train the linguists on how to use the transcription app and provide them with transcription guidelines:
   a. All transcriptions must follow standard Akan (Asante Twi) orthography.
   b. If the speaker uses English words, retain the original spelling if the pronunciation is clear. If the pronunciation deviates, transcribe the word phonetically as spoken in Akan (e.g., "kompyuta" for "computer").
   c. For unfamiliar or borrowed terms without standard spellings, use phonetic Akan approximations and flag them for review. Follow community-accepted spellings when available.
   d. Do not transcribe filler sounds such as "errr," "hmmm," or "mmm" unless they clearly add emotional or contextual meaning to the utterance.
   e. Do not guess-transcribe inaudible or unclear utterances.
   f. Transcribe what may seem like word repetitions, corrections, and stuttering within an audio recording.
   g. Do not transcribe audio recordings with an audible secondary voice in the background.
10. Transcribe the audio recordings using a double-blind method: assign the same audio to at least two transcribers working independently, then resolve discrepancies through consensus or with input from a lead linguist (a sketch of this assignment-and-comparison step is shown after this list).
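Step 1 gates the record and save buttons on background noise. The actual implementation is an Android app; the Python sketch below only illustrates the gating logic, using a simple RMS-energy check on a short ambient buffer. The threshold and buffer length are illustrative assumptions, not values from the project.

```python
# Sketch of the noise-gating idea from step 1: enable recording only when a
# short ambient buffer is quiet enough. Threshold and buffer length are
# illustrative assumptions, not the values used by the actual Android app.
import numpy as np

SILENCE_THRESHOLD_DB = -40.0  # assumed gate level, in dB relative to full scale


def rms_dbfs(samples: np.ndarray) -> float:
    """Root-mean-square level of a float audio buffer, in dB relative to full scale."""
    rms = np.sqrt(np.mean(np.square(samples, dtype=np.float64)))
    return 20.0 * np.log10(max(rms, 1e-10))


def recording_allowed(ambient_buffer: np.ndarray) -> bool:
    """True if the ambient level is below the gate, i.e. the record button may be enabled."""
    return rms_dbfs(ambient_buffer) < SILENCE_THRESHOLD_DB


# Example with synthetic buffers (sample values in [-1.0, 1.0], ~1 s at 16 kHz):
quiet = 0.001 * np.random.randn(16000)  # near-silence
noisy = 0.2 * np.random.randn(16000)    # clearly audible background noise
print(recording_allowed(quiet))  # True  -> record button enabled
print(recording_allowed(noisy))  # False -> user notified, save button disabled
```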
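Step 10 requires assigning each audio file to at least two independent transcribers and then resolving disagreements. The sketch below shows one possible way to pair transcribers per file and to flag files whose transcripts differ so a lead linguist can adjudicate; the file names, transcriber IDs, and transcript strings are hypothetical.

```python
# Sketch for step 10: assign every audio file to two different transcribers,
# then flag files whose independent transcripts disagree for adjudication.
# File names, transcriber IDs, and transcripts are hypothetical.
import itertools
import random


def assign_pairs(audio_files, transcribers, seed=0):
    """Map each audio file to a pair of distinct transcribers, cycling over all pairs."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(transcribers, 2))
    rng.shuffle(pairs)
    pair_cycle = itertools.cycle(pairs)
    return {audio: next(pair_cycle) for audio in audio_files}


def flag_discrepancies(transcripts):
    """transcripts: {audio: {transcriber: text}}. Return files whose texts differ."""
    flagged = []
    for audio, versions in transcripts.items():
        normalized = {" ".join(text.split()).lower() for text in versions.values()}
        if len(normalized) > 1:
            flagged.append(audio)
    return flagged


# Hypothetical usage:
files = ["akan_0001.wav", "akan_0002.wav"]
team = ["linguist_a", "linguist_b", "linguist_c"]
print(assign_pairs(files, team))
print(flag_discrepancies({
    "akan_0001.wav": {"linguist_a": "same transcript", "linguist_b": "same transcript"},
    "akan_0002.wav": {"linguist_a": "first version", "linguist_c": "second version"},
}))  # -> ["akan_0002.wav"], to be resolved by consensus or a lead linguist
```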
Institutions
- University of Ghana
Funders
- Google (United States)