Purpose: The application of Chinese Mandarin electrolaryngeal (EL) speech for laryngectomees has been limited by drawbacks such as its single fundamental frequency, mechanical sound, and large radiation noise. To improve the intelligibility of Chinese Mandarin EL speech, a new approach using an automatic speech recognition (ASR) system was proposed, which, combined with text-to-speech, can convert EL speech into healthy speech.
Method: An ASR system was designed to recognize EL speech based on a deep learning model combining WaveNet with connectionist temporal classification (WaveNet-CTC). The system consists of 3 main parts: the acoustic model, the language model, and the decoding model. Acoustic features are extracted during speech preprocessing, and 3,230 utterances of EL speech mixed with 10,000 utterances of healthy speech are used to train the ASR system. A comparative experiment was designed to evaluate the performance of the proposed method.
Results: The results show that the proposed ASR system has higher stability and generalizability than traditional methods, showing superiority on Chinese characters, Chinese words, short sentences, and long sentences. Phoneme confusion occurs more readily for stops and affricates in EL speech than in healthy speech. The highest accuracy of the ASR reached 83.24% when 3,230 utterances of EL speech were used to train the system.
Conclusions: This study indicates that EL speech can be recognized effectively by an ASR system based on WaveNet-CTC. The proposed method has higher generalization performance and better stability than traditional methods, and the accuracy it achieves means that EL speech can be converted into healthy speech.
Supplemental Materials S1–S10. Ten electrolaryngeal (EL) speech sentences (.wav files).
Qian, Z., Wang, L., Zhang, S., Liu, C., & Niu, H. (2019). Mandarin electrolaryngeal speech recognition based on WaveNet-CTC. Journal of Speech, Language, and Hearing Research, 62, 2203–2212. https://doi.org/10.1044/2019_JSLHR-S-18-0313
Data Types:
  • Video
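The Method in the entry above describes a WaveNet-based acoustic model trained with connectionist temporal classification. As a minimal sketch only, not the authors' implementation, the PyTorch fragment below trains a small stack of dilated 1-D convolutions with a CTC objective; the feature dimension, token inventory, and layer counts are illustrative assumptions, not values from the paper.

```python
# Minimal sketch (not the authors' model): WaveNet-style dilated 1-D
# convolutions trained with CTC loss. Dimensions are assumed for illustration.
import torch
import torch.nn as nn

class DilatedConvCTC(nn.Module):
    def __init__(self, n_feats=40, n_tokens=60, channels=128, n_layers=6):
        super().__init__()
        layers, in_ch = [], n_feats
        for i in range(n_layers):
            # Dilation doubles each layer, as in WaveNet; padding keeps length.
            layers += [nn.Conv1d(in_ch, channels, kernel_size=3,
                                 dilation=2 ** i, padding=2 ** i),
                       nn.ReLU()]
            in_ch = channels
        self.conv = nn.Sequential(*layers)
        self.proj = nn.Linear(channels, n_tokens + 1)  # +1 for the CTC blank

    def forward(self, feats):                   # feats: (batch, n_feats, time)
        h = self.conv(feats)                    # (batch, channels, time)
        logits = self.proj(h.transpose(1, 2))   # (batch, time, tokens)
        return logits.log_softmax(dim=-1)

model = DilatedConvCTC()
ctc = nn.CTCLoss(blank=60)                      # blank index = n_tokens
feats = torch.randn(2, 40, 200)                 # two dummy utterances
targets = torch.randint(0, 60, (2, 30))         # dummy token sequences
log_probs = model(feats).transpose(0, 1)        # CTCLoss wants (time, batch, tokens)
loss = ctc(log_probs, targets,
           input_lengths=torch.full((2,), 200),
           target_lengths=torch.full((2,), 30))
loss.backward()
```

The blank token and the (time, batch, tokens) layout are requirements of nn.CTCLoss; CTC is what lets the network map an unsegmented acoustic feature sequence onto a shorter character sequence without frame-level alignments.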
Purpose: This review article introduces research methods for the personalization of intervention. Our goals are to review evidence-based practices for improving social communication impairment in children with autism spectrum disorder generally and then to consider how these practices can be systematized in ways that personalize intervention, especially for children who respond slowly to an initial evidence-based practice.
Method: The narrative reflects on the current status of modular and targeted interventions on social communication outcomes in the field of autism research. Questions are introduced regarding the personalization of interventions that can be addressed through research methods, including adaptive treatment designs and the Sequential Multiple Assignment Randomized Trial (SMART). Examples of empirical studies using these research designs are presented to answer questions of personalization.
Conclusion: Bridging the gap between research studies and clinical practice can be advanced by research that attempts to answer questions pertinent to the broad heterogeneity of children with autism spectrum disorder, their response to interventions, and the fact that a single intervention is not effective for all children.
Publisher Note: This article is part of the Research Forum: Advances in Autism Research: From Learning Mechanisms to Novel Interventions.
Kasari, C., Sturm, A., & Shih, W. (2018). SMARTer approach to personalizing intervention for children with autism spectrum disorder. Journal of Speech, Language, and Hearing Research, 61(11), 2629–2640. https://doi.org/10.1044/2018_JSLHR-L-RSAUT-18-0029
Data Types:
  • Video
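To make the SMART logic in the entry above concrete, here is a hedged toy sketch of the two-stage assignment scheme such a design uses: an initial randomization, an early response assessment, and re-randomization of slow responders. The intervention names are placeholders, not a description of any particular trial.

```python
# Illustrative SMART assignment logic (toy example, not a trial protocol):
# stage 1 is randomized; responders continue; slow responders are
# re-randomized between two second-stage options.
import random

def smart_assign(child_id, responded_to_stage1=None):
    """Return the (stage 1, stage 2) intervention pair for one participant."""
    random.seed(child_id)                       # reproducible per-child draws
    stage1 = random.choice(["Intervention A", "Intervention B"])
    if responded_to_stage1 is None:
        return stage1, None                     # stage 2 decided after assessment
    if responded_to_stage1:
        stage2 = stage1 + " (continued)"        # responders stay the course
    else:                                       # slow responders re-randomized
        stage2 = random.choice([stage1 + " intensified",
                                stage1 + " + augmented communication"])
    return stage1, stage2

print(smart_assign(101, responded_to_stage1=False))
```

The point of the design is that every sequence of decisions (first-stage arm, response status, second-stage arm) is itself randomized, which is what lets a single trial compare adaptive treatment strategies rather than single interventions.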
Purpose: Our aim was to make audible for normal-hearing listeners the Mickey Mouse™ sound quality of cochlear implants (CIs) often found following device activation.
Method: The listeners were 3 single-sided deaf patients fit with a CI who had 6 months or less of CI experience. Computed tomography imaging established the location of each electrode contact in the cochlea and allowed an estimate of the place frequency of the tissue nearest each electrode. For the most apical electrodes, this estimate ranged from 650 to 780 Hz. To determine CI sound quality, a clean signal (a sentence) was presented to the CI ear via a direct connect cable, and candidate CI-like signals were presented to the ear with normal hearing via an insert receiver. The listeners rated the similarity of the candidate signals to the sound of the CI on a 1- to 10-point scale, with 10 being a complete match.
Results: To match the CI sound quality, all 3 patients needed an upshift in formant frequencies (300–800 Hz) and a metallic sound quality. Two of the 3 patients also needed an upshift in voice pitch (10–80 Hz) and a muffling of sound quality. Similarity scores ranged from 8 to 9.7.
Conclusion: The formant frequency upshifts, fundamental frequency upshifts, and metallic sound quality experienced by the listeners can be linked to the relatively basal locations of the electrode contacts and the short duration of experience with their devices. The perceptual consequence was not the voice quality of Mickey Mouse™ but rather that of the Munchkins in The Wizard of Oz, for whom both formant frequencies and voice pitch were upshifted.
Supplemental Material S1. Video demonstrating the procedure used to obtain a match between a signal presented to the cochlear implant (CI) ear and a candidate CI-like signal presented to the normal-hearing ear.
Supplemental Material S2. Audio files of the clean signal presented to the CI ear and the sound quality match from the normal-hearing ear. S2_1: Patient 2460. S2_2: Patient 2461. S2_3: Patient 2465.
Dorman, M. F., Natale, S. C., Zeitler, D. M., Baxter, L., & Noble, J. H. (2019). Looking for Mickey Mouse™ but finding a Munchkin: The perceptual effects of frequency upshifts for single-sided deaf, cochlear implant patients. Journal of Speech, Language, and Hearing Research. Advance online publication. https://doi.org/10.1044/2019_JSLHR-H-18-0389
Data Types:
  • Video
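A very rough way to hear what an overall upshift does is a uniform pitch shift. The librosa sketch below converts a Hz upshift into semitones and resamples the signal; note that this moves f0 and formants together, whereas the study above manipulated formant frequencies and voice pitch independently, which requires a source-filter vocoder. The example file and the assumed speaking f0 are placeholders, not the study's stimuli.

```python
# Crude approximation of an "upshifted" voice (not the study's method):
# librosa's pitch_shift moves the whole spectrum, so f0 and formants
# shift together.
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))       # stand-in for a sentence
f0_orig, f0_shift_hz = 120.0, 40.0                # assumed speaking f0 and shift
n_steps = 12 * np.log2((f0_orig + f0_shift_hz) / f0_orig)  # Hz -> semitones
y_up = librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)
```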
Purpose: Children with autism spectrum disorder (ASD) demonstrate many mechanisms of lexical acquisition that support language in typical development; however, 1 notable exception is the shape bias. The bases of these children's difficulties with the shape bias are not well understood, and the current study explored potential sources of individual differences from the perspectives of both attentional and conceptual accounts of the shape bias.
Method: Shape bias performance from the dataset of Potrzeba, Fein, and Naigles (2015) was analyzed, including 33 children with typical development (M = 20 months; SD = 1.6), 15 children with ASD with high verbal abilities (M = 33 months; SD = 4.6), and 14 children with ASD with low verbal abilities (M = 33 months; SD = 6.6). Lexical predictors (shape-based noun percentage from the MacArthur–Bates Communicative Development Inventory; Fenson et al., 2007) and social-pragmatic predictors (joint attention duration during play sessions) were considered as predictors of subsequent shape bias performance.
Results: For children in the low verbal ASD group, initiation of joint attention (positively) and passive attention (negatively) predicted subsequent shape bias performance, controlling for initial language and developmental level. The proportion of a child's known nouns with shape-defined properties correlated negatively with shape bias performance in the high verbal ASD group but did not reach significance in regression models.
Conclusions: These findings suggest that no single account sufficiently explains the observed individual differences in shape bias performance in children with ASD. Nonetheless, these findings break new ground in highlighting the role of social communicative interactions as integral to understanding specific language outcomes (i.e., the shape bias) in children with ASD, especially those with low verbal abilities, and point to new hypotheses concerning the linguistic content of these interactions.
Publisher Note: This article is part of the Research Forum: Advances in Autism Research: From Learning Mechanisms to Novel Interventions.
Abdelaziz, A., Kover, S. T., Wagner, M., & Naigles, L. R. (2018). The shape bias in children with autism spectrum disorder: Potential sources of individual differences. Journal of Speech, Language, and Hearing Research, 61(11), 2685–2702. https://doi.org/10.1044/2018_JSLHR-L-RSAUT-18-0027
Data Types:
  • Video
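The analysis described above is, in form, a regression of later shape bias performance on joint attention measures with initial language level as a covariate. The statsmodels sketch below shows the shape of such a model on simulated data; every variable name and value is illustrative, not the study's dataset.

```python
# Hedged sketch of the reported analysis type: shape bias regressed on joint
# attention predictors, controlling for initial language. Data are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 14                                              # e.g., a small ASD subgroup
df = pd.DataFrame({
    "shape_bias":       rng.normal(0.5, 0.15, n),   # proportion of shape choices
    "initiate_ja":      rng.normal(10, 3, n),       # joint attention initiations
    "passive_attn":     rng.normal(20, 5, n),       # passive attention duration
    "initial_language": rng.normal(50, 10, n),      # covariate
})
model = smf.ols("shape_bias ~ initiate_ja + passive_attn + initial_language",
                data=df).fit()
print(model.summary())
```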
Purpose: Age-related sensorineural hearing loss can dramatically affect speech recognition performance due to reduced audibility and suprathreshold distortion of spectrotemporal information. Normal aging produces changes within the central auditory system that impose further distortions. The goal of this study was to characterize the effects of aging and hearing loss on perceptual representations of speech.
Method: We asked whether speech intelligibility is supported by different patterns of spectrotemporal modulations (STMs) in older listeners compared to young normal-hearing listeners. We recruited 3 groups of participants: 20 older hearing-impaired (OHI) listeners, 19 age-matched normal-hearing listeners, and 10 young normal-hearing (YNH) listeners. Listeners performed a speech recognition task in which randomly selected regions of the speech STM spectrum were revealed from trial to trial. The overall amount of STM information was varied using an up–down staircase to hold performance at 50% correct. Ordinal regression was used to estimate weights showing which regions of the STM spectrum were associated with good performance (a "classification image" or CImg).
Results: The results indicated that (a) large-scale CImg patterns did not differ between the 3 groups; (b) weights in a small region of the CImg decreased systematically as hearing loss increased; (c) CImgs were also nonsystematically distorted in OHI listeners, and the magnitude of this distortion predicted speech recognition performance even after accounting for audibility; and (d) YNH listeners performed better overall than the older groups.
Conclusion: We conclude that OHI/older normal-hearing listeners rely on the same speech STMs as YNH listeners but encode this information less efficiently.
Supplemental Observer Simulation: We performed a simulation in which a simulated listener performed the same experimental procedure as the real listeners at each of several values for average number of bubbles at threshold (50–150 in steps of 10, the range observed for real OHI listeners).
Supplemental Material S1–S3. Audio. Examples of auditory bubbles stimuli. Three sentences with varying degrees of intelligibility are provided. For each sentence, a filter with 50 bubbles was applied to produce the stimulus. In each audio file, the bubbles-filtered sentence is followed by a clear, unprocessed version of the same sentence to allow comparison.
Venezia, J. H., Martin, A.-G., Hickok, G., & Richards, V. M. (2019). Identification of the spectrotemporal modulations that support speech intelligibility in hearing-impaired and normal-hearing listeners. Journal of Speech, Language, and Hearing Research, 62, 1051–1067. https://doi.org/10.1044/2018_JSLHR-H-18-0045
Data Types:
  • Video
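The up–down staircase in the Method above can be illustrated in a few lines of Python: a 1-down/1-up rule that removes bubbles after a correct response and adds them after an error converges on the 50%-correct point. The simulated listener below is a stand-in for a real observer, in the spirit of the supplemental observer simulation; its psychometric rule and all numeric values are assumptions.

```python
# 1-down/1-up adaptive staircase sketch: track the number of "bubbles"
# (revealed STM information) needed for 50%-correct performance.
import random

def simulated_listener(n_bubbles, threshold=80):
    """Toy psychometric rule: more bubbles -> more likely correct."""
    return random.random() < min(1.0, n_bubbles / (2 * threshold))

n_bubbles, step, track = 150, 10, []
for trial in range(100):
    correct = simulated_listener(n_bubbles)
    n_bubbles += -step if correct else step     # 1-down / 1-up
    n_bubbles = max(step, n_bubbles)            # keep the level positive
    track.append(n_bubbles)

print("estimated threshold:", sum(track[-20:]) / 20)  # average of late trials
```

A 1-down/1-up rule targets 50% correct because the level only stabilizes where a correct response (step down) and an error (step up) are equally likely.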
Purpose: The 2 most commonly used operations to treat velopharyngeal inadequacy (VPI) are the superiorly based pharyngeal flap and sphincter pharyngoplasty, both of which may result in hyponasal speech and airway obstruction. The purpose of this article is to (a) describe the bilateral buccal flap revision palatoplasty (BBFRP) as an alternative technique to manage VPI while minimizing these risks and (b) conduct a systematic review of the evidence of BBFRP on speech and other clinical outcomes. A report comparing the speech of a child with hypernasality before and after BBFRP is presented.
Method: A review of databases was conducted for studies of buccal flaps to treat VPI. Using the principles of a systematic review, the articles were read, and data were abstracted for study characteristics that were developed a priori. With respect to the case report, speech and instrumental data from a child with repaired cleft lip and palate and hypernasal speech were collected and analyzed before and after surgery.
Results: Eight articles were included in the analysis. The results were positive, and the evidence is in favor of BBFRP in improving velopharyngeal function while minimizing the risk of hyponasal speech and obstructive sleep apnea. Before surgery, the child's speech was characterized by moderate hypernasality, and after surgery, it was judged to be within normal limits.
Conclusion: Based on clinical experience and results from the systematic review, there is sufficient evidence that the buccal flap is effective in improving resonance and minimizing obstructive sleep apnea. We recommend it as a first-line approach in selected patients to manage VPI.
Supplemental Material S1. Supplemental Material S2.
Napoli, J. A., & Vallino, L. D. (2019). Treating velopharyngeal inadequacy using bilateral buccal flap revision palatoplasty. Perspectives of the ASHA Special Interest Groups. Advance online publication. https://doi.org/10.1044/2019_PERS-SIG5-2019-0005
Data Types:
  • Video
Purpose: This study investigated how modulating fundamental frequency (f0) and speech rate differentially impact the naturalness, intelligibility, and communication efficiency of synthetic speech.
Method: Sixteen sentences of varying prosodic content were developed via a speech synthesizer. The f0 contour and speech rate of these sentences were altered to produce 4 stimulus sets: (a) normal rate with a fixed f0 level, (b) slow rate with a fixed f0 level, (c) normal rate with prosodically natural f0 variation, and (d) normal rate with prosodically unnatural f0 variation. Sixteen listeners provided orthographic transcriptions and judgments of naturalness for these stimuli.
Results: Sentences with f0 variation were rated as more natural than those with a fixed f0 level. Conversely, sentences with a fixed f0 level demonstrated higher intelligibility than those with f0 variation. Speech rate did not affect the intelligibility of stimuli with a fixed f0 level. Communication efficiency was highest for sentences produced at a normal rate and a fixed f0 level.
Conclusions: Sentence-level f0 variation increased naturalness ratings of synthesized speech, whether the variation was prosodically natural or not. However, these f0 variations reduced intelligibility. There is evidence of a trade-off in naturalness and intelligibility of synthesized speech, which may impact future speech synthesis designs.
Supplemental Material S1. Normal-Rate Fixed f0
Supplemental Material S2. Slow-Rate Fixed f0
Supplemental Material S3. Normal-Rate Prosodically Natural f0 Variation
Supplemental Material S4. Normal-Rate Prosodically Unnatural f0 Variation
Vojtech, J. M., Noordzij, J. P., Jr., Cler, G. J., & Stepp, C. E. (2019). The effects of modulating fundamental frequency and speech rate on the intelligibility, communication efficiency, and perceived naturalness of synthetic speech. American Journal of Speech-Language Pathology, 28, 875–886. https://doi.org/10.1044/2019_AJSLP-MSC18-18-0052
Publisher Note: This article is part of the Special Issue: Selected Papers From the 2018 Conference on Motor Speech—Clinical Science and Innovations.
Data Types:
  • Video
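As a minimal illustration of the two f0 conditions above, the NumPy sketch below builds a bare sinusoidal "source" under a fixed f0 level versus a slowly varying contour. The real stimuli came from a speech synthesizer; the contour here is an arbitrary stand-in for prosodically natural variation, and all numeric values are assumptions.

```python
# Fixed f0 level vs. varying f0 contour, rendered as a plain sinusoid
# (illustration only; the study's stimuli were synthesized sentences).
import numpy as np

sr, dur = 16000, 2.0
t = np.linspace(0, dur, int(sr * dur), endpoint=False)

f0_fixed = np.full_like(t, 120.0)                       # fixed f0 level (Hz)
f0_varied = 120.0 + 30.0 * np.sin(2 * np.pi * 0.8 * t)  # slow contour (Hz)

def synth(f0):
    # Integrate the f0 track to get instantaneous phase, then oscillate.
    phase = 2 * np.pi * np.cumsum(f0) / sr
    return 0.5 * np.sin(phase)

fixed_wave, varied_wave = synth(f0_fixed), synth(f0_varied)
```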