Speech Communication

ISSN: 0167-6393

Datasets associated with articles published in Speech Communication

4 results
  • Data for: Harmonic Beamformers for Speech Enhancement and Dereverberation in the Time Domain
    This dataset contains the code that was used to conduct the experimental evaluations in the paper. It also contains the speech data used for the evaluation on synthetic data in the manuscript, i.e., to produce the results presented in Figs. 1-3. The evaluation on real data was conducted on data from SMARD (https://www.smard.es.aau.dk/). More specifically, the signals labelled FA0309 and MD2404 in the two scenarios labelled 1011 and 1111 were considered, and, for each scenario, the two ULAs, A and B, were used for the evaluation.
    • Dataset
  • Data for: Single-Channel Speech Enhancement Using Inter-Component Phase Relations
    We provide audio files as listening examples for the results reported in the paper. A demo webpage is also available at: www2.spsc.tugraz.at/people/pmowlaee/ICPR.html
    • Dataset
  • Data for: Effect of talker variability on hearing aid benefit with closed word recognition test
    Closed-set word recognition test based on 4 lists of the WAKO test with 47 words. The original set is labelled ST and the multitalker set MT.
    • Dataset
  • The acoustic feature dataset of WD patients and healthy individuals
    The study uses a state-of-the-art speech embedding method for WD detection in unstructured connected speech (UCS), combining bi-directional semantic dependencies and attention mechanisms. The feature data file covers 110 native Mandarin-speaking participants: 55 WD patients and 55 sex-matched healthy individuals. The data comprise four columns: labels (0 for healthy individuals, 1 for WD patients), the ComParE feature set, and the Wav2vec 2.0 and HuBERT embedded feature sets. To obtain frame-level speech representations that can be compared and fused with the embedding approaches, we use only the LLDs of ComParE (the latest, 2016, version), which contains 65-dimensional features per time step, and set the window length and step length to 30 ms and 20 ms, respectively. The final ComParE feature shape for each participant's 60 s audio is 2999 × 65. To adapt to native speech data, we extract embeddings from pre-trained Wav2vec 2.0 and HuBERT models fine-tuned on 10,000 hours of Chinese speech data from WenetSpeech. Considering computational resources and time cost, we use the base version of each pre-trained model and take its final 768-dimensional hidden layer, i.e., the last hidden state, as the embedding representation, giving a shape of 2999 × 768 per audio sample. (A hedged extraction sketch is given after this list.)
    • Dataset
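
As a minimal sketch of how features with the shapes described in the last entry could be reproduced, the snippet below pairs the openSMILE Python package (for ComParE 2016 LLDs) with Hugging Face transformers (for wav2vec 2.0 embeddings). The checkpoint ID `facebook/wav2vec2-base` and the file name are placeholders: the dataset's embeddings come from base models fine-tuned on WenetSpeech, whose exact checkpoints are not named in the listing, and openSMILE's stock frame settings differ from the paper's 30 ms window / 20 ms step, which would require a custom config.

```python
# Hedged sketch: extract ComParE-2016 LLDs and wav2vec 2.0 embeddings
# comparable to the shapes described above (2999 x 65 and 2999 x 768
# for 60 s of audio). Checkpoint and file names are placeholders.
import opensmile
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

WAV = "participant_60s.wav"          # hypothetical 60 s recording
MODEL_ID = "facebook/wav2vec2-base"  # stand-in; the paper uses base models
                                     # fine-tuned on WenetSpeech

# --- ComParE 2016 low-level descriptors (65 per frame) ---------------
# Note: openSMILE's default frame settings differ from the paper's
# 30 ms window / 20 ms step; matching them needs a custom config file.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.LowLevelDescriptors,
)
llds = smile.process_file(WAV)       # pandas DataFrame, one row per frame
print(llds.shape)                    # (n_frames, 65)

# --- wav2vec 2.0 last hidden state (768 per frame) -------------------
wave, sr = torchaudio.load(WAV)
wave = torchaudio.functional.resample(wave, sr, 16_000).mean(dim=0)

extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)
model = Wav2Vec2Model.from_pretrained(MODEL_ID).eval()

inputs = extractor(wave.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    emb = model(**inputs).last_hidden_state   # (1, T, 768)

# The wav2vec 2.0 encoder hops 320 samples (20 ms at 16 kHz), so 60 s
# yields roughly 60 / 0.02 - 1 = 2999 frames, matching 2999 x 768.
print(emb.shape)
```

Swapping `Wav2Vec2Model` for `transformers.HubertModel` (with a corresponding HuBERT base checkpoint) would yield the analogous 768-dimensional HuBERT embeddings.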