100 results
  • A description
    Data Types:
    • Other
    • Software/Code
    • Image
    • Video
    • Tabular Data
    • Dataset
    • Document
    • Text
    • Audio
  • High accuracy classification of COVID-19 coughs using Mel-frequency cepstral coefficients and a Convolutional Neural Network with a use case for smart home devices. Diagnosing COVID-19 early in domestic settings is possible through smart home devices that can classify audio input of coughs and determine whether they indicate COVID-19. Research is currently sparse in this area and data is difficult to obtain. However, a few small data collection projects have enabled audio classification research into the application of different machine learning classification algorithms, including Logistic Regression (LR), Support Vector Machines (SVM), and Convolutional Neural Networks (CNN). We show here that a CNN using audio converted to Mel-frequency cepstral coefficient spectrogram images as input can achieve high accuracy, with 97.5% of validation audio correctly classified as covid or not covid. The work here provides a proof of concept that high accuracy can be achieved with a small dataset, which can have a significant impact in this area. The results are highly encouraging and provide further opportunities for research by the academic community on this important topic. Preprint: https://www.researchgate.net/publication/343376336_High_accuracy_classification_of_COVID-19_coughs_using_Mel-frequency_cepstral_coefficients_and_a_Convolutional_Neural_Network_with_a_use_case_for_smart_home_devices
    Data Types:
    • Software/Code
    • Tabular Data
    • Dataset
    • Audio
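    The entry above feeds Mel-frequency cepstral coefficient images to a CNN. As background only, the following minimal sketch shows the mel scale that such features rest on. It assumes the common HTK-style conversion formula, and the function names, band count, and frequency range are illustrative choices, not taken from the dataset or its preprint.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style formula, one common convention)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_bands, f_min, f_max):
    """Return n_bands + 2 edge frequencies (Hz), equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]


# Hypothetical settings: 10 bands between 0 Hz and 8000 Hz.
edges = mel_band_edges(n_bands=10, f_min=0.0, f_max=8000.0)
for low, high in zip(edges, edges[1:]):
    print(f"{low:8.1f} Hz to {high:8.1f} Hz")
```

    Because the edges are evenly spaced in mels, the bands get wider in Hz as frequency rises, which roughly mirrors how pitch resolution in human hearing coarsens at higher frequencies. A full MFCC pipeline would additionally frame the signal, apply a filterbank at these edges, take logarithms, and apply a discrete cosine transform.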
  • This dataset is published for Bengali continuous speech recognition. It has three parts: "Script Files" contain the Bengali text for continuous speech, "Speech Files" contain the recorded audio, and the "Speakers_information" CSV file stores the speakers' information.
    Data Types:
    • Other
    • Software/Code
    • Tabular Data
    • Dataset
    • Text
    • Audio
  • The recordings in this database were collected for the purpose of evaluating the ability of a playback attack detector to safeguard a remote-access speaker-verified and passphrase-protected system from playback attacks. This database includes multiple utterances of the same phrase by the same person in addition to a variety of distorted versions of many of the utterances. Multiple distortions of an utterance were obtained, in part, by simultaneously recording the utterance at both ends of a telecommunication channel – using a digital voice recorder to obtain the user-end (i.e., in-person) recording and a telephony board to obtain the system-end recording. While the former suffers little distortion, the latter suffers the “non-stationary” distortion imposed by the channel. Additional distortions of the same utterance were captured at the system-end of the channel when the in-person recording was replayed at the user-end; these additional recordings simulate playback attacks and suffer the distortion of both the playback device and the channel. The database may be used: to evaluate the vulnerability of a speaker verification system (SVS) to playback attacks; to evaluate the performance of a copy-detection or distortion-detection based playback attack detector (PAD); to evaluate the overall security of a speaker verification system in tandem with a playback attack countermeasure; or to investigate the distortion imposed by various telecommunication channels and/or playback speakers.
    Data Types:
    • Tabular Data
    • Dataset
    • Audio
  • This database was created through generous funding from The Voice Foundation's Advancing Scientific Voice Research Grant and contains voice samples which have been rated by experienced voice professionals (at least 3 different raters with a minimum of 2 years’ clinical experience) in order to provide educators with standardized materials to better train pre-service clinical voice professionals. It contains 296 audio files consisting of the sustained /a/ and /i/ vowels and the sentences from the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V; Kempster, 2007). All recordings were made in a quiet clinical environment using a head-mounted condenser microphone at a 6-centimeter distance from the corner of the mouth and the Computerized Speech Lab (CSL) using 16-bit encryption and a sampling rate of 48 kHz. Audio recordings have been edited as best as possible to remove all clinician instructions. However, please listen to and look at each file carefully in case there was simultaneous clinician-client talk. Listeners rated approximately 50 files each and each file was rated twice for reliability measurement (for a total of approximately 100 ratings per rater). Raters used a computer to listen to the samples and rate voice quality via a web-based system that included custom-made electronic scales for the CAPE-V (Kempster, 2007) and the GRBAS (Hirano, 1981) using Qualtrics survey software. Listeners rated each file on a 100-point visual analogue scale (VAS) to mimic the paper-based CAPE-V protocol. Please note that severity markers (mild, moderate, severe) were not included on the 100-point VAS to avoid influencing the concurrent rating using the GRBAS scale. Raters were urged to rate the samples over several days to avoid fatigue. Further description of methods is located in the folders below. Questions about the database can be directed to Patrick R. Walden, Ph.D., CCC-SLP at waldenp@stjohns.edu. References: Kempster G. CAPE-V: Development and Future Direction. Perspect Voice Voice Dis. 2007;17(2):11-13. doi:10.1044/vvd17.2.11. Hirano M. Clinical Examination of Voice. Springer-Verlag; 1981.
    Data Types:
    • Tabular Data
    • Dataset
    • Document
    • Audio
  • Smoke Test (HomeWiFI) 28May2020 natscilivecustomer (Dataset-1)
    Data Types:
    • Other
    • Software/Code
    • Image
    • Video
    • Tabular Data
    • Dataset
    • Document
    • Text
    • Audio
  • This dataset consists of 25,921 recorded Vietnamese speeches (with their transcripts and the labelled start and end times of each speech) manually compiled from 3 sub-datasets (approximately 30 hours in total) released publicly in 2018 by FPT Corporation. The speeches are in *.mp3 format while the transcript file is in *.txt format with utf-8 encoding scheme. The dataset is useful for several speech-related research topics, including but not limited to text-to-speech, speech-to-text applications, gender detection, mood detection, intent detection, onset detection, signal-to-noise improvement, signal processing, speech processing, etc. Copyright 2018 FPT Corporation. Permission is hereby granted, free of charge, non-exclusive, worldwide, irrevocable, to any person obtaining a copy of this data or software and associated documentation files (the “Data or Software”), to deal in the Data or Software without restriction, including without limitation the rights to use, copy, modify, remix, transform, merge, build upon, publish, distribute and redistribute, sublicense, and/or sell copies of the Data or Software, for any purpose, even commercially, and to permit persons to whom the Data or Software is furnished to do so, subject to the following conditions: The above copyright notice, and this permission notice, and indication of any modification to the Data or Software, shall be included in all copies or substantial portions of the Data or Software. THE DATA OR SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE DATA OR SOFTWARE OR THE USE OR OTHER DEALINGS IN THE DATA OR SOFTWARE. Patent and trademark rights are not licensed under this FPT Public License.
    Data Types:
    • Dataset
    • Text
    • Audio
  • This is the 1st FPT Open Speech Data (FOSD) and Tacotron-2-based Text-to-Speech Model Dataset for Vietnamese. It comprises: a configuration file in *.json format; training and validation text input files (in *.csv format); a trained model (checkpoint file, after 225,000 steps); and sample audio generated by the trained model. This dataset is useful for research related to TTS and its applications, text processing and especially TTS output optimization given a set of predefined input texts. Copyright 2018 FPT Corporation. Permission is hereby granted, free of charge, non-exclusive, worldwide, irrevocable, to any person obtaining a copy of this data or software and associated documentation files (the “Data or Software”), to deal in the Data or Software without restriction, including without limitation the rights to use, copy, modify, remix, transform, merge, build upon, publish, distribute and redistribute, sublicense, and/or sell copies of the Data or Software, for any purpose, even commercially, and to permit persons to whom the Data or Software is furnished to do so, subject to the following conditions: The above copyright notice, and this permission notice, and indication of any modification to the Data or Software, shall be included in all copies or substantial portions of the Data or Software. THE DATA OR SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE DATA OR SOFTWARE OR THE USE OR OTHER DEALINGS IN THE DATA OR SOFTWARE. Patent and trademark rights are not licensed under this FPT Public License.
    Data Types:
    • Software/Code
    • Tabular Data
    • Dataset
    • Text
    • Audio
    • File Set
  • Open access is the free, immediate, online availability of the products of research and knowledge. This includes certain rights that allow others to reuse the research for non-commercial educational purposes. With the rise of open access initiatives around the world, researchers and university presses are exploring what it means to publish an open access book while ensuring that the book continues to fulfill its purpose.
    Data Types:
    • Slides
    • Image
    • Dataset
    • Audio
  • Version1: NatSciLive Natra, Mahesh Live Version2: Mahesh Live, NatSciLive Natra Version3: Rehan Ahmad, NatSciLive Natra, Mahesh Live Version4: Mahesh Live, Rehan Ahmad Version5: A AA, Rehan Ahmad, Mahesh Elsevier Version6: A AA, Mahesh Elsevier, Rehan Ahmad, Natscie Live Version7: A AA, Mahesh Elsevier, Natscie Live Version8: Mahesh Elsevier, Natscie Live, A AA Version9: , Natsci Live, Mahesh Elsevier, A AA Version10: , Emre Cosar, A AA, Mahesh Elsevier, Natsci Live Version11: Dummy E, Emre C, Mahesh N
    Data Types:
    • Other
    • Software/Code
    • Image
    • Video
    • Tabular Data
    • Dataset
    • Document
    • Text
    • Audio