Speech Dataset of Human and AI-Generated Voices

Name: Speech Dataset of Human and AI-Generated Voices
Creator: Huzain Azis
Published: 2025-03-10T20:24:12.364Z
Keywords: Artificial Intelligence, Signal Processing, Machine Learning, Audio Recording, Speech Analysis, Audio Analysis, Audio Recognition, Speech Generation

Azis, Huzain; Rismayanti, Nurul

doi:10.17632/5czyx2vppv.1

Published: 10 March 2025 | Version 1 | DOI: 10.17632/5czyx2vppv.1

Contributors:

Huzain Azis, Nurul Rismayanti

Description

This dataset contains audio recordings in the Indonesian language, divided into two classes: human voices (real) and synthetic voices generated with artificial intelligence (AI). Each class comprises 21 audio files, for a total of 42. Recordings run from approximately 4 to 9 minutes, averaging around 6 minutes per file. All recordings are provided in WAV format and are accompanied by a CSV file containing duration metadata for each audio file. The dataset is suitable for research and applications in speech recognition, voice authenticity detection, audio analysis, and related fields, and it enables comparative analysis between natural Indonesian speech and AI-generated synthetic speech.
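As a quick sanity check of the figures above (21 files per class, roughly 6 minutes on average), the metadata CSV could be summarized along the following lines. This is a minimal sketch only: the column names `label` and `duration_seconds`, the file names, and the sample values are hypothetical, since the CSV schema is not documented here, and would need to be adapted to the actual file.

```python
import csv
import io
from statistics import mean


def summarize(csv_text, label_col="label", duration_col="duration_seconds"):
    """Return {label: (file_count, mean_duration_minutes)} from metadata CSV text.

    The column names are assumptions; pass the real ones from the dataset's CSV.
    """
    durations_by_label = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        durations_by_label.setdefault(row[label_col], []).append(float(row[duration_col]))
    return {
        label: (len(durations), mean(durations) / 60.0)
        for label, durations in durations_by_label.items()
    }


# Hypothetical sample rows, not taken from the dataset.
sample = """file_name,label,duration_seconds
real_01.wav,Real,240
real_02.wav,Real,360
fake_01.wav,Fake,360
"""

for label, (count, avg_minutes) in summarize(sample).items():
    print(f"{label}: {count} files, mean {avg_minutes:.1f} min")
```

On the full dataset, each class would be expected to report 21 files with a mean duration near 6 minutes.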

Steps to reproduce

1. Data Collection: Record original human voice audio samples from the designated voice provider using high-quality recording equipment in a quiet environment.
2. AI-Voice Generation: Generate synthetic voices using AI-based voice cloning or text-to-speech algorithms, based on the original human voice samples.
3. Audio Preprocessing: Convert and standardize all audio files into WAV format, ensuring consistent quality and clarity.
4. Data Labeling: Categorize and label each audio file into two classes: "Real" (human-recorded) and "Fake" (AI-generated).
5. Metadata Preparation: Document metadata, including file names, durations, and corresponding labels, into a CSV file.
6. Validation: Verify the integrity and clarity of audio recordings, checking for uniformity across both classes.
7. Dataset Packaging: Organize and package audio files along with the metadata CSV file for accessibility and ease of use.
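The labeling and metadata steps (4 and 5) could be automated roughly as follows. This is a sketch under stated assumptions, not the authors' actual procedure: it assumes the WAV files sit in two folders named `Real` and `Fake` under a common root, and the CSV column names (`file_name`, `label`, `duration_seconds`) are hypothetical. Durations are computed with Python's standard `wave` module as frame count divided by sample rate.

```python
import csv
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Duration of a PCM WAV file: number of frames divided by the sample rate."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def build_metadata(root, out_csv, labels=("Real", "Fake")):
    """Scan root/<label>/*.wav and write one CSV row per file.

    The folder layout and column names are assumptions for illustration.
    """
    rows = []
    for label in labels:
        for path in sorted((Path(root) / label).glob("*.wav")):
            rows.append({
                "file_name": path.name,
                "label": label,
                "duration_seconds": round(wav_duration_seconds(path), 3),
            })
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["file_name", "label", "duration_seconds"]
        )
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling `build_metadata("dataset", "metadata.csv")` (both paths hypothetical) returns the rows it wrote, which also supports the validation step: the per-label row counts should match, and any file that `wave` cannot open raises an error rather than being silently skipped.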

Institutions

Universitas Muslim Indonesia
