Speech Dataset of Human and AI-Generated Voices
Description
This dataset consists of audio recordings in the Indonesian language, divided into two classes: human voices (real) and synthetic voices generated using artificial intelligence (AI). Each class contains 21 audio files, for a total of 42 files. Each recording lasts approximately 4 to 9 minutes, with an average of around 6 minutes per file, which puts the whole collection at roughly four hours of audio. All recordings are provided in WAV format and are accompanied by a CSV file containing duration metadata for each audio file. The dataset is suitable for research and applications in speech recognition, voice authenticity detection, audio analysis, and related fields, and it enables comparative analysis between natural Indonesian speech and AI-generated synthetic speech.
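As a quick sanity check on the figures above, the duration metadata can be summarized per class. The following is a minimal sketch only: the column names `file_name`, `duration_seconds`, and `label` are assumptions made for illustration (the actual CSV header may differ), and the sample rows are made-up values, not taken from the dataset.

```python
import csv
import io
from collections import defaultdict


def summarize_durations(csv_text):
    """Return {label: (count, min_s, max_s, mean_s)} from metadata CSV text.

    Assumes hypothetical columns 'duration_seconds' and 'label'.
    """
    per_class = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        per_class[row["label"]].append(float(row["duration_seconds"]))
    return {
        label: (len(d), min(d), max(d), sum(d) / len(d))
        for label, d in per_class.items()
    }


# Tiny made-up sample for illustration; not real dataset values.
sample = """file_name,duration_seconds,label
real_01.wav,240.0,Real
real_02.wav,480.0,Real
fake_01.wav,360.0,Fake
"""

summary = summarize_durations(sample)
for label, (count, shortest, longest, mean) in summary.items():
    print(f"{label}: {count} files, {shortest:.0f}-{longest:.0f} s, mean {mean:.0f} s")
```

Run against the real metadata file (read its contents and pass the text in), each class would be expected to report 21 files with durations between roughly 240 and 540 seconds.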
Steps to reproduce
1. Data Collection: Record original human voice audio samples from the designated voice provider using high-quality recording equipment in a quiet environment.
2. AI-Voice Generation: Generate synthetic voices using AI-based voice cloning or text-to-speech algorithms, based on the original human voice samples.
3. Audio Preprocessing: Convert and standardize all audio files into WAV format, ensuring consistent quality and clarity.
4. Data Labeling: Categorize and label each audio file into two classes: "Real" (human-recorded) and "Fake" (AI-generated).
5. Metadata Preparation: Document metadata, including file names, durations, and corresponding labels, into a CSV file.
6. Validation: Verify the integrity and clarity of audio recordings, checking for uniformity across both classes.
7. Dataset Packaging: Organize and package audio files along with the metadata CSV file for accessibility and ease of use.
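The metadata preparation and validation steps could be sketched roughly as follows, using only the Python standard library. This is an illustration under stated assumptions, not the procedure actually used to build the dataset: the folder names `real` and `fake`, the CSV column names, and the helper functions `build_metadata` and `classes_balanced` are all hypothetical. The `wave` module reads uncompressed PCM WAV files only.

```python
import csv
import wave
from collections import Counter
from pathlib import Path


def wav_duration_seconds(path):
    """Duration of a PCM WAV file: frame count divided by sample rate."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def build_metadata(root, out_csv):
    """Scan <root>/real and <root>/fake (assumed layout) and write a CSV.

    Returns the rows written, one dict per audio file.
    """
    rows = []
    for label, folder in (("Real", "real"), ("Fake", "fake")):
        for wav_path in sorted((Path(root) / folder).glob("*.wav")):
            rows.append({
                "file_name": wav_path.name,
                "duration_seconds": round(wav_duration_seconds(wav_path), 2),
                "label": label,
            })
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=["file_name", "duration_seconds", "label"]
        )
        writer.writeheader()
        writer.writerows(rows)
    return rows


def classes_balanced(rows):
    """Simple uniformity check: both classes hold the same number of files."""
    counts = Counter(row["label"] for row in rows)
    return counts["Real"] == counts["Fake"]
```

Calling `build_metadata("dataset", "metadata.csv")` on a directory laid out this way would produce one row per file, and `classes_balanced` on the result would be expected to return `True` for a 21/21 split.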