TTS/V2V Audio Deepfake Dataset

Published: 18 December 2025 | Version 2 | DOI: 10.17632/h4zbs27tkr.2
Contributors:
Dora Maria Ballesteros L

Description

This dataset was created in 2025 and consists of 643 synthetic audio samples generated with the Minimax.io platform. It covers two kinds of synthetic speech: 603 text-to-speech (TTS) samples and 40 voice-to-voice (V2V) samples. The material spans two languages, Spanish and English. Of the 643 samples, 302 use female voices and the remaining 341 use male voices. The dataset supports several downstream applications, including: (1) developing and training models for synthetic-audio detection, (2) externally benchmarking audio-deepfake classification systems, and (3) assessing model robustness by applying adversarial or signal-level perturbations to the audio samples.
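As an illustration of application (3), the sketch below shows one possible signal-level perturbation: adding white Gaussian noise at a chosen signal-to-noise ratio. It is not part of the dataset or of any method described by the contributors. The function name `add_noise_at_snr`, its parameters, and the choice of Gaussian noise are assumptions made for this example, and the samples are assumed to be already decoded into a list of floats.

```python
import math
import random


def add_noise_at_snr(samples, snr_db, seed=0):
    """Return a copy of `samples` with white Gaussian noise at `snr_db` dB SNR.

    Hypothetical helper for illustration only; `samples` is assumed to be a
    sequence of floats (for example, one decoded mono audio channel).
    """
    if not samples:
        return []

    # Mean power of the clean signal.
    signal_power = sum(s * s for s in samples) / len(samples)
    if signal_power == 0.0:
        # SNR is undefined for a silent signal, so return it unchanged.
        return list(samples)

    # SNR(dB) = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)

    # A seeded generator keeps the perturbation reproducible across runs.
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, sigma) for s in samples]
```

Sweeping `snr_db` over a range of values (for instance from 30 dB down to 0 dB) and re-scoring a detector at each level is one simple way to chart how its accuracy degrades as the perturbation grows.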

Steps to reproduce

Download the zip file and extract it. The archive contains 643 WAV audio files. An Excel file with the metadata is also provided; for each audio file it records the tool, type, gender, age group, and duration.
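The extraction step can be sanity-checked with a short script. The following is a minimal sketch using only the Python standard library: it extracts the archive, collects the WAV files, and compares their number with the 643 stated above. The function names are hypothetical, and the internal folder layout of the archive is not assumed (the search is recursive). The `wave` module reads PCM-encoded WAV files only, so the duration helper may raise `wave.Error` if the files use another encoding.

```python
import wave
import zipfile
from pathlib import Path

EXPECTED_WAV_COUNT = 643  # number of audio files stated in the description


def extract_archive(zip_path, out_dir):
    """Extract the archive and return a sorted list of the WAV files found."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
    # Search recursively, since the archive's folder layout is not documented.
    return sorted(
        p for p in out_dir.rglob("*")
        if p.is_file() and p.suffix.lower() == ".wav"
    )


def wav_duration_seconds(path):
    """Duration of a PCM WAV file, computed from its header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def summarize(wav_paths):
    """Count the files, total their duration, and check the expected count."""
    durations = [wav_duration_seconds(p) for p in wav_paths]
    return {
        "count": len(durations),
        "total_seconds": sum(durations),
        "count_matches": len(durations) == EXPECTED_WAV_COUNT,
    }
```

For example, `summarize(extract_archive("dataset.zip", "extracted"))` returns the file count and total duration, where `dataset.zip` is a placeholder for whatever name the downloaded archive has. The per-file durations could then be compared against the duration column of the metadata spreadsheet.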

Institutions

Universidad Militar Nueva Granada

Categories

Audio Signal Analysis, Deepfake
