Fake Audio Dataset (ElevenLabs & Respeecher)
Description
This dataset was created during the first semester of 2025. It consists of 600 synthetic audio clips generated with the ElevenLabs and Respeecher tools, using both TTS (text-to-speech) and V2V (voice-to-voice) generation. Of the 600 clips, 335 were generated with ElevenLabs (282 V2V, 53 TTS) and 265 with Respeecher (210 V2V, 55 TTS), for a total of 492 V2V clips and 108 TTS clips.
In terms of gender distribution, 49% of the clips are male voices and 51% are female voices. Most of the voices are adult voices (538 out of 600). Each clip lasts between 8 and 10 seconds and has a sampling rate of 22,050 Hz.
This dataset can be used for: (1) training synthetic audio classification models, (2) external validation of such models, and (3) applying attacks to the clips to test the robustness of such models.
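The per-tool and per-type totals follow directly from the four stated counts. As a minimal sketch, the arithmetic can be reproduced as below; the dictionary layout and the helper name `marginal` are illustrative choices, not part of the dataset.

```python
# Counts as stated in the description: (tool, generation type) -> clips.
COUNTS = {
    ("ElevenLabs", "V2V"): 282,
    ("ElevenLabs", "TTS"): 53,
    ("Respeecher", "V2V"): 210,
    ("Respeecher", "TTS"): 55,
}


def marginal(counts, axis):
    """Sum clip counts over one key position (0 = tool, 1 = type)."""
    totals = {}
    for key, n in counts.items():
        totals[key[axis]] = totals.get(key[axis], 0) + n
    return totals


by_tool = marginal(COUNTS, 0)   # {'ElevenLabs': 335, 'Respeecher': 265}
by_type = marginal(COUNTS, 1)   # {'V2V': 492, 'TTS': 108}
total = sum(COUNTS.values())    # 600
```

The split is imbalanced toward V2V (492 of 600 clips, or 82%), which may matter when the data is used for training or for per-type evaluation.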
Steps to reproduce
Download the zip file and extract it. You will find 600 WAV audio files and an Excel file containing the metadata, which records the following for each clip: tool, type, gender, age group, and duration.
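After extraction, the files can be checked against the stated sampling rate and duration range. The following is a minimal sketch using only the Python standard library. It assumes the clips are PCM-encoded WAV files with a lowercase `.wav` extension, which the description does not confirm; the function names and the `tolerance_s` parameter are hypothetical.

```python
import wave
from pathlib import Path

EXPECTED_RATE_HZ = 22050          # sampling rate stated in the description
DURATION_RANGE_S = (8.0, 10.0)    # duration range stated in the description


def inspect_wav(path):
    """Return (sampling rate in Hz, duration in seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnframes() / rate


def find_outliers(folder, tolerance_s=0.0):
    """List (filename, rate, duration) for clips outside the stated spec."""
    low, high = DURATION_RANGE_S
    outliers = []
    # Assumption: files end in lowercase ".wav"; adjust the pattern if not.
    for path in sorted(Path(folder).rglob("*.wav")):
        rate, duration = inspect_wav(path)
        in_range = low - tolerance_s <= duration <= high + tolerance_s
        if rate != EXPECTED_RATE_HZ or not in_range:
            outliers.append((path.name, rate, round(duration, 3)))
    return outliers
```

Calling `find_outliers` on the extracted folder returns an empty list when every clip matches the description. A small `tolerance_s` may be useful if durations are rounded in the description. The metadata file can be loaded separately, for example with `pandas.read_excel` (which needs the `openpyxl` package for `.xlsx` files); its exact file name and column headers are not given here, so they should be checked after download.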