Performance evaluation data for speech synthesis models

Published: 16 April 2025 | Version 1 | DOI: 10.17632/hhzmhj3xwz.1
Contributor:
Aru Ukenova

Description

This repository contains performance evaluation data comparing two speech synthesis models, VITS and Tacotron. The assessment measured latency, throughput, memory consumption, and overall resource utilization for both models under the same conditions.
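The measurement code itself is not part of this record. Purely as an illustration of how the logged figures can be interpreted, the sketch below shows one way to time a single synthesis call and record peak GPU memory; the model.synthesize call and the 22050 Hz sample rate are assumptions, not taken from the dataset.

    import time
    import torch

    def measure_sample(model, text, sample_rate=22050):
        """Illustrative metric definitions; model.synthesize is a placeholder API,
        not the actual interface of the VITS/Tacotron checkpoints."""
        if torch.cuda.is_available():
            torch.cuda.reset_peak_memory_stats()
        start = time.perf_counter()
        audio = model.synthesize(text)              # hypothetical call returning a 1-D waveform
        latency_s = time.perf_counter() - start     # wall-clock latency for one utterance
        peak_mem_mb = (torch.cuda.max_memory_allocated() / 1e6
                       if torch.cuda.is_available() else float("nan"))
        audio_s = len(audio) / sample_rate          # duration of the generated audio
        return {
            "latency_s": latency_s,                 # time to synthesize the utterance
            "throughput_rtf": audio_s / latency_s,  # seconds of audio per second of compute
            "peak_mem_mb": peak_mem_mb,             # peak GPU memory during synthesis
        }

Throughput expressed this way is the inverse real-time factor: values above 1 mean the model synthesizes audio faster than real time.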

Files

Steps to reproduce

1. Set up the environment
   1.1. Install Python 3.10 or later.
   1.2. Install the required packages (e.g., PyTorch, NumPy).
   1.3. (Optional) Create a virtual environment for isolation:
        python -m venv venv
        source venv/bin/activate   # or venv\Scripts\activate on Windows
        pip install -r requirements.txt
2. Clone the repository and navigate to the evaluation directory:
        git clone [repo-link]
        cd [evaluation-directory]
3. Prepare the models
   3.1. Download or load the pre-trained VITS and Tacotron models.
   3.2. Make sure they are placed in the paths or directories specified in the scripts.
   3.3. (Optional) Convert the models to ONNX or TorchScript for optimization, if applicable.
4. Run the evaluation scripts
   4.1. Use the provided scripts to synthesize speech for the evaluation samples and log performance metrics:
        python eval_vits.py
        python eval_tacotron.py
5. Compare the results
   5.1. The scripts generate output files (CSV or JSON) with latency, throughput, memory usage, and related metrics.
   5.2. Visualize the results with comparison_plot.py or the included Jupyter notebook (a rough sketch of such a comparison appears after these steps).
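For step 5.2, the repository's own comparison_plot.py or notebook should be used. As a rough, hypothetical sketch of what such a comparison can look like, the snippet below reads two result CSVs and plots the latency distributions of the two models side by side; the file names and the latency_s column are assumptions, not the actual output format of the scripts.

    import matplotlib.pyplot as plt
    import pandas as pd

    # Assumed output file and column names; adjust to the CSVs actually
    # produced by eval_vits.py and eval_tacotron.py.
    vits = pd.read_csv("results_vits.csv")
    tacotron = pd.read_csv("results_tacotron.csv")

    fig, ax = plt.subplots()
    ax.boxplot([vits["latency_s"], tacotron["latency_s"]])
    ax.set_xticklabels(["VITS", "Tacotron"])
    ax.set_ylabel("Latency (s)")
    ax.set_title("Per-utterance synthesis latency")
    fig.savefig("latency_comparison.png", dpi=150)

The same pattern extends to throughput and memory columns by swapping the column name and axis label.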

Institutions

L N Gumilyov Eurasian National University

Categories

Natural Language Processing, Text-to-Speech, Model Evaluation, Deep Learning

Funding

The Science Committee of the Ministry of Science and Higher Education of the Republic of Kazakhstan

AP23489504

Licence