Performance evaluation data for speech synthesis models
Description
This repository contains performance evaluation data comparing two speech synthesis models, VITS and Tacotron. The models were benchmarked under similar conditions on latency, throughput, memory consumption, and resource utilization.
Steps to reproduce
1. Set Up the Environment
   1.1 Install Python 3.10 or later
   1.2 Install required packages (e.g., PyTorch, NumPy, etc.)
   1.3 (Optional) Create a virtual environment for isolation:
       python -m venv venv
       source venv/bin/activate  # or venv\Scripts\activate on Windows
       pip install -r requirements.txt
2. Clone the repository and navigate to the evaluation directory:
       git clone [repo-link]
       cd [evaluation-directory]
3. Prepare the Models
   3.1 Download or load pre-trained models for VITS and Tacotron
   3.2 Ensure they are placed in the correct paths or directories as specified in the scripts
   3.3 (Optional) Convert models to ONNX or TorchScript for optimization, if applicable
4. Run the Evaluation Scripts
   4.1 Use the provided script to synthesize speech for the evaluation samples and log performance metrics:
       python eval_vits.py
       python eval_tacotron.py
5. Compare Results
   5.1 The scripts will generate output files (CSV or JSON) with latency, throughput, memory usage, etc.
   5.2 Visualize the results by using comparison_plot.py or the included Jupyter notebook.
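For readers who want to see how metrics of this kind can be collected, the following is a minimal sketch and not the code in eval_vits.py or eval_tacotron.py. The names benchmark, to_csv, and dummy_synthesize are hypothetical, and the stand-in function would need to be replaced by a real model call. It uses only the Python standard library, so the memory figure covers the Python heap as tracked by tracemalloc and does not include GPU memory or native tensor allocations.

```python
import csv
import io
import statistics
import time
import tracemalloc


def benchmark(synthesize, texts, warmup=1):
    """Time synthesize(text) per sample and track peak Python heap usage."""
    for text in texts[:warmup]:
        synthesize(text)  # warm-up runs are not measured

    latencies = []
    tracemalloc.start()
    start = time.perf_counter()
    for text in texts:
        t0 = time.perf_counter()
        synthesize(text)
        latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        "samples": len(texts),
        "mean_latency_s": statistics.mean(latencies),
        "max_latency_s": max(latencies),
        "throughput_samples_per_s": len(texts) / total,
        "peak_python_heap_mib": peak / 2**20,
    }


def to_csv(rows):
    """Serialize a list of result dictionaries to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def dummy_synthesize(text):
    # Hypothetical stand-in for a real model call; returns fake "audio" samples.
    return [0.0] * (len(text) * 100)


if __name__ == "__main__":
    result = benchmark(dummy_synthesize, ["hello world", "speech synthesis test"])
    print(to_csv([{"model": "dummy", **result}]))
```

Passing the same list of texts to benchmark for each model keeps the comparison on equal inputs. For GPU inference, a framework-specific memory counter would be needed in place of tracemalloc.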
Funding
The Science Committee of the Ministry of Science and Higher Education in the Republic of Kazakhstan
AP23489504