The First FOSD-Tacotron-2-based Text-to-Speech Model Dataset for Vietnamese

Published: 20 April 2020 | Version 1 | DOI: 10.17632/dsmrndnmyy.1
Duc Chung Tran


This is the first FPT Open Speech Data (FOSD) and Tacotron-2-based Text-to-Speech (TTS) model dataset for Vietnamese. It comprises:
- A configuration file in *.json format;
- Training and validation text input files (in *.csv format);
- A trained model (checkpoint file, after 225,000 steps);
- Sample audio generated by the trained model.

This dataset is useful for research on TTS and its applications, text processing, and especially TTS output optimization given a set of predefined input texts.
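
As a starting point for working with the configuration and text input files described here, the following is a minimal sketch using only the Python standard library. The function names `load_config` and `load_text_rows` are hypothetical, and the `|` column delimiter is an assumption (it is common in Tacotron-2 metadata files but is not stated in this record); check the actual files and adjust accordingly.

```python
import csv
import json


def load_config(path):
    """Read a JSON configuration file into a Python dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def load_text_rows(path, delimiter="|"):
    """Read a text input CSV file into a list of rows (lists of strings).

    The default delimiter is an assumption, not taken from the dataset;
    pass the delimiter that the actual files use. Blank lines are skipped.
    """
    with open(path, encoding="utf-8", newline="") as f:
        return [row for row in csv.reader(f, delimiter=delimiter) if row]
```

Both functions take a file path, so they can be pointed at whichever file names the downloaded archive actually contains. Reading with `encoding="utf-8"` is also an assumption, chosen because Vietnamese text requires an encoding that preserves diacritics.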



Signal Processing, Speech Processing, Natural Language Processing, Audio Signal Processing, Synthesis, Vietnamese Language, Text-to-Speech, Natural Language Generation