Automatic Commentary Training Speech Synthesis System for Broadcasting
Thanks to rapid progress in deep learning, the text-to-speech (TTS) system we developed has achieved quality on par with human speech, enabling us to launch a fully automatic program production system known as “AI Anchor.” However, the TTS system needs a large amount of speech and label data, and because producing this data is costly, new TTS speakers cannot be added easily. This paper presents a novel TTS method that trains automatically from broadcast commentary. The method is built on a new semi-supervised learning approach that uses an accentual data recognition method specialized for TTS. We have automated the entire training process, from generating training data to recognizing label data from broadcast commentary. We also present practical examples of automated program production that use this TTS training system: an automatic weather forecast system for radio, an automatic sports commentary system, and slow, easy-to-understand news commentary.