Presentation 2019-12-06
[Invited Talk] Progress and prospects of statistical speech synthesis
Keiichi Tokuda
Abstract(in Japanese) (See Japanese page)
Abstract(in English) The basic problem of statistical speech synthesis is quite simple: we have a speech database for training, i.e., a set of speech waveforms and corresponding texts; given a text not included in the training data, what is the speech waveform corresponding to the text? The whole text-to-speech generation process is decomposed into feasible subproblems: usually, text analysis, acoustic modeling, and waveform generation, combined as a statistical generative model. Each submodule can be modeled by a statistical machine learning technique: first, hidden Markov models were applied to the acoustic modeling module, and then various types of deep neural networks (DNNs) were applied not only to the acoustic modeling module but also to the other modules. I will give an overview of such statistical approaches to speech synthesis, looking back on their evolution over the last couple of decades. Recent DNN-based approaches have drastically improved speech quality, causing a paradigm shift from the concatenative speech synthesis approach to the generative model-based statistical approach. However, for realizing human-like talking machines, the goal is not only to generate natural-sounding speech but also to flexibly control variations in speech, such as speaker identities, speaking styles, and emotional expressions. This talk will also discuss such future challenges and directions in speech synthesis research.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) statistical speech synthesis / deep learning / generative model / deep neural network
Paper # SP2019-35
Date of Issue 2019-11-29 (SP)

Conference Information
Committee NLC / IPSJ-NL / SP / IPSJ-SLP
Conference Date 2019/12/4(3days)
Place (in Japanese) (See Japanese page)
Place (in English) NHK Science & Technology Research Labs.
Topics (in Japanese) (See Japanese page)
Topics (in English) The 6th Natural Language Processing Symposium & The 21st Spoken Language Symposium
Chair Takeshi Sakaki(Hottolink) / / Hisashi Kawai(NICT)
Vice Chair Mitsuo Yoshida(Toyohashi Univ. of Tech.) / Kazutaka Shimada(Kyushu Inst. of Tech.) / / Akinobu Ri(Nagoya Inst. of Tech.)
Secretary Mitsuo Yoshida(Ryukoku Univ.) / Kazutaka Shimada(NTT) / / Akinobu Ri(Kyoto Univ.) / (Waseda Univ.)
Assistant Takeshi Kobayakawa(NHK) / Hiroki Sakaji(Univ. of Tokyo) / / Tomoki Koriyama(Univ. of Tokyo) / Yusuke Ijima(NTT)

Paper Information
Registration To Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) [Invited Talk] Progress and prospects of statistical speech synthesis
Sub Title (in English)
Keyword(1) statistical speech synthesis
Keyword(2) deep learning
Keyword(3) generative model
Keyword(4) deep neural network
1st Author's Name Keiichi Tokuda
1st Author's Affiliation Nagoya Institute of Technology(Nagoya Inst. of Tech.)
Date 2019-12-06
Paper # SP2019-35
Volume (vol) vol.119
Number (no) SP-321
Page pp.11-12(SP)
#Pages 2
Date of Issue 2019-11-29 (SP)