Presentation | 2019-12-06 [Invited Talk] Progress and prospects of statistical speech synthesis Keiichi Tokuda |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | The basic problem of statistical speech synthesis is quite simple: we have a speech database for training, i.e., a set of speech waveforms and corresponding texts; given a text not included in the training data, what is the speech waveform corresponding to that text? The whole text-to-speech generation process is decomposed into feasible subproblems, usually text analysis, acoustic modeling, and waveform generation, combined as a statistical generative model. Each submodule can be modeled by a statistical machine learning technique: hidden Markov models were first applied to the acoustic modeling module, and then various types of deep neural networks (DNNs) were applied not only to the acoustic modeling module but also to the other modules. I will give an overview of such statistical approaches to speech synthesis, looking back on their evolution over the last couple of decades. Recent DNN-based approaches have drastically improved speech quality, causing a paradigm shift from the concatenative speech synthesis approach to the generative model-based statistical approach. However, to realize human-like talking machines, the goal is not only to generate natural-sounding speech but also to flexibly control variations in speech, such as speaker identities, speaking styles, and emotional expressions. This talk will also discuss such future challenges and directions in speech synthesis research. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | statistical speech synthesis / deep learning / generative model / deep neural network |
Paper # | SP2019-35 |
Date of Issue | 2019-11-29 (SP) |
Conference Information | |
Committee | NLC / IPSJ-NL / SP / IPSJ-SLP |
---|---|
Conference Date | 2019/12/4(3days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | NHK Science & Technology Research Labs. |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | The 6th Natural Language Processing Symposium & The 21st Spoken Language Symposium |
Chair | Takeshi Sakaki(Hottolink) / / Hisashi Kawai(NICT) |
Vice Chair | Mitsuo Yoshida(Toyohashi Univ. of Tech.) / Kazutaka Shimada(Kyushu Inst. of Tech.) / / Akinobu Ri(Nagoya Inst. of Tech.) |
Secretary | Mitsuo Yoshida(Ryukoku Univ.) / Kazutaka Shimada(NTT) / / Akinobu Ri(Kyoto Univ.) / (Waseda Univ.) |
Assistant | Takeshi Kobayakawa(NHK) / Hiroki Sakaji(Univ. of Tokyo) / / Tomoki Koriyama(Univ. of Tokyo) / Yusuke Ijima(NTT) |
Paper Information | |
Registration To | Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | [Invited Talk] Progress and prospects of statistical speech synthesis |
Sub Title (in English) | |
Keyword(1) | statistical speech synthesis |
Keyword(2) | deep learning |
Keyword(3) | generative model |
Keyword(4) | deep neural network |
1st Author's Name | Keiichi Tokuda |
1st Author's Affiliation | Nagoya Institute of Technology(Nagoya Inst. of Tech.) |
Date | 2019-12-06 |
Paper # | SP2019-35 |
Volume (vol) | vol.119 |
Number (no) | SP-321 |
Page | pp.11-12(SP) |
#Pages | 2 |
Date of Issue | 2019-11-29 (SP) |