[Invited Talk] Progress and prospects of statistical speech synthesis

Keiichi Tokuda

Presentation	2019-12-06 [Invited Talk] Progress and prospects of statistical speech synthesis Keiichi Tokuda
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	The basic problem of statistical speech synthesis is quite simple: we have a speech database for training, i.e., a set of speech waveforms and corresponding texts; given a text not included in the training data, what is the speech waveform corresponding to that text? The whole text-to-speech generation process is decomposed into feasible subproblems, usually text analysis, acoustic modeling, and waveform generation, combined as a statistical generative model. Each submodule can be modeled by a statistical machine learning technique: first, hidden Markov models were applied to the acoustic modeling module, and then various types of deep neural networks (DNNs) were applied not only to the acoustic modeling module but also to the other modules. I will give an overview of such statistical approaches to speech synthesis, looking back on their evolution over the last couple of decades. Recent DNN-based approaches have drastically improved speech quality, causing a paradigm shift from the concatenative speech synthesis approach to the generative model-based statistical approach. However, to realize human-like talking machines, the goal is not only to generate natural-sounding speech but also to flexibly control variations in speech, such as speaker identity, speaking style, and emotional expression. This talk will also discuss such future challenges and directions in speech synthesis research.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	statistical speech synthesis / deep learning / generative model / deep neural network
Paper #	SP2019-35
Date of Issue	2019-11-29 (SP)

Conference Information
Committee	NLC / IPSJ-NL / SP / IPSJ-SLP
Conference Date	2019/12/4(3days)
Place (in Japanese)	(See Japanese page)
Place (in English)	NHK Science & Technology Research Labs.
Topics (in Japanese)	(See Japanese page)
Topics (in English)	The 6th Natural Language Processing Symposium & The 21st Spoken Language Symposium
Chair	Takeshi Sakaki(Hottolink) / / Hisashi Kawai(NICT)
Vice Chair	Mitsuo Yoshida(Toyohashi Univ. of Tech.) / Kazutaka Shimada(Kyushu Inst. of Tech.) / / Akinobu Ri(Nagoya Inst. of Tech.)
Secretary	Mitsuo Yoshida(Ryukoku Univ.) / Kazutaka Shimada(NTT) / / Akinobu Ri(Kyoto Univ.) / (Waseda Univ.)
Assistant	Takeshi Kobayakawa(NHK) / Hiroki Sakaji(Univ. of Tokyo) / / Tomoki Koriyama(Univ. of Tokyo) / Yusuke Ijima(NTT)

Paper Information
Registration To	Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	[Invited Talk] Progress and prospects of statistical speech synthesis
Sub Title (in English)
Keyword(1)	statistical speech synthesis
Keyword(2)	deep learning
Keyword(3)	generative model
Keyword(4)	deep neural network
1st Author's Name	Keiichi Tokuda
1st Author's Affiliation	Nagoya Institute of Technology(Nagoya Inst. of Tech.)
Date	2019-12-06
Paper #	SP2019-35
Volume (vol)	vol.119
Number (no)	SP-321
Page	pp.11-12(SP)
#Pages	2
Date of Issue	2019-11-29 (SP)