A comparison of neural vocoders in singing voice synthesis

Sota Wada; Yukiya Hono; Shinji Takaki; Kei Hashimoto; Keiichiro Oura; Yoshihiko Nankaku; Keiichi Tokuda

Presentation	2019-12-06, A comparison of neural vocoders in singing voice synthesis, Sota Wada, Yukiya Hono, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
Abstract(in English)	In this study, we compare five types of vocoders based on neural networks (neural vocoders) for singing voice synthesis. WaveNet vocoder, proposed in recent years as a neural vocoder, can model speech waveforms with high accuracy and generate natural-sounding speech. However, because of its autoregressive structure, WaveNet vocoder cannot synthesize speech in real time. Two approaches have been proposed to address this problem. The first simplifies the structure of the autoregressive model, which makes sampling more efficient and enables faster-than-real-time synthesis. The second uses flow-based generative models to synthesize multiple samples simultaneously. The performance of these methods has so far been investigated only on normal (spoken) utterances, not on singing voices. In this paper, we therefore compare the performance of five types of neural vocoders for singing voice synthesis. The results of subjective and objective evaluation experiments show that WaveRNN is an appropriate neural vocoder when naturalness is emphasized, whereas WaveNet is appropriate when the reproducibility of pitch and vibrato is emphasized.
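The abstract's contrast between the two speed-up strategies comes down to how many steps must run one after another. The following Python sketch (standard library only) is a toy illustration under stated assumptions, not any of the vocoders compared in the paper: `predict` is an arbitrary placeholder for an autoregressive model such as WaveNet, `toy_coupling_net` is a hypothetical stand-in for the network inside an affine coupling layer (the building block of flow-based vocoders such as WaveGlow), acoustic conditioning features are omitted, and `real_time_factor` is the usual ratio of synthesis time to audio duration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def autoregressive_generate(
    n: int,
    predict: Callable[[Sequence[float]], float],
    noise_std: float = 0.01,
    seed: int = 0,
) -> Tuple[List[float], int]:
    """Generate n samples one at a time, each conditioned on all previous outputs.

    Returns the samples and the number of sequential model calls, which always
    equals n: sample t cannot be computed until samples 0..t-1 exist.
    """
    rng = random.Random(seed)
    samples: List[float] = []
    sequential_calls = 0
    for _ in range(n):
        samples.append(predict(samples) + rng.gauss(0.0, noise_std))
        sequential_calls += 1
    return samples, sequential_calls


def toy_coupling_net(cond: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Hypothetical stand-in for the neural network inside a coupling layer.
    Maps the untouched half to per-element log-scales and shifts."""
    log_scales = [0.1 * math.tanh(c) for c in cond]
    shifts = [0.5 * c for c in cond]
    return log_scales, shifts


def coupling_generate(z: Sequence[float]) -> List[float]:
    """Noise -> waveform through one affine coupling layer.

    Even-indexed values pass through unchanged; odd-indexed values are scaled
    and shifted using parameters computed from the even-indexed ones. All
    elements are produced in one pass, with no sample-to-sample recursion.
    """
    z_a, z_b = list(z[0::2]), list(z[1::2])
    log_scales, shifts = toy_coupling_net(z_a)
    x_b = [b * math.exp(s) + t for b, s, t in zip(z_b, log_scales, shifts)]
    x = [0.0] * len(z)
    x[0::2] = z_a
    x[1::2] = x_b
    return x


def coupling_invert(x: Sequence[float]) -> List[float]:
    """Waveform -> noise: the exact inverse of coupling_generate.
    Flow-based models use this direction to compute the exact likelihood in training."""
    x_a, x_b = list(x[0::2]), list(x[1::2])
    log_scales, shifts = toy_coupling_net(x_a)
    z_b = [(b - t) * math.exp(-s) for b, s, t in zip(x_b, log_scales, shifts)]
    z = [0.0] * len(x)
    z[0::2] = x_a
    z[1::2] = z_b
    return z


def real_time_factor(synthesis_seconds: float, num_samples: int, sample_rate: int) -> float:
    """Synthesis time divided by audio duration; below 1.0 is faster than real time."""
    return synthesis_seconds / (num_samples / sample_rate)


if __name__ == "__main__":
    _, calls = autoregressive_generate(16, lambda past: 0.9 * past[-1] if past else 0.0)
    print(calls)  # 16: one sequential call per sample
    rng = random.Random(1)
    z = [rng.gauss(0.0, 1.0) for _ in range(16)]
    z_rec = coupling_invert(coupling_generate(z))
    print(max(abs(a - b) for a, b in zip(z, z_rec)))  # floating-point rounding error only
    print(real_time_factor(2.0, 48000, 48000))  # 2.0: twice as slow as real time
```

In a real flow-based vocoder, many coupling layers are stacked and the halves are permuted between layers so that every sample is eventually transformed; the number of sequential steps then grows with network depth rather than with the number of audio samples.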
Keyword(in English)	DNN / Singing voice synthesis / Neural vocoder / WaveNet
Paper #	SP2019-42
Date of Issue	2019-11-29 (SP)

Conference Information
Committee	NLC / IPSJ-NL / SP / IPSJ-SLP
Conference Date	2019/12/4 (3 days)
Place (in English)	NHK Science & Technology Research Labs.
Topics (in English)	The 6th Natural Language Processing Symposium & the 21st Spoken Language Symposium
Chair	Takeshi Sakaki (Hottolink) / / Hisashi Kawai (NICT)
Vice Chair	Mitsuo Yoshida (Toyohashi Univ. of Tech.) / Kazutaka Shimada (Kyushu Inst. of Tech.) / / Akinobu Ri (Nagoya Inst. of Tech.)
Secretary	Mitsuo Yoshida (Ryukoku Univ.) / Kazutaka Shimada (NTT) / / Akinobu Ri (Kyoto Univ.) / (Waseda Univ.)
Assistant	Takeshi Kobayakawa (NHK) / Hiroki Sakaji (Univ. of Tokyo) / / Tomoki Koriyama (Univ. of Tokyo) / Yusuke Ijima (NTT)

Paper Information
Registration To	Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language	JPN
Title (in English)	A comparison of neural vocoders in singing voice synthesis
Keyword(1)	DNN
Keyword(2)	Singing voice synthesis
Keyword(3)	Neural vocoder
Keyword(4)	WaveNet
1st Author's Name	Sota Wada
1st Author's Affiliation	Nagoya Institute of Technology (Nagoya Inst. of Tech.)
2nd Author's Name	Yukiya Hono
2nd Author's Affiliation	Nagoya Institute of Technology (Nagoya Inst. of Tech.)
3rd Author's Name	Shinji Takaki
3rd Author's Affiliation	Nagoya Institute of Technology (Nagoya Inst. of Tech.)
4th Author's Name	Kei Hashimoto
4th Author's Affiliation	Nagoya Institute of Technology (Nagoya Inst. of Tech.)
5th Author's Name	Keiichiro Oura
5th Author's Affiliation	Nagoya Institute of Technology (Nagoya Inst. of Tech.)
6th Author's Name	Yoshihiko Nankaku
6th Author's Affiliation	Nagoya Institute of Technology (Nagoya Inst. of Tech.)
7th Author's Name	Keiichi Tokuda
7th Author's Affiliation	Nagoya Institute of Technology (Nagoya Inst. of Tech.)
Date	2019-12-06
Volume (vol)	vol.119
Number (no)	SP-321
Page	pp. 85-90 (SP)
#Pages	6