Presentation 2019-12-06
A comparison of neural vocoders in singing voice synthesis
Sota Wada, Yukiya Hono, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In this study, we compare five types of vocoders based on neural networks (neural vocoders) for singing voice synthesis. In recent years, the WaveNet vocoder has been proposed as a neural vocoder. It can model speech waveforms with high accuracy and generate natural-sounding speech. However, the WaveNet vocoder cannot synthesize speech in real time because of its autoregressive structure. Two approaches have been proposed to address this problem. The first is to simplify the model structure of the autoregressive models, which makes sampling from the models more efficient and allows faster-than-real-time synthesis. The second is to synthesize multiple samples simultaneously by using flow-based generative models. The performance of these methods has so far been investigated only on normal speech utterances, not on singing voices. Therefore, in this paper, we compare the performance of five types of neural vocoders for singing voice synthesis. The results of subjective and objective evaluation experiments show that WaveRNN is an appropriate neural vocoder when naturalness is the priority, and WaveNet is appropriate when reproducibility of pitch and vibrato is the priority.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) DNN / Singing voice synthesis / Neural vocoder / WaveNet
Paper # SP2019-42
Date of Issue 2019-11-29 (SP)

Conference Information
Committee NLC / IPSJ-NL / SP / IPSJ-SLP
Conference Date 2019/12/4 (3 days)
Place (in Japanese) (See Japanese page)
Place (in English) NHK Science & Technology Research Labs.
Topics (in Japanese) (See Japanese page)
Topics (in English) The 6th Natural Language Processing Symposium & The 21st Spoken Language Symposium
Chair Takeshi Sakaki(Hottolink) / / Hisashi Kawai(NICT)
Vice Chair Mitsuo Yoshida(Toyohashi Univ. of Tech.) / Kazutaka Shimada(Kyushu Inst. of Tech.) / / Akinobu Ri(Nagoya Inst. of Tech.)
Secretary Mitsuo Yoshida(Ryukoku Univ.) / Kazutaka Shimada(NTT) / / Akinobu Ri(Kyoto Univ.) / (Waseda Univ.)
Assistant Takeshi Kobayakawa(NHK) / Hiroki Sakaji(Univ. of Tokyo) / / Tomoki Koriyama(Univ. of Tokyo) / Yusuke Ijima(NTT)

Paper Information
Registration To Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) A comparison of neural vocoders in singing voice synthesis
Sub Title (in English)
Keyword(1) DNN
Keyword(2) Singing voice synthesis
Keyword(3) Neural vocoder
Keyword(4) WaveNet
1st Author's Name Sota Wada
1st Author's Affiliation Nagoya Institute of Technology(Nagoya Inst. of Tech.)
2nd Author's Name Yukiya Hono
2nd Author's Affiliation Nagoya Institute of Technology(Nagoya Inst. of Tech.)
3rd Author's Name Shinji Takaki
3rd Author's Affiliation Nagoya Institute of Technology(Nagoya Inst. of Tech.)
4th Author's Name Kei Hashimoto
4th Author's Affiliation Nagoya Institute of Technology(Nagoya Inst. of Tech.)
5th Author's Name Keiichiro Oura
5th Author's Affiliation Nagoya Institute of Technology(Nagoya Inst. of Tech.)
6th Author's Name Yoshihiko Nankaku
6th Author's Affiliation Nagoya Institute of Technology(Nagoya Inst. of Tech.)
7th Author's Name Keiichi Tokuda
7th Author's Affiliation Nagoya Institute of Technology(Nagoya Inst. of Tech.)
Date 2019-12-06
Paper # SP2019-42
Volume (vol) vol.119
Number (no) SP-321
Page pp.85-90(SP)
#Pages 6
Date of Issue 2019-11-29 (SP)