Presentation 2016-12-20
[Poster Presentation] F0 control by modeling differential features in DNN-based speech synthesis
Shuhei Yamada, Takashi Nose, Akinori Ito
Abstract(in Japanese) (See Japanese page)
Abstract(in English) We have been developing "tailor-made speech synthesis," a framework that enables users to modify synthetic speech naturally and intuitively. Previously, we proposed an F0 control technique for DNN-based speech synthesis that uses an F0 context. The F0 context represents the relative log F0 of the training data at the segment level (e.g., mora or accent phrase), and the technique allows users to control the log F0 of synthetic speech in relative terms by changing this context. However, when speech is synthesized without F0 control, its naturalness degrades compared with that of standard DNN-based speech synthesis. In this paper, we introduce an additional DNN that models the relationship between the context, including the F0 context, and differential features, which represent the difference between the acoustic features of natural speech and those of synthetic speech. Experiments showed that, when the F0 context was created appropriately, the proposed method improved the reproducibility of log F0 compared with the conventional method. We also show that the proposed technique synthesizes speech more naturally than standard DNN-based speech synthesis while allowing F0 to be controlled flexibly and naturally at the segment level.
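The abstract relies on two quantities: the F0 context (relative log F0 per segment) and differential features (natural minus synthetic acoustic features), which a second DNN learns to predict from the context. The following is a minimal sketch of how these could fit together, not the authors' implementation. It assumes that the F0 context is the mean log F0 of each segment minus the utterance mean over voiced frames, quantized into a few levels, that unvoiced frames carry NaN, and that the predicted difference is simply added to the base DNN output. The names `segment_f0_context`, `differential_features`, `refine`, the `step` and `max_level` parameters, and the linear stand-in model are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# NOTE: every name and parameter here is a hypothetical illustration, not the
# authors' code. Unvoiced frames are assumed to carry float("nan") as log F0.


def segment_f0_context(
    log_f0: Sequence[float],
    segments: Sequence[Tuple[int, int]],
    step: float = 0.1,
    max_level: int = 2,
) -> List[int]:
    """Quantized mean log F0 of each segment relative to the utterance mean.

    `segments` holds (start, end) frame indices, e.g. one pair per mora or
    accent phrase; `step` is an assumed log-F0 width per context level.
    """
    voiced = [v for v in log_f0 if not math.isnan(v)]
    if not voiced:
        raise ValueError("utterance has no voiced frames")
    utt_mean = sum(voiced) / len(voiced)
    levels = []
    for start, end in segments:
        seg = [v for v in log_f0[start:end] if not math.isnan(v)]
        if not seg:
            levels.append(0)  # unvoiced segment: assign the neutral level
            continue
        rel = sum(seg) / len(seg) - utt_mean
        level = round(rel / step)
        levels.append(max(-max_level, min(max_level, level)))  # clip to range
    return levels


def differential_features(
    natural: Sequence[float], synthetic: Sequence[float]
) -> List[float]:
    """Assumed training target of the second DNN: natural minus synthetic."""
    if len(natural) != len(synthetic):
        raise ValueError("feature sequences must be time-aligned")
    return [n - s for n, s in zip(natural, synthetic)]


def refine(
    base_output: Sequence[float],
    diff_model: Callable[[Sequence[int]], Sequence[float]],
    context: Sequence[int],
) -> List[float]:
    """Assumed synthesis step: add the predicted difference to the base output."""
    predicted = diff_model(context)
    return [b + d for b, d in zip(base_output, predicted)]


def toy_model(context: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for the trained difference DNN.

    Maps each segment's context level to one value (one value per segment
    here for brevity; a real model would predict frame-level features).
    """
    return [0.1 * level for level in context]


if __name__ == "__main__":
    log_f0 = [5.0, 5.1, float("nan"), 5.3, 5.4, 5.5]
    ctx = segment_f0_context(log_f0, [(0, 3), (3, 6)])
    print(ctx)
    print(differential_features([5.2, 5.4], [5.0, 5.5]))
    base = [5.0, 5.5]
    print(refine(base, toy_model, ctx))
    # F0 control: raise the second segment's context level by one step.
    print(refine(base, toy_model, [ctx[0], ctx[1] + 1]))
```

Raising the second segment's level by one step increases the refined value for that segment only, which mirrors the relative, segment-level control described in the abstract; with a trained network, the size of the change would depend on what the model has learned.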
Keyword(in Japanese) (See Japanese page)
Keyword(in English) DNN-based speech synthesis / Model training / F0 control / F0 context / Differential feature
Paper # SP2016-55
Date of Issue 2016-12-13 (SP)

Conference Information
Committee SP / IPSJ-SLP / NLC / IPSJ-NL
Conference Date 2016/12/20 (3 days)
Place (in Japanese) (See Japanese page)
Place (in English) NTT Musashino R&D
Topics (in Japanese) (See Japanese page)
Topics (in English) The 18th Spoken Language Symposium & The Third Natural Language Processing Symposium
Chair Kazunori Mano(Shibaura Inst. of Tech.) / Nobuaki Minematsu(Univ. Tokyo) / Hiroshi Kanayama(IBM) / Kentaro Inui(Tohoku Univ.)
Vice Chair Hiroki Mori(Utsunomiya Univ.) / / Makoto Ichise(NTT DoCoMo) / Takeshi Sakaki(Univ. of Tokyo/Hottolink)
Secretary Hiroki Mori(Kobe Univ.) / (Shizuoka Univ.) / Makoto Ichise(Kyoyo Univ.) / Takeshi Sakaki(Toshiba) / (Tokyo Institute of Technology)
Assistant Taichi Asami(NTT) / Kei Hashimoto(Nagoya Inst. of Tech.) / / Ryuichiro Higashinaka(NTT) / Mitsuo Yoshida(Toyohashi Univ. of Tech.)

Paper Information
Registration To Technical Committee on Speech / Special Interest Group on Spoken Language Processing / Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) [Poster Presentation] F0 control by modeling differential features in DNN-based speech synthesis
Sub Title (in English)
Keyword(1) DNN-based speech synthesis
Keyword(2) Model training
Keyword(3) F0 control
Keyword(4) F0 context
Keyword(5) Differential feature
1st Author's Name Shuhei Yamada
1st Author's Affiliation Tohoku University(Tohoku Univ.)
2nd Author's Name Takashi Nose
2nd Author's Affiliation Tohoku University(Tohoku Univ.)
3rd Author's Name Akinori Ito
3rd Author's Affiliation Tohoku University(Tohoku Univ.)
Date 2016-12-20
Paper # SP2016-55
Volume (vol) vol.116
Number (no) SP-378
Page pp.37-42 (SP)
#Pages 6
Date of Issue 2016-12-13 (SP)