[Poster Presentation] F0 control by modeling differential features in DNN-based speech synthesis

山田 修平; 能勢 隆; 伊藤 彰則

Presentation	2016-12-20 [Poster Presentation] F0 control by modeling differential features in DNN-based speech synthesis Shuhei Yamada, Takashi Nose, Akinori Ito
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	We have been developing "tailor-made speech synthesis," a framework that enables users to modify synthetic speech naturally and intuitively. Previously, we proposed a technique that controls F0 through an F0 context in DNN-based speech synthesis. The F0 context represents the relative log F0 of the training data at the segment (e.g., mora or accent phrase) level. The technique allows users to control the relative log F0 of synthetic speech through this context. However, when users synthesize speech without F0 control, the naturalness of the synthetic speech degrades compared with standard DNN-based synthesis. In this paper, we use another DNN that models the relationship between the context, including the F0 context, and differential features. Differential features represent the difference between the acoustic features of natural speech and those of synthetic speech. Experiments showed that, when the F0 context was created appropriately in the proposed method, the reproducibility of log F0 improved compared with the conventional method. We show that the proposed technique makes it possible to synthesize speech more naturally than standard DNN-based speech synthesis and to control F0 flexibly and naturally at the segment level.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	DNN-based speech synthesis / Model training / F0 control / F0 context / Differential feature
Paper #	SP2016-55
Date of Issue	2016-12-13 (SP)
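
As an illustration of the two quantities named in the abstract, the following Python sketch computes a segment-level relative log F0 and a frame-wise differential feature. It is not the authors' implementation: the function names, the normalization by the utterance mean, the treatment of unvoiced frames (F0 = 0), and the sign convention (natural minus synthetic) are all assumptions made for illustration. The abstract describes a second DNN that models the relationship between context and differential features; that model is not reproduced here, and the differences are computed directly instead.

```python
import math


def segment_f0_context(f0_hz, segments):
    """Relative log F0 per segment (hypothetical definition).

    Assumption: the value for each segment is its mean log F0 minus the
    utterance-level mean log F0, using voiced frames only (F0 > 0).
    `segments` is a list of (start, end) frame indices, end exclusive.
    """
    log_f0 = [math.log(f) if f > 0 else None for f in f0_hz]
    voiced = [v for v in log_f0 if v is not None]
    if not voiced:
        raise ValueError("utterance has no voiced frames")
    utt_mean = sum(voiced) / len(voiced)

    context = []
    for start, end in segments:
        seg = [v for v in log_f0[start:end] if v is not None]
        # A fully unvoiced segment gets a neutral value of 0.0 (assumption).
        context.append(sum(seg) / len(seg) - utt_mean if seg else 0.0)
    return context


def differential_features(natural, synthetic):
    """Frame-wise difference, assumed here to be natural minus synthetic."""
    return [n - s for n, s in zip(natural, synthetic)]


def apply_differential(synthetic, diff):
    """Add a differential feature back onto the synthetic feature."""
    return [s + d for s, d in zip(synthetic, diff)]


# Toy data: four frames split into two segments (e.g. two morae).
f0 = [100.0, 100.0, 200.0, 200.0]
ctx = segment_f0_context(f0, [(0, 2), (2, 4)])

natural_lf0 = [5.0, 5.2]
synthetic_lf0 = [4.9, 5.3]
diff = differential_features(natural_lf0, synthetic_lf0)
corrected = apply_differential(synthetic_lf0, diff)
```

With the exact differences, `corrected` recovers the natural values by construction. In a setting like the one the abstract describes, the differences would come from a trained model, so the correction would only be approximate.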

Conference Information
Committee	SP / IPSJ-SLP / NLC / IPSJ-NL
Conference Date	2016/12/20 (3 days)
Place (in Japanese)	(See Japanese page)
Place (in English)	NTT Musashino R&D
Topics (in Japanese)	(See Japanese page)
Topics (in English)	The 18th Spoken Language Symposium & The Third Natural Language Processing Symposium
Chair	Kazunori Mano(Shibaura Inst. of Tech.) / Nobuaki Minematsu(Univ. Tokyo) / Hiroshi Kanayama(IBM) / Kentaro Inui(Tohoku Univ.)
Vice Chair	Hiroki Mori(Utsunomiya Univ.) / / Makoto Ichise(NTT DoCoMo) / Takeshi Sakaki(Univ. of Tokyo/Hottolink)
Secretary	Hiroki Mori(Kobe Univ.) / (Shizuoka Univ.) / Makoto Ichise(Kyoyo Univ.) / Takeshi Sakaki(Toshiba) / (Tokyo Institute of Technology)
Assistant	Taichi Asami(NTT) / Kei Hashimoto(Nagoya Inst. of Tech.) / / Ryuichiro Higashinaka(NTT) / Mitsuo Yoshida(Toyohashi Univ. of Tech.)

Paper Information
Registration To	Technical Committee on Speech / Special Interest Group on Spoken Language Processing / Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	[Poster Presentation] F0 control by modeling differential features in DNN-based speech synthesis
Sub Title (in English)
Keyword(1)	DNN-based speech synthesis
Keyword(2)	Model training
Keyword(3)	F0 control
Keyword(4)	F0 context
Keyword(5)	Differential feature
1st Author's Name	Shuhei Yamada
1st Author's Affiliation	Tohoku University(Tohoku Univ.)
2nd Author's Name	Takashi Nose
2nd Author's Affiliation	Tohoku University(Tohoku Univ.)
3rd Author's Name	Akinori Ito
3rd Author's Affiliation	Tohoku University(Tohoku Univ.)
Date	2016-12-20
Paper #	SP2016-55
Volume (vol)	vol.116
Number (no)	SP-378
Page	pp.37-42(SP)
#Pages	6
Date of Issue	2016-12-13 (SP)