[Poster Presentation] Effectiveness of a sequence-to-sequence acoustic feature estimation method that requires no labeling work

Kiyoshi Kurihara; Nobumasa Seiyama; Tadashi Kumano

Presentation	2019-12-06 [Poster Presentation] Effectiveness of sequence-to-sequence acoustic modeling by using automatic generated labels / Kiyoshi Kurihara, Nobumasa Seiyama, Tadashi Kumano
Abstract(in English)	We have proposed a method that uses yomigana (Japanese character readings) and prosodic symbols as input for sequence-to-sequence acoustic modeling in end-to-end speech synthesis. Because sequence-to-sequence acoustic modeling associates linguistic features with acoustic features automatically, it removes the need for costly phoneme segmentation work. In this paper, we propose a method that generates synthesized speech by automatically deriving yomigana and prosodic symbols from sentences written in mixed kanji and kana. The method uses the front-end of Open JTalk (an open-source Japanese speech synthesis system) to obtain full-context labels from a kanji-kana mixed sentence, and then automatically converts these full-context labels into yomigana and prosodic symbols. By automating every training stage, from the front-end through sequence-to-sequence acoustic modeling, we can train high-quality speech synthesis from kanji-kana mixed sentences and speech files alone, without incurring any additional cost. Subjective evaluation tests confirmed the efficacy of the proposed method. The proposed method achieves speech synthesis of consistent quality from yomigana and prosodic symbols obtained by automatically converting kanji-kana mixed text, which yields a considerable cost advantage when building speech synthesis from newly recorded speech.
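The abstract describes converting Open JTalk full-context labels into readings plus prosodic symbols. The Python sketch below illustrates that conversion step under stated assumptions: it reads only a few fields of HTS-style full-context labels (the current phoneme, the A:a1/a2/a3 mora-position fields, F:f1, and E:e3), emits romanized phonemes rather than kana, and inserts a hypothetical prosodic symbol set modeled after ESPnet's `pyopenjtalk_g2p_prosody` tokenizer. The paper's own symbols and rules may differ. `make_toy_label` and `TOY_LABELS` are hypothetical, hand-made, truncated labels, not real Open JTalk output.

```python
import re
from typing import List, Optional, Union

# Hypothetical prosodic symbol set (assumption, modeled after ESPnet's
# pyopenjtalk_g2p_prosody; not necessarily the paper's own symbols):
#   ^ start of utterance   $ end of declarative   ? end of question
#   _ pause   # accent-phrase boundary   [ pitch rise   ] pitch fall
MORA_FINAL = {"a", "i", "u", "e", "o", "A", "I", "U", "E", "O", "N", "cl"}

Value = Union[int, str]


def _field(pattern: str, label: str) -> Optional[int]:
    """Return the first integer captured by `pattern`; None if absent or 'xx'."""
    match = re.search(pattern, label)
    return int(match.group(1)) if match else None


def labels_to_prosodic_symbols(labels: List[str]) -> List[str]:
    """Convert HTS-style full-context labels into phonemes plus prosodic symbols."""
    out: List[str] = []
    for n, label in enumerate(labels):
        phoneme = re.search(r"-(.*?)\+", label).group(1)  # current phoneme (p3)
        if phoneme == "sil":
            if n == 0:
                out.append("^")
            elif n == len(labels) - 1:
                # E:e3 flags whether the preceding accent phrase is interrogative.
                out.append("?" if _field(r"!(\d+)_", label) == 1 else "$")
            continue
        if phoneme == "pau":
            out.append("_")
            continue
        out.append(phoneme)

        a1 = _field(r"/A:([0-9\-]+)\+", label)  # mora position minus accent type
        a2 = _field(r"\+(\d+)\+", label)        # mora position, counted forward
        a3 = _field(r"\+(\d+)/", label)         # mora position, counted backward
        f1 = _field(r"/F:(\d+)_", label)        # number of morae in the accent phrase
        a2_next = _field(r"\+(\d+)\+", labels[n + 1]) if n + 1 < len(labels) else None

        if a3 == 1 and a2_next == 1 and phoneme in MORA_FINAL:
            out.append("#")  # last mora of this phrase; next phrase begins
        elif a1 == 0 and a2 is not None and a2_next == a2 + 1 and a2 != f1:
            out.append("]")  # accented mora: pitch falls after it
        elif a2 == 1 and a2_next == 2:
            out.append("[")  # pitch rises from the first to the second mora
    return out


def make_toy_label(p: str, a1: Value = "xx", a2: Value = "xx", a3: Value = "xx",
                   f1: Value = "xx", e3: Value = "xx") -> str:
    """Hypothetical helper: a truncated label keeping only the fields read above."""
    return f"xx^xx-{p}+xx=xx/A:{a1}+{a2}+{a3}/E:xx_xx!{e3}_xx/F:{f1}_xx#xx_xx"


# Hand-made toy utterance: a flat (type-0) phrase "ka-o", then a type-1 phrase "a-me".
TOY_LABELS = [
    make_toy_label("sil"),
    make_toy_label("k", a1=-1, a2=1, a3=2, f1=2),
    make_toy_label("a", a1=-1, a2=1, a3=2, f1=2),
    make_toy_label("o", a1=0, a2=2, a3=1, f1=2),
    make_toy_label("a", a1=0, a2=1, a3=2, f1=2),
    make_toy_label("m", a1=1, a2=2, a3=1, f1=2),
    make_toy_label("e", a1=1, a2=2, a3=1, f1=2),
    make_toy_label("sil", e3=0),
]

if __name__ == "__main__":
    print(" ".join(labels_to_prosodic_symbols(TOY_LABELS)))
```

Run as a script, the toy input prints `^ k a [ o # a ] m e $`. With real input, the label list would instead come from Open JTalk's front-end, for example through the pyopenjtalk binding's `extract_fullcontext`. Placing the rise, fall, and boundary marks inline with the phonemes keeps the input a single token sequence, which is what a sequence-to-sequence acoustic model consumes.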
Keyword(in English)	Statistical parametric speech synthesis / End-to-end speech synthesis / Prosodic symbols / Sequence-to-sequence model
Paper #	SP2019-37
Date of Issue	2019-11-29 (SP)

Conference Information
Committee	NLC / IPSJ-NL / SP / IPSJ-SLP
Conference Date	2019/12/4 (3 days)
Place (in English)	NHK Science & Technology Research Labs.
Topics (in English)	The 6th Natural Language Processing Symposium & The 21st Spoken Language Symposium
Chair	Takeshi Sakaki(Hottolink) / / Hisashi Kawai(NICT)
Vice Chair	Mitsuo Yoshida(Toyohashi Univ. of Tech.) / Kazutaka Shimada(Kyushu Inst. of Tech.) / / Akinobu Ri(Nagoya Inst. of Tech.)
Secretary	Mitsuo Yoshida(Ryukoku Univ.) / Kazutaka Shimada(NTT) / / Akinobu Ri(Kyoto Univ.) / (Waseda Univ.)
Assistant	Takeshi Kobayakawa(NHK) / Hiroki Sakaji(Univ. of Tokyo) / / Tomoki Koriyama(Univ. of Tokyo) / Yusuke Ijima(NTT)

Paper Information
Registration To	Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language	JPN
Title (in English)	[Poster Presentation] Effectiveness of sequence-to-sequence acoustic modeling by using automatic generated labels
Sub Title (in English)
Keyword(1)	Statistical parametric speech synthesis
Keyword(2)	End-to-end speech synthesis
Keyword(3)	Prosodic symbols
Keyword(4)	Sequence-to-sequence model
1st Author's Name	Kiyoshi Kurihara
1st Author's Affiliation	NHK STRL(NHK)
2nd Author's Name	Nobumasa Seiyama
2nd Author's Affiliation	NHK STRL(NHK)
3rd Author's Name	Tadashi Kumano
3rd Author's Affiliation	NHK STRL(NHK)
Date	2019-12-06
Volume (vol)	vol.119
Number (no)	SP-321
Page	pp.49-54 (SP)
#Pages	6