Presentation 2019-12-06
[Poster Presentation] Effectiveness of sequence-to-sequence acoustic modeling by using automatic generated labels
Kiyoshi Kurihara, Nobumasa Seiyama, Tadashi Kumano
Abstract(in English) We have proposed a method that uses yomigana (Japanese character readings) and prosodic symbols as input for sequence-to-sequence acoustic modeling in end-to-end speech synthesis. Sequence-to-sequence acoustic modeling learns the correspondence between linguistic features and acoustic features automatically, which makes expensive phoneme segmentation work unnecessary. In this paper, we propose a method for generating synthesized speech by automatically generating yomigana and prosodic symbols from sentences containing a mixture of kanji and kana characters. The method uses the front-end of Open JTalk (an open-source Japanese speech synthesis system) to obtain full-context labels from a kanji-kana mixed sentence, and then automatically converts these full-context labels into yomigana and prosodic symbols. By automating every stage from the front-end through sequence-to-sequence acoustic model training, high-quality speech synthesis can be learned from kanji-kana mixed sentences and speech files without incurring any additional cost. Subjective evaluation tests confirmed the efficacy of the proposed method. With the proposed method, speech synthesis of consistent quality can be achieved from yomigana and prosodic symbols obtained by automatically converting kanji-kana mixed text. This yields a considerable cost advantage when building speech synthesis from newly recorded speech.
Keyword(in English) Statistical parametric speech synthesis / End-to-end speech synthesis / Prosodic symbols / Sequence-to-sequence model
Paper # SP2019-37
Date of Issue 2019-11-29 (SP)

Conference Information
Committee NLC / IPSJ-NL / SP / IPSJ-SLP
Conference Date 2019/12/4 (3 days)
Place (in English) NHK Science & Technology Research Labs.
Topics (in English) The 6th Natural Language Processing Symposium & The 21st Spoken Language Symposium
Chair Takeshi Sakaki(Hottolink) / / Hisashi Kawai(NICT)
Vice Chair Mitsuo Yoshida(Toyohashi Univ. of Tech.) / Kazutaka Shimada(Kyushu Inst. of Tech.) / / Akinobu Ri(Nagoya Inst. of Tech.)
Secretary Mitsuo Yoshida(Ryukoku Univ.) / Kazutaka Shimada(NTT) / / Akinobu Ri(Kyoto Univ.) / (Waseda Univ.)
Assistant Takeshi Kobayakawa(NHK) / Hiroki Sakaji(Univ. of Tokyo) / / Tomoki Koriyama(Univ. of Tokyo) / Yusuke Ijima(NTT)

Paper Information
Registration To Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language JPN
Title (in English) [Poster Presentation] Effectiveness of sequence-to-sequence acoustic modeling by using automatic generated labels
Keyword(1) Statistical parametric speech synthesis
Keyword(2) End-to-end speech synthesis
Keyword(3) Prosodic symbols
Keyword(4) Sequence-to-sequence model
1st Author's Name Kiyoshi Kurihara
1st Author's Affiliation NHK STRL(NHK)
2nd Author's Name Nobumasa Seiyama
2nd Author's Affiliation NHK STRL(NHK)
3rd Author's Name Tadashi Kumano
3rd Author's Affiliation NHK STRL(NHK)
Date 2019-12-06
Volume (vol) vol.119
Number (no) SP-321
Page pp.49-54 (SP)
#Pages 6