Presentation | 2020-10-22 [Invited Talk] NHK's activities on Japanese end-to-end speech synthesis Kiyoshi Kurihara |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | The main business of NHK (Japan Broadcasting Corporation) is the production and broadcasting of programs. Many programs are produced daily and a considerable amount of work goes into the production of speech content by many people including announcers, directors, and engineers. To support this work and to provide new speech services, we have been researching speech synthesis using Deep Neural Networks (DNNs). DNN speech synthesis requires a large amount of data for training purposes, so we are also involved in the research of end-to-end speech synthesis to reduce the cost of obtaining this training data and generate high-quality speech. To achieve end-to-end speech synthesis in the Japanese language, we adapted the sequence-to-sequence + attention system of speech synthesis (seq2seq speech synthesis), which has proven results in English, to Japanese and proposed a speech synthesis technique that takes character strings consisting of kana (phonetic) text and prosodic symbols as input based on JEITA IT-4006, symbols for Japanese Text-to-Speech Synthesizer. We also developed a technique that enables control of speaking style by adding tags that express speaking style to the input data of seq2seq speech synthesis. We are developing applications for a speech synthesis system that incorporates these techniques and studying their use in a variety of scenarios. This talk describes these NHK activities in speech synthesis and introduces NHK’s efforts in universal services now being researched and developed at NHK Science & Technology Research Laboratories. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Statistical parametric speech synthesis / End-to-end speech synthesis / Speaking Style / Encoder-Decoder model |
Paper # | SP2020-11,WIT2020-12 |
Date of Issue | 2020-10-15 (SP, WIT) |
Conference Information | |
Committee | WIT / SP / IPSJ-SLP |
---|---|
Conference Date | 2020/10/22 (2 days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Online |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | Daisuke Wakatsuki(Tsukuba Univ. of Tech.) / Hisashi Kawai(NICT) / Norihide Kitaoka(Toyohashi Univ. of Tech.) |
Vice Chair | Shinji Sakou(Nagoya Inst. of Tech.) |
Secretary | Shinji Sakou(Saitama Industrial Tech. Center) / (Teikyo Univ.) / (Univ. of Tokyo) |
Assistant | Manabi Miyagi(Tsukuba Univ. of Tech.) / Minako Hosono(AIST) / Aki Sugano(Nagoya Univ.) / Yusuke Ijima(NTT) |
Paper Information | |
Registration To | Technical Committee on Well-being Information Technology / Technical Committee on Speech / Special Interest Group on Spoken Language Processing |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | [Invited Talk] NHK's activities on Japanese end-to-end speech synthesis |
Sub Title (in English) | |
Keyword(1) | Statistical parametric speech synthesis |
Keyword(2) | End-to-end speech synthesis |
Keyword(3) | Speaking Style |
Keyword(4) | Encoder-Decoder model |
1st Author's Name | Kiyoshi Kurihara |
1st Author's Affiliation | Japan Broadcasting Corporation (NHK) |
Date | 2020-10-22 |
Paper # | SP2020-11,WIT2020-12 |
Volume (vol) | vol.120 |
Number (no) | SP-197,WIT-198 |
Page | pp.19-20(SP), pp.19-20(WIT) |
#Pages | 2 |
Date of Issue | 2020-10-15 (SP, WIT) |