[Invited Talk] NHK's activities on Japanese end-to-end speech synthesis

Kiyoshi Kurihara

Presentation	2020-10-22 [Invited Talk] NHK's activities on Japanese end-to-end speech synthesis Kiyoshi Kurihara
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	The main business of NHK (Japan Broadcasting Corporation) is the production and broadcasting of programs. Many programs are produced daily, and many people, including announcers, directors, and engineers, put a considerable amount of work into producing speech content. To support this work and to provide new speech services, we have been researching speech synthesis using Deep Neural Networks (DNNs). DNN speech synthesis requires a large amount of training data, so we are also researching end-to-end speech synthesis to reduce the cost of obtaining this data and to generate high-quality speech. To achieve end-to-end speech synthesis in Japanese, we adapted sequence-to-sequence speech synthesis with attention (seq2seq speech synthesis), which has proven results in English, to Japanese, and proposed a speech synthesis technique whose input is a character string of kana (phonetic) text and prosodic symbols based on JEITA IT-4006, symbols for Japanese Text-to-Speech Synthesizer. We also developed a technique that enables control of speaking style by adding tags that express speaking style to the input data of seq2seq speech synthesis. We are developing applications for a speech synthesis system that incorporates these techniques and studying their use in a variety of scenarios. This talk describes these NHK activities in speech synthesis and introduces NHK’s efforts in universal services now being researched and developed at NHK Science & Technology Research Laboratories.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	Statistical parametric speech synthesis / End-to-end speech synthesis / Speaking Style / Encoder-Decoder model
Paper #	SP2020-11,WIT2020-12
Date of Issue	2020-10-15 (SP, WIT)

Conference Information
Committee	WIT / SP / IPSJ-SLP
Conference Date	2020/10/22 (2 days)
Place (in Japanese)	(See Japanese page)
Place (in English)	Online
Topics (in Japanese)	(See Japanese page)
Topics (in English)
Chair	Daisuke Wakatsuki(Tsukuba Univ. of Tech.) / Hisashi Kawai(NICT) / Norihide Kitaoka(Toyohashi Univ. of Tech.)
Vice Chair	Shinji Sakou(Nagoya Inst. of Tech.)
Secretary	Shinji Sakou(Saitama Industrial Tech. Center) / (Teikyo Univ.) / (Univ. of Tokyo)
Assistant	Manabi Miyagi(Tsukuba Univ. of Tech.) / Minako Hosono(AIST) / Aki Sugano(Nagoya Univ.) / Yusuke Ijima(NTT)

Paper Information
Registration To	Technical Committee on Well-being Information Technology / Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	[Invited Talk] NHK's activities on Japanese end-to-end speech synthesis
Sub Title (in English)
Keyword(1)	Statistical parametric speech synthesis
Keyword(2)	End-to-end speech synthesis
Keyword(3)	Speaking Style
Keyword(4)	Encoder-Decoder model
1st Author's Name	Kiyoshi Kurihara
1st Author's Affiliation	NHK (Japan Broadcasting Corporation)
Date	2020-10-22
Paper #	SP2020-11,WIT2020-12
Volume (vol)	vol.120
Number (no)	SP-197,WIT-198
Page	pp.19-20(SP), pp.19-20(WIT)
#Pages	2
Date of Issue	2020-10-15 (SP, WIT)