Semi-supervised joint training of text to speech and automatic speech recognition using unpaired text data

Presentation	2022-11-30 Semi-supervised joint training of text to speech and automatic speech recognition using unpaired text data Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura
Abstract(in English)	This paper presents a novel method for jointly training text to speech (TTS) and automatic speech recognition (ASR) models with a small amount of speech-text paired data and a large amount of unpaired text data. In conventional cycle-consistency-based methods, text data is converted to speech by the TTS model, the synthesized speech is transcribed by the ASR model, and both models are trained so that the transcription matches the original text. However, this approach causes the synthesized speech to overfit to the ASR model, yielding speech that (1) lacks speaker characteristics and (2) is easily recognizable. This problem not only degrades the quality of the synthesized speech but also limits the improvement in speech recognition performance. To solve this problem, we propose a learning method based on (1) a speaker consistency loss and (2) step-wise optimization. Experimental results demonstrate the efficacy of the proposed method.
Keyword(in English)	automatic speech recognition / text to speech / semi-supervised learning
Paper #	NLC2022-14,SP2022-34
Date of Issue	2022-11-22 (NLC, SP)

Conference Information
Committee	NLC / IPSJ-NL / SP / IPSJ-SLP
Conference Date	2022-11-29 (3 days)
Place (in English)
Topics (in English)
Chair	Mitsuo Yoshida(Univ. of Tsukuba) / Katsuhito Sudoh(Nara Institute of Science and Technology) / Tomoki Toda(Nagoya Univ.) / Tomoki Toda(Nagoya Univ.)
Vice Chair	Hiroki Sakaji(Univ. of Tokyo) / Takeshi Kobayakawa(NHK)
Secretary	Hiroki Sakaji(NTT) / Takeshi Kobayakawa(Hiroshima Univ. of Economics) / (Denso IT Laboratory, Inc.) / (Hokkai-Gakuen Univ.) / (Tokyo Univ. of Agriculture and Technology)
Assistant	Kanjin Takahashi(Sansan) / Yasuhiro Ogawa(Nagoya Univ.) / / Ryo Aihara(Mitsubishi Electric) / Daisuke Saito(Univ. of Tokyo)

Paper Information
Registration To	Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language	JPN
Title (in English)	Semi-supervised joint training of text to speech and automatic speech recognition using unpaired text data
Sub Title (in English)
Keyword(1)	automatic speech recognition
Keyword(2)	text to speech
Keyword(3)	semi-supervised learning
1st Author's Name	Naoki Makishima
1st Author's Affiliation	NIPPON TELEGRAPH AND TELEPHONE CORPORATION(NTT)
2nd Author's Name	Satoshi Suzuki
2nd Author's Affiliation	NIPPON TELEGRAPH AND TELEPHONE CORPORATION(NTT)
3rd Author's Name	Atsushi Ando
3rd Author's Affiliation	NIPPON TELEGRAPH AND TELEPHONE CORPORATION(NTT)
4th Author's Name	Ryo Masumura
4th Author's Affiliation	NIPPON TELEGRAPH AND TELEPHONE CORPORATION(NTT)
Date	2022-11-30
Volume (vol)	vol.122
Number (no)	NLC-287,SP-288
Page	pp.27-32(NLC), pp.27-32(SP)
#Pages	6