Electrolaryngeal Speech Enhancement through Strong Linguistic Encoding Methods

Lester Phillip Violeta; Wen-Chin Huang; Ding Ma; Ryuichi Yamamoto; Kazuhiro Kobayashi; Tomoki Toda

Presentation	2023-10-14: Electrolaryngeal Speech Enhancement through Strong Linguistic Encoding Methods; Lester Phillip Violeta (Nagoya Univ.), Wen-Chin Huang (Nagoya Univ.), Ding Ma (Nagoya Univ.), Ryuichi Yamamoto (Nagoya Univ.), Kazuhiro Kobayashi (Nagoya Univ.), Tomoki Toda (Nagoya Univ.)
Abstract	Although pretraining and fine-tuning approaches have proven effective for speech intelligibility enhancement, mismatches between the datasets used in each stage, such as speech type mismatch and speaker mismatch, can degrade the conversion performance of this framework. We propose a linguistic encoder robust enough to project both electrolaryngeal (EL) and typical speech into the same latent space while still extracting accurate linguistic information, creating a unified representation that reduces the speech type mismatch. Furthermore, we introduce HuBERT output features into the proposed framework to reduce the speaker mismatch. This framework makes it possible to use a large-scale parallel dataset effectively during pretraining. We show that, compared with the conventional framework using mel-spectrogram input and output features, the proposed framework enables the model to synthesize more intelligible and natural-sounding speech, with a significant 16% improvement in character error rate and a 0.83 improvement in naturalness score.
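The abstract reports intelligibility as character error rate (CER). As a minimal sketch of how CER is commonly computed (not the authors' evaluation code; the ASR system and text normalization used in the paper are not stated in this record), CER is the character-level Levenshtein distance between a reference transcript and a recognized hypothesis, divided by the reference length. The function names `edit_distance` and `character_error_rate` are hypothetical, chosen for illustration.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference char missing from hypothesis
                curr[j - 1] + 1,         # insertion: extra char in hypothesis
                prev[j - 1] + (r != h),  # substitution (cost 0 when chars match)
            )
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Hypothetical helper: CER = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Three edits against a six-character reference give a CER of 0.5.
    print(character_error_rate("kitten", "sitting"))
```

Lower CER means the recognizer recovered more of the intended text, which is why a reduction in CER is read as an intelligibility improvement.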
Keywords	Intelligibility enhancement / Electrolaryngeal speech / Atypical speech
Report number	SP2023-33, WIT2023-24
Publication date	2023-10-07 (SP, WIT)

Meeting Information
Technical committees	WIT / SP / IPSJ-SLP
Dates	2023/10/14 (one day)
Venue	Kyushu Institute of Technology (Tobata Campus)
Theme	Speech and Well-being Information Technology, etc.
Chairs	Takeaki Shionome (Teikyo Univ.) / Tomoki Toda (Nagoya Univ.) / Tomoki Toda (Nagoya Univ.)
Vice Chair	Shinji Sakou (Nagoya Inst. of Tech.)
Secretaries	Minako Hosono (AIST) / Aki Sugano (Univ. of Toyama) / Manabi Miyagi (Tsukuba Univ. of Tech.) / Atsushi Ando (NTT) / Kei Hashimoto (Nagoya Inst. of Tech.) / Atsushi Ando (NTT) / Kei Hashimoto (Nagoya Inst. of Tech.) / Ryo Aihara (Mitsubishi Electric) / Daisuke Saito (UTokyo)
Assistant Secretaries	Tsubasa Uchida (NHK) / Teppei Miura (National Inst. of Tech., Toyota College) / Ryo Aihara (Mitsubishi Electric) / Daisuke Saito (Univ. of Tokyo)

Paper Details
Submitted to	Technical Committee on Well-being Information Technology / Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language	English
Title	Electrolaryngeal Speech Enhancement through Strong Linguistic Encoding Methods
Keyword 1	Intelligibility enhancement
Keyword 2	Electrolaryngeal speech
Keyword 3	Atypical speech
Author 1	Lester Phillip Violeta
Affiliation 1	Nagoya University (Nagoya Univ.)
Author 2	Wen-Chin Huang
Affiliation 2	Nagoya University (Nagoya Univ.)
Author 3	Ding Ma
Affiliation 3	Nagoya University (Nagoya Univ.)
Author 4	Ryuichi Yamamoto
Affiliation 4	Nagoya University (Nagoya Univ.)
Author 5	Kazuhiro Kobayashi
Affiliation 5	Nagoya University (Nagoya Univ.)
Author 6	Tomoki Toda
Affiliation 6	Nagoya University (Nagoya Univ.)
Presentation date	2023-10-14
Volume	vol. 123
Issue	SP-212, WIT-213
Pages	pp. 33-38 (SP), pp. 33-38 (WIT)
Number of pages	6