Presentation 2023-10-14
Electrolaryngeal Speech Enhancement through Strong Linguistic Encoding Methods
Lester Phillip Violeta, Wen-Chin Huang, Ding Ma, Ryuichi Yamamoto, Kazuhiro Kobayashi, Tomoki Toda
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Although pretraining and fine-tuning approaches have proven to work well in speech intelligibility enhancement, various mismatches between the datasets used in each stage, such as speech type and speaker mismatches, can degrade the conversion performance of this framework. We propose a linguistic encoder robust enough to project both electrolaryngeal (EL) and typical speech into the same latent space while still extracting accurate linguistic information, creating a unified representation that reduces the speech type mismatch. Furthermore, we introduce HuBERT output features into the proposed framework to reduce the speaker mismatch. Such a framework makes it possible to effectively use a large-scale parallel dataset during pretraining. We show that, compared to the conventional framework using mel-spectrogram input and output features, the proposed framework enables the model to synthesize more intelligible and natural-sounding speech, as shown by a significant 16% improvement in character error rate and a 0.83 improvement in naturalness score.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Intelligibility enhancement / Electrolaryngeal speech / Atypical speech
Paper # SP2023-33,WIT2023-24
Date of Issue 2023-10-07 (SP, WIT)

Conference Information
Committee WIT / SP / IPSJ-SLP
Conference Date 2023/10/14 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English) Kyushu Institute of Technology
Topics (in Japanese) (See Japanese page)
Topics (in English) Speech and Well-being Information Technology, etc.
Chair Takeaki Shionome(Teikyo Univ.) / Tomoki Toda(Nagoya Univ.) / Tomoki Toda(Nagoya Univ.)
Vice Chair Shinji Sakou(Nagoya Inst. of Tech.)
Secretary Shinji Sakou(AIST) / (Univ. of Toyama) / (Tsukuba Univ. of Tech.)
Assistant Tsubasa Uchida(NHK) / Teppei Miura(National Inst. of Tech., Toyota College) / Ryo Aihara(Mitsubishi Electric) / Daisuke Saito(Univ. of Tokyo)

Paper Information
Registration To Technical Committee on Well-being Information Technology / Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language ENG
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Electrolaryngeal Speech Enhancement through Strong Linguistic Encoding Methods
Sub Title (in English)
Keyword(1) Intelligibility enhancement
Keyword(2) Electrolaryngeal speech
Keyword(3) Atypical speech
1st Author's Name Lester Phillip Violeta
1st Author's Affiliation Nagoya University(Nagoya Univ.)
2nd Author's Name Wen-Chin Huang
2nd Author's Affiliation Nagoya University(Nagoya Univ.)
3rd Author's Name Ding Ma
3rd Author's Affiliation Nagoya University(Nagoya Univ.)
4th Author's Name Ryuichi Yamamoto
4th Author's Affiliation Nagoya University(Nagoya Univ.)
5th Author's Name Kazuhiro Kobayashi
5th Author's Affiliation Nagoya University(Nagoya Univ.)
6th Author's Name Tomoki Toda
6th Author's Affiliation Nagoya University(Nagoya Univ.)
Date 2023-10-14
Paper # SP2023-33,WIT2023-24
Volume (vol) vol.123
Number (no) SP-212,WIT-213
Page pp.33-38(SP), pp.33-38(WIT)
#Pages 6
Date of Issue 2023-10-07 (SP, WIT)