Presentation | 2023-10-14 Electrolaryngeal Speech Enhancement through Strong Linguistic Encoding Methods Lester Phillip Violeta, Wen-Chin Huang, Ding Ma, Ryuichi Yamamoto, Kazuhiro Kobayashi, Tomoki Toda |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Although pretraining and fine-tuning approaches have proven to work well in speech intelligibility enhancement, mismatches between the datasets used in each stage, such as speech type mismatch or speaker mismatch, can degrade the conversion performance of this framework. We propose a linguistic encoder robust enough to project both electrolaryngeal (EL) and typical speech into the same latent space while still extracting accurate linguistic information, creating a unified representation that reduces the speech type mismatch. Furthermore, we introduce HuBERT output features into the proposed framework to reduce the speaker mismatch. Such a framework makes it possible to effectively use a large-scale parallel dataset during pretraining. We show that, compared to the conventional framework using mel-spectrogram input and output features, the proposed framework enables the model to synthesize more intelligible and natural-sounding speech, as shown by a significant 16% improvement in character error rate and a 0.83 improvement in naturalness score. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Intelligibility enhancement / Electrolaryngeal speech / Atypical speech |
Paper # | SP2023-33,WIT2023-24 |
Date of Issue | 2023-10-07 (SP, WIT) |
Conference Information | |
Committee | WIT / SP / IPSJ-SLP |
---|---|
Conference Date | 2023/10/14 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Kyushu Institute of Technology |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | Speech and Well-being Information Technology, etc. |
Chair | Takeaki Shionome(Teikyo Univ.) / Tomoki Toda(Nagoya Univ.) / Tomoki Toda(Nagoya Univ.) |
Vice Chair | Shinji Sakou(Nagoya Inst. of Tech.) |
Secretary | Shinji Sakou(AIST) / (Univ. of Toyama) / (Tsukuba Univ. of Tech.) |
Assistant | Tsubasa Uchida(NHK) / Teppei Miura(National Inst. of Techn. Toyota College) / Ryo Aihara(Mitsubishi Electric) / Daisuke Saito(Univ. of Tokyo) |
Paper Information | |
Registration To | Technical Committee on Well-being Information Technology / Technical Committee on Speech / Special Interest Group on Spoken Language Processing |
---|---|
Language | ENG |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Electrolaryngeal Speech Enhancement through Strong Linguistic Encoding Methods |
Sub Title (in English) | |
Keyword(1) | Intelligibility enhancement |
Keyword(2) | Electrolaryngeal speech |
Keyword(3) | Atypical speech |
1st Author's Name | Lester Phillip Violeta |
1st Author's Affiliation | Nagoya University(Nagoya Univ.) |
2nd Author's Name | Wen-Chin Huang |
2nd Author's Affiliation | Nagoya University(Nagoya Univ.) |
3rd Author's Name | Ding Ma |
3rd Author's Affiliation | Nagoya University(Nagoya Univ.) |
4th Author's Name | Ryuichi Yamamoto |
4th Author's Affiliation | Nagoya University(Nagoya Univ.) |
5th Author's Name | Kazuhiro Kobayashi |
5th Author's Affiliation | Nagoya University(Nagoya Univ.) |
6th Author's Name | Tomoki Toda |
6th Author's Affiliation | Nagoya University(Nagoya Univ.) |
Date | 2023-10-14 |
Paper # | SP2023-33,WIT2023-24 |
Volume (vol) | vol.123 |
Number (no) | SP-212,WIT-213 |
Page | pp.33-38(SP), pp.33-38(WIT) |
#Pages | 6 |
Date of Issue | 2023-10-07 (SP, WIT) |