Presentation | 2023-10-14 Sequence-to-sequence Voice Conversion for Electrolaryngeal Speech Enhancement with Multi-stage Pretraining and Fine-tuning Techniques Ding Ma, Lester Phillip Violeta, Kazuhiro Kobayashi, Tomoki Toda |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Sequence-to-sequence (seq2seq) voice conversion (VC) models have great potential for electrolaryngeal (EL) speech to normal speech conversion (EL2SP). However, applying seq2seq VC to EL2SP is challenging because it requires a large amount of high-quality parallel training data. In addition, effective transfer learning from normal speech to the EL2SP dataset is difficult, owing to the substantial differences in acoustic features between the EL and normal speech domains. To address these problems, we present novel methods based on multi-stage pretraining and fine-tuning techniques. We first employ encoder adaptation training for a pretrained seq2seq model with imperfect synthetic and original EL data. Then, we incorporate both imperfect synthetic and original parallel data for VC training. The final fine-tuning of EL2SP uses only the original dataset. The experimental results demonstrate that our approach yields significant improvements in EL2SP performance. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Sequence-to-sequence / voice conversion / electrolaryngeal speech / pretraining / fine-tuning |
Paper # | SP2023-32,WIT2023-23 |
Date of Issue | 2023-10-07 (SP, WIT) |
Conference Information | |
Committee | WIT / SP / IPSJ-SLP |
---|---|
Conference Date | 2023/10/14 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Kyushu Institute of Technology |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | Speech and Well-being Information Technology, etc. |
Chair | Takeaki Shionome(Teikyo Univ.) / Tomoki Toda(Nagoya Univ.) / Tomoki Toda(Nagoya Univ.) |
Vice Chair | Shinji Sakou(Nagoya Inst. of Tech.) |
Secretary | Shinji Sakou(AIST) / (Univ. of Toyama) / (Tsukuba Univ. of Tech.) |
Assistant | Tsubasa Uchida(NHK) / Teppei Miura(National Inst. of Techn. Toyota College) / Ryo Aihara(Mitsubishi Electric) / Daisuke Saito(Univ. of Tokyo) |
Paper Information | |
Registration To | Technical Committee on Well-being Information Technology / Technical Committee on Speech / Special Interest Group on Spoken Language Processing |
---|---|
Language | ENG |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Sequence-to-sequence Voice Conversion for Electrolaryngeal Speech Enhancement with Multi-stage Pretraining and Fine-tuning Techniques |
Sub Title (in English) | |
Keyword(1) | Sequence-to-sequence |
Keyword(2) | voice conversion |
Keyword(3) | electrolaryngeal speech |
Keyword(4) | pretraining |
Keyword(5) | fine-tuning |
1st Author's Name | Ding Ma |
1st Author's Affiliation | Nagoya University(Nagoya Univ.) |
2nd Author's Name | Lester Phillip Violeta |
2nd Author's Affiliation | Nagoya University(Nagoya Univ.) |
3rd Author's Name | Kazuhiro Kobayashi |
3rd Author's Affiliation | Nagoya University(Nagoya Univ.) |
4th Author's Name | Tomoki Toda |
4th Author's Affiliation | Nagoya University(Nagoya Univ.) |
Date | 2023-10-14 |
Paper # | SP2023-32,WIT2023-23 |
Volume (vol) | vol.123 |
Number (no) | SP-212,WIT-213 |
Page | pp.27-32(SP), pp.27-32(WIT) |
#Pages | 6 |
Date of Issue | 2023-10-07 (SP, WIT) |