Presentation 2023-10-14
Sequence-to-sequence Voice Conversion for Electrolaryngeal Speech Enhancement with Multi-stage Pretraining and Fine-tuning Techniques
Ding Ma, Lester Phillip Violeta, Kazuhiro Kobayashi, Tomoki Toda
Abstract(in English) Sequence-to-sequence (seq2seq) voice conversion (VC) models have great potential for electrolaryngeal (EL) speech to normal speech conversion (EL2SP). However, applying seq2seq VC to EL2SP is challenging because it requires a large amount of high-quality parallel training data. At the same time, effective transfer learning from normal speech to the EL2SP dataset is difficult, owing to the substantial differences in acoustic features between the EL and normal speech domains. To address these problems, we present novel methods based on multi-stage pretraining and fine-tuning. We first apply encoder adaptation training to a pretrained seq2seq model using imperfect synthetic EL data and original EL data. Then, we use both imperfect synthetic and original parallel data for VC training. The final fine-tuning for EL2SP uses only the original dataset. The experimental results demonstrate that our approach yields significant improvements in EL2SP performance.
Keyword(in English) Sequence-to-sequence / voice conversion / electrolaryngeal speech / pretraining / fine-tuning
Paper # SP2023-32,WIT2023-23
Date of Issue 2023-10-07 (SP, WIT)

Conference Information
Committee WIT / SP / IPSJ-SLP
Conference Date 2023/10/14 (1 day)
Place (in English) Kyushu Institute of Technology
Topics (in English) Speech and Well-being Information Technology, etc.
Chair Takeaki Shionome(Teikyo Univ.) / Tomoki Toda(Nagoya Univ.) / Tomoki Toda(Nagoya Univ.)
Vice Chair Shinji Sakou(Nagoya Inst. of Tech.)
Secretary Shinji Sakou(AIST) / (Univ. of Toyama) / (Tsukuba Univ. of Tech.)
Assistant Tsubasa Uchida(NHK) / Teppei Miura(National Inst. of Techn. Toyota College) / Ryo Aihara(Mitsubishi Electric) / Daisuke Saito(Univ. of Tokyo)

Paper Information
Registration To Technical Committee on Well-being Information Technology / Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language ENG
Title (in English) Sequence-to-sequence Voice Conversion for Electrolaryngeal Speech Enhancement with Multi-stage Pretraining and Fine-tuning Techniques
Sub Title (in English)
Keyword(1) Sequence-to-sequence
Keyword(2) voice conversion
Keyword(3) electrolaryngeal speech
Keyword(4) pretraining
Keyword(5) fine-tuning
1st Author's Name Ding Ma
1st Author's Affiliation Nagoya University(Nagoya Univ.)
2nd Author's Name Lester Phillip Violeta
2nd Author's Affiliation Nagoya University(Nagoya Univ.)
3rd Author's Name Kazuhiro Kobayashi
3rd Author's Affiliation Nagoya University(Nagoya Univ.)
4th Author's Name Tomoki Toda
4th Author's Affiliation Nagoya University(Nagoya Univ.)
Date 2023-10-14
Paper # SP2023-32,WIT2023-23
Volume (vol) vol.123
Number (no) SP-212,WIT-213
Page pp.27-32(SP), pp.27-32(WIT)
#Pages 6
Date of Issue 2023-10-07 (SP, WIT)