Sequence-to-sequence Voice Conversion for Electrolaryngeal Speech Enhancement with Multi-stage Pretraining and Fine-tuning Techniques

Ding Ma; Lester Phillip Violeta; Kazuhiro Kobayashi; Tomoki Toda

Presentation	2023-10-14 Sequence-to-sequence Voice Conversion for Electrolaryngeal Speech Enhancement with Multi-stage Pretraining and Fine-tuning Techniques Ding Ma, Lester Phillip Violeta, Kazuhiro Kobayashi, Tomoki Toda
Abstract(in English)	Sequence-to-sequence (seq2seq) voice conversion (VC) models have great potential for electrolaryngeal (EL) speech to normal speech conversion (EL2SP). However, applying seq2seq VC to EL2SP is challenging because it requires large amounts of high-quality parallel training data. Moreover, effective transfer learning from normal speech to an EL2SP dataset is difficult, owing to the substantial differences in acoustic features between the EL and normal speech domains. To address these problems, we present novel methods based on multi-stage pretraining and fine-tuning techniques. We first perform encoder adaptation training on a pretrained seq2seq model using imperfect synthetic and original EL data. Then, we use both imperfect synthetic and original parallel data for VC training. The final fine-tuning for EL2SP uses only the original dataset. The experimental results demonstrate that our approach yields significant improvements in EL2SP performance.
Keyword(in English)	Sequence-to-sequence / voice conversion / electrolaryngeal speech / pretraining / fine-tuning
Paper #	SP2023-32,WIT2023-23
Date of Issue	2023-10-07 (SP, WIT)

Conference Information
Committee	WIT / SP / IPSJ-SLP
Conference Date	2023/10/14 (1 day)
Place (in English)	Kyushu Institute of Technology
Topics (in English)	Speech and Well-being Information Technology, etc.
Chair	Takeaki Shionome(Teikyo Univ.) / Tomoki Toda(Nagoya Univ.) / Tomoki Toda(Nagoya Univ.)
Vice Chair	Shinji Sakou(Nagoya Inst. of Tech.)
Secretary	Shinji Sakou(AIST) / (Univ. of Toyama) / (Tsukuba Univ. of Tech.)
Assistant	Tsubasa Uchida(NHK) / Teppei Miura(National Inst. of Tech. Toyota College) / Ryo Aihara(Mitsubishi Electric) / Daisuke Saito(Univ. of Tokyo)

Paper Information
Registration To	Technical Committee on Well-being Information Technology / Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language	ENG
Title (in English)	Sequence-to-sequence Voice Conversion for Electrolaryngeal Speech Enhancement with Multi-stage Pretraining and Fine-tuning Techniques
Keyword(1)	Sequence-to-sequence
Keyword(2)	voice conversion
Keyword(3)	electrolaryngeal speech
Keyword(4)	pretraining
Keyword(5)	fine-tuning
1st Author's Name	Ding Ma
1st Author's Affiliation	Nagoya University(Nagoya Univ.)
2nd Author's Name	Lester Phillip Violeta
2nd Author's Affiliation	Nagoya University(Nagoya Univ.)
3rd Author's Name	Kazuhiro Kobayashi
3rd Author's Affiliation	Nagoya University(Nagoya Univ.)
4th Author's Name	Tomoki Toda
4th Author's Affiliation	Nagoya University(Nagoya Univ.)
Date	2023-10-14
Volume (vol)	vol.123
Number (no)	SP-212,WIT-213
Page	pp.27-32(SP), pp.27-32(WIT)
#Pages	6