Presentation | 2023-10-14. Sequence-to-sequence Voice Conversion for Electrolaryngeal Speech Enhancement with Multi-stage Pretraining and Fine-tuning Techniques. Ding Ma(Nagoya Univ.), Lester Phillip Violeta(Nagoya Univ.), Kazuhiro Kobayashi(Nagoya Univ.), Tomoki Toda(Nagoya Univ.) |
---|---|
Abstract | Sequence-to-sequence (seq2seq) voice conversion (VC) models have great potential for converting electrolaryngeal (EL) speech to normal speech (EL2SP). However, applying seq2seq VC to EL2SP is challenging because it requires a large amount of high-quality parallel training data. Moreover, effective transfer learning from normal speech to an EL2SP dataset is difficult, owing to the substantial differences in acoustic features between the EL and normal speech domains. To address these problems, we present novel methods based on multi-stage pretraining and fine-tuning techniques. We first apply encoder adaptation training to a pretrained seq2seq model using imperfect synthetic EL data and original EL data. Then, we use both imperfect synthetic and original parallel data for VC training. The final EL2SP fine-tuning uses only the original dataset. The experimental results demonstrate that our approach yields significant improvements in EL2SP performance. |
Keywords | Sequence-to-sequence / voice conversion / electrolaryngeal speech / pretraining / fine-tuning |
Report number | SP2023-32, WIT2023-23 |
Publication date | 2023-10-07 (SP, WIT) |
Committee information | |
Committee | WIT / SP / IPSJ-SLP |
---|---|
Conference dates | 2023-10-14 (one day) |
Venue | Kyushu Institute of Technology (Tobata Campus) |
Topic | Speech and Well-being Information Technology, etc. |
Chair (Japanese) | 塩野目 剛亮(帝京大) / 戸田 智基(名大) / 戸田 智基(名大) |
Chair (English) | Takeaki Shionome(Teikyo Univ.) / Tomoki Toda(Nagoya Univ.) / Tomoki Toda(Nagoya Univ.) |
Vice chair (Japanese) | 酒向 慎司(名工大) |
Vice chair (English) | Shinji Sakou(Nagoya Inst. of Tech.) |
Secretary (Japanese) | 細野 美奈子(産総研) / 菅野 亜紀(富山大) / 宮城 愛美(筑波技術大) / 安藤 厚志(NTT) / 橋本 佳(名工大) / 安藤 厚志(NTT) / 橋本 佳(名工大) / 相原 龍(三菱電機) / 齋藤 大輔(東大) |
Secretary (English) | Minako Hosono(AIST) / Aki Sugano(Univ. of Toyama) / Manabi Miyagi(Tsukuba Univ. of Tech.) / Atsushi Ando(NTT) / Kei Hashimoto(Nagoya Inst. of Tech.) / Atsushi Ando(NTT) / Kei Hashimoto(Nagoya Inst. of Tech.) / Ryo Aihara(Mitsubishi Electric) / Daisuke Saito(Univ. of Tokyo) |
Assistant secretary (Japanese) | 内田 翼(NHK) / 三浦 哲平(豊田高専) / 相原 龍(三菱電機) / 齋藤 大輔(東大) |
Assistant secretary (English) | Tsubasa Uchida(NHK) / Teppei Miura(National Inst. of Tech., Toyota College) / Ryo Aihara(Mitsubishi Electric) / Daisuke Saito(Univ. of Tokyo) |
Paper details | |
Submitted to | Technical Committee on Well-being Information Technology / Technical Committee on Speech / Special Interest Group on Spoken Language Processing |
---|---|
Paper language | ENG |
Title (Japanese) | |
Subtitle (Japanese) | |
Title (English) | Sequence-to-sequence Voice Conversion for Electrolaryngeal Speech Enhancement with Multi-stage Pretraining and Fine-tuning Techniques |
Subtitle (English) | |
Keyword (1) (Japanese/English) | Sequence-to-sequence / Sequence-to-sequence |
Keyword (2) (Japanese/English) | voice conversion / voice conversion |
Keyword (3) (Japanese/English) | electrolaryngeal speech / electrolaryngeal speech |
Keyword (4) (Japanese/English) | pretraining / pretraining |
Keyword (5) (Japanese/English) | fine-tuning / fine-tuning |
1st author name (Japanese/English) | Ding Ma / Ding Ma |
1st author affiliation (Japanese/English) | Nagoya University(abbrev.: 名大) / Nagoya University(abbrev.: Nagoya Univ.) |
2nd author name (Japanese/English) | Lester Phillip Violeta / Lester Phillip Violeta |
2nd author affiliation (Japanese/English) | Nagoya University(abbrev.: 名大) / Nagoya University(abbrev.: Nagoya Univ.) |
3rd author name (Japanese/English) | Kazuhiro Kobayashi / Kazuhiro Kobayashi |
3rd author affiliation (Japanese/English) | Nagoya University(abbrev.: 名大) / Nagoya University(abbrev.: Nagoya Univ.) |
4th author name (Japanese/English) | Tomoki Toda / Tomoki Toda |
4th author affiliation (Japanese/English) | Nagoya University(abbrev.: 名大) / Nagoya University(abbrev.: Nagoya Univ.) |
Presentation date | 2023-10-14 |
Volume | vol.123 |
Issue number | SP-212, WIT-213 |
Page range | pp.27-32(SP), pp.27-32(WIT) |
Number of pages | 6 |