Presentation | 2023-03-03 Parallel-Data-Free Japanese Singer Conversion using CycleGAN Considering Perceptual Loss in Singing Phoneme Sequences Kanade Gemmoto, Nobutaka Shimada, Tadashi Matsuo |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | This paper proposes a one-to-one Japanese singing voice conversion (SVC) method that does not require parallel data. The method improves the naturalness of the converted singing by introducing a perceptual loss on the sung phoneme sequence, computed with a speech recognition model, into CycleGAN-based spectrogram conversion. The method also incorporates the Adaptive Multi Adversarial Training (AMAT) framework, which prevents mode collapse. We demonstrate that singer conversion can be performed with a limited amount of Japanese singing data by controlling the switching of adversarial training based on the accuracy of the discriminator. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Singing Voice Conversion / Non-parallel Data / Perceptual Loss / CycleGAN / Spectrogram / AMAT / MelGAN |
Paper # | PRMU2022-114,IBISML2022-121 |
Date of Issue | 2023-02-23 (PRMU, IBISML) |
Conference Information | |
Committee | PRMU / IBISML / IPSJ-CVIM |
---|---|
Conference Date | 2023/3/2 (2 days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Future University Hakodate |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | Seiichi Uchida(Kyushu Univ.) / Masashi Sugiyama(Univ. of Tokyo) |
Vice Chair | Takuya Funatomi(NAIST) / Mitsuru Anpai(Denso IT Lab.) / Toshihiro Kamishima(AIST) / Koji Tsuda(Univ. of Tokyo) |
Secretary | Takuya Funatomi(CyberAgent) / Mitsuru Anpai(Univ. of Tokyo) / Toshihiro Kamishima(NTT) / Koji Tsuda(Hokkaido Univ.) |
Assistant | Nakamasa Inoue(Tokyo Inst. of Tech.) / Yasutomo Kawanishi(Riken) / Yoshinobu Kawahara(Osaka Univ.) / Taiji Suzuki(Tokyo Inst. of Tech.) |
Paper Information | |
Registration To | Technical Committee on Pattern Recognition and Media Understanding / Technical Committee on Information-Based Induction Sciences and Machine Learning / Special Interest Group on Computer Vision and Image Media |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Parallel-Data-Free Japanese Singer Conversion using CycleGAN Considering Perceptual Loss in Singing Phoneme Sequences |
Sub Title (in English) | |
Keyword(1) | Singing Voice Conversion |
Keyword(2) | Non-parallel Data |
Keyword(3) | Perceptual Loss |
Keyword(4) | CycleGAN |
Keyword(5) | Spectrogram |
Keyword(6) | AMAT |
Keyword(7) | MelGAN |
1st Author's Name | Kanade Gemmoto |
1st Author's Affiliation | Ritsumeikan University(Ritsumeikan Univ) |
2nd Author's Name | Nobutaka Shimada |
2nd Author's Affiliation | Ritsumeikan University(Ritsumeikan Univ) |
3rd Author's Name | Tadashi Matsuo |
3rd Author's Affiliation | Ritsumeikan University(Ritsumeikan Univ) |
Date | 2023-03-03 |
Paper # | PRMU2022-114,IBISML2022-121 |
Volume (vol) | vol.122 |
Number (no) | PRMU-404,IBISML-405 |
Page | pp.293-298(PRMU), pp.293-298(IBISML) |
#Pages | 6 |
Date of Issue | 2023-02-23 (PRMU, IBISML) |