Presentation 2023-03-03
Parallel-Data-Free Japanese Singer Conversion using CycleGAN Considering Perceptual Loss in Singing Phoneme Sequences
Kanade Gemmoto, Nobutaka Shimada, Tadashi Matsuo
Abstract(in Japanese) (See Japanese page)
Abstract(in English) This paper proposes a one-to-one Japanese Singing Voice Conversion (SVC) method that does not require parallel data. The method improves the naturalness of the converted singing by adding a perceptual loss on the sung phoneme sequence, computed with a speech recognition model, to CycleGAN-based spectrogram conversion. We also incorporate the Adaptive Multi Adversarial Training (AMAT) framework, which prevents mode collapse, and demonstrate that singer conversion can be performed with a limited amount of Japanese singing data by controlling the adversarial training switch according to the accuracy of the Discriminator.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Singing Voice Conversion / Non-parallel Data / Perceptual Loss / CycleGAN / Spectrogram / AMAT / MelGAN
Paper # PRMU2022-114,IBISML2022-121
Date of Issue 2023-02-23 (PRMU, IBISML)
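
The abstract names two mechanisms without giving formulas: a perceptual loss on the sung phoneme sequence and an adversarial training switch driven by Discriminator accuracy. The sketch below is only an illustration of how such pieces are commonly written, not the authors' implementation. The function names, the L1 distance, the use of per-frame feature vectors from a speech recognition model, and the 0.8 accuracy threshold are all assumptions made for this example.

```python
from typing import Sequence

Frames = Sequence[Sequence[float]]  # one feature vector per time frame


def phoneme_perceptual_loss(src_feats: Frames, conv_feats: Frames) -> float:
    """Mean absolute difference between two frame-by-feature matrices.

    Assumption: both inputs are features that a speech recognition model
    produced for the source and the converted spectrogram, so a small value
    means the sung phoneme content was preserved by the conversion.
    """
    if len(src_feats) != len(conv_feats):
        raise ValueError("feature sequences must have the same number of frames")
    total, count = 0.0, 0
    for src_frame, conv_frame in zip(src_feats, conv_feats):
        if len(src_frame) != len(conv_frame):
            raise ValueError("frames must have the same feature dimension")
        for s, c in zip(src_frame, conv_frame):
            total += abs(s - c)
            count += 1
    return total / count if count else 0.0


def should_update_discriminator(disc_accuracy: float,
                                max_accuracy: float = 0.8) -> bool:
    """Hypothetical switch: pause Discriminator updates once it is too accurate.

    Assumption: disc_accuracy is the fraction of real/fake samples the
    Discriminator classified correctly on the latest batch; the threshold
    value is illustrative and does not come from the paper.
    """
    return disc_accuracy < max_accuracy
```

In a training loop, a switch like this would be evaluated once per batch so that the Generator keeps training while Discriminator updates are skipped, which is one common way to keep adversarial training balanced when data is scarce.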

Conference Information
Committee PRMU / IBISML / IPSJ-CVIM
Conference Date 2023/3/2(2days)
Place (in Japanese) (See Japanese page)
Place (in English) Future University Hakodate
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair Seiichi Uchida(Kyushu Univ.) / Masashi Sugiyama(Univ. of Tokyo)
Vice Chair Takuya Funatomi(NAIST) / Mitsuru Anpai(Denso IT Lab.) / Toshihiro Kamishima(AIST) / Koji Tsuda(Univ. of Tokyo)
Secretary Takuya Funatomi(CyberAgent) / Mitsuru Anpai(Univ. of Tokyo) / Toshihiro Kamishima(NTT) / Koji Tsuda(Hokkaido Univ.)
Assistant Nakamasa Inoue(Tokyo Inst. of Tech.) / Yasutomo Kawanishi(Riken) / Yoshinobu Kawahara(Osaka Univ.) / Taiji Suzuki(Tokyo Inst. of Tech.)

Paper Information
Registration To Technical Committee on Pattern Recognition and Media Understanding / Technical Committee on Information-Based Induction Sciences and Machine Learning / Special Interest Group on Computer Vision and Image Media
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Parallel-Data-Free Japanese Singer Conversion using CycleGAN Considering Perceptual Loss in Singing Phoneme Sequences
Sub Title (in English)
Keyword(1) Singing Voice Conversion
Keyword(2) Non-parallel Data
Keyword(3) Perceptual Loss
Keyword(4) CycleGAN
Keyword(5) Spectrogram
Keyword(6) AMAT
Keyword(7) MelGAN
1st Author's Name Kanade Gemmoto
1st Author's Affiliation Ritsumeikan University(Ritsumeikan Univ)
2nd Author's Name Nobutaka Shimada
2nd Author's Affiliation Ritsumeikan University(Ritsumeikan Univ)
3rd Author's Name Tadashi Matsuo
3rd Author's Affiliation Ritsumeikan University(Ritsumeikan Univ)
Date 2023-03-03
Paper # PRMU2022-114,IBISML2022-121
Volume (vol) vol.122
Number (no) PRMU-404,IBISML-405
Page pp.293-298(PRMU), pp.293-298(IBISML)
#Pages 6
Date of Issue 2023-02-23 (PRMU, IBISML)