Presentation 2023-12-03
[Poster Presentation] Self-supervised learning model based emotion transfer and intensity control technology for expressive speech synthesis
Wei Li, Nobuaki Minematsu, Daisuke Saito
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Emotion transfer techniques, which transfer the speaking style from reference speech to target speech, are widely used in speech synthesis. However, previous methods that use an emotion classifier to disentangle the emotion components fail to transfer the correct emotion to the target speech in some contexts. To solve this problem, we introduce a self-supervised learning model to improve the capability of emotion feature extraction. In addition, we use the relative attributes method to obtain intensity labels for our emotional speech dataset. Experimental results indicate that our method can improve the performance of the emotional speech synthesis model.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Emotion Transfer / Intensity Control / Self-supervised Learning Model / Speech Synthesis
Paper # NLC2023-21,SP2023-41
Date of Issue 2023-11-25 (NLC, SP)

Conference Information
Committee SP / NLC / IPSJ-SLP / IPSJ-NL
Conference Date 2023/12/2 (3 days)
Place (in Japanese) (See Japanese page)
Place (in English) Kikai-Shinko-Kaikan Bldg.
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair Tomoki Toda(Nagoya Univ.) / Mitsuo Yoshida(Univ. of Tsukuba) / Tomoki Toda(Nagoya Univ.) / Katsuhito Sudoh(Nara Institute of Science and Technology)
Vice Chair / Hiroki Sakaji(Univ. of Tokyo) / Takeshi Kobayakawa(NHK)
Secretary (NTT) / Hiroki Sakaji(Nagoya Inst. of Tech.) / Takeshi Kobayakawa(rinna) / (Hiroshima Univ. of Economics) / (NTT)
Assistant Ryo Aihara(Mitsubishi Electric) / Daisuke Saito(Univ. of Tokyo) / Kanjin Takahashi(Sansan) / Yasuhiro Ogawa(Nagoya Univ.)

Paper Information
Registration To Technical Committee on Speech / Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Spoken Language Processing / Special Interest Group on Natural Language
Language ENG-JTITLE
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) [Poster Presentation] Self-supervised learning model based emotion transfer and intensity control technology for expressive speech synthesis
Sub Title (in English)
Keyword(1) Emotion Transfer / Intensity Control / Self-supervised Learning Model / Speech Synthesis
1st Author's Name Wei Li
1st Author's Affiliation the University of Tokyo(Univ. of Tokyo)
2nd Author's Name Nobuaki Minematsu
2nd Author's Affiliation the University of Tokyo(Univ. of Tokyo)
3rd Author's Name Daisuke Saito
3rd Author's Affiliation the University of Tokyo(Univ. of Tokyo)
Date 2023-12-03
Volume (vol) vol.123
Number (no) NLC-291,SP-292
Page pp.43-48(NLC), pp.43-48(SP)
#Pages 6