Presentation 2020-12-02
Multi-Modal Emotion Recognition by Integrating of Acoustic and Linguistic Features
Ryotaro Nagase, Takahiro Fukumori, Yoichi Yamashita
Abstract(in English) In recent years, advanced deep learning techniques have improved the performance of Speech Emotion Recognition (SER), as they have for speech synthesis and speech recognition. Moreover, multi-modal emotion recognition, which integrates linguistic or facial image features with acoustic features, has outperformed conventional methods. In this paper, we propose a method of SER that uses acoustic and linguistic features at the utterance level. First, speech and text emotion recognition models are trained on a Japanese emotional speech corpus. Then, we aim to improve accuracy by using early fusion, which fuses linguistic and acoustic features, and late fusion, which fuses the values predicted by each model.
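The two fusion strategies named in the abstract can be sketched in a few lines. This is a minimal, generic illustration, not the authors' actual architecture: the feature dimensions, the linear classifiers, and the equal 0.5 weighting in late fusion are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N_CLASSES = 4  # illustrative emotion classes, e.g. neutral/happy/sad/angry

def softmax(z):
    """Turn raw scores into class probabilities."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy utterance-level features for each modality (dimensions are assumed).
acoustic_feat = rng.standard_normal(128)    # e.g. pooled acoustic embedding
linguistic_feat = rng.standard_normal(768)  # e.g. pooled text embedding

# Toy stand-ins for trained classifiers: one linear layer per setting.
W_early = rng.standard_normal((N_CLASSES, 128 + 768))
W_ac = rng.standard_normal((N_CLASSES, 128))
W_lx = rng.standard_normal((N_CLASSES, 768))

# Early fusion: concatenate the modality features, then classify once.
early_probs = softmax(W_early @ np.concatenate([acoustic_feat, linguistic_feat]))

# Late fusion: classify each modality separately, then combine the
# predicted probabilities (equal weights here, purely as an assumption).
late_probs = 0.5 * softmax(W_ac @ acoustic_feat) + 0.5 * softmax(W_lx @ linguistic_feat)

print("early-fusion prediction:", int(early_probs.argmax()))
print("late-fusion prediction:", int(late_probs.argmax()))
```

The design difference is where the modalities meet: early fusion lets one classifier learn cross-modal interactions in feature space, while late fusion keeps the uni-modal models independent and only merges their output distributions.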
Keyword(in English) Speech Emotion Recognition / Text Emotion Recognition / Multi-Modal / early-fusion / late-fusion
Paper # NLC2020-14,SP2020-17
Date of Issue 2020-11-25 (NLC, SP)

Conference Information
Committee NLC / IPSJ-NL / SP / IPSJ-SLP
Conference Date 2020/12/2 (2 days)
Place (in English) Online
Topics (in English)
Chair Kazutaka Shimada(Kyushu Inst. of Tech.) / Satoshi Sekine(RIKEN) / Hisashi Kawai(NICT) / Norihide Kitaoka(Toyohashi Univ. of Tech.)
Vice Chair Mitsuo Yoshida(Toyohashi Univ. of Tech.) / Takeshi Kobayakawa(NHK)
Secretary Mitsuo Yoshida(Univ. of Tokyo) / Takeshi Kobayakawa(Hiroshima Univ. of Economics) / (Denso IT Lab.) / (Otaru Univ. of Commerce) / (Ibaraki Univ.)
Assistant Kanjin Takahashi(Sansan) / Ko Mitsuda(NTT) / / Yusuke Ijima(NTT)

Paper Information
Registration To Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language JPN
Title (in English) Multi-Modal Emotion Recognition by Integrating of Acoustic and Linguistic Features
Sub Title (in English)
Keyword(1) Speech Emotion Recognition
Keyword(2) Text Emotion Recognition
Keyword(3) Multi-Modal
Keyword(4) early-fusion
Keyword(5) late-fusion
1st Author's Name Ryotaro Nagase
1st Author's Affiliation Ritsumeikan University(Ritsumeikan Univ.)
2nd Author's Name Takahiro Fukumori
2nd Author's Affiliation Ritsumeikan University(Ritsumeikan Univ.)
3rd Author's Name Yoichi Yamashita
3rd Author's Affiliation Ritsumeikan University(Ritsumeikan Univ.)
Date 2020-12-02
Paper # NLC2020-14,SP2020-17
Volume (vol) vol.120
Number (no) NLC-270,SP-271
Page pp.7-12(NLC), pp.7-12(SP)
#Pages 6
Date of Issue 2020-11-25 (NLC, SP)