Presentation 2020-12-02
Multi-Modal Emotion Recognition by Integrating of Acoustic and Linguistic Features
Ryotaro Nagase, Takahiro Fukumori, Yoichi Yamashita
Abstract(in English) In recent years, advanced deep learning techniques have improved the performance of Speech Emotion Recognition (SER), as they have for speech synthesis and speech recognition. Moreover, multi-modal emotion recognition, which integrates linguistic or facial image features with acoustic features, has outperformed conventional methods. In this paper, we propose a method of SER that uses acoustic and linguistic features at the utterance level. First, speech and text emotion recognition models are trained on a Japanese emotional speech corpus. Then, we aim to improve accuracy by using early fusion, which fuses linguistic and acoustic features, and late fusion, which fuses the values predicted by each model.
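The two fusion strategies named in the abstract can be sketched in a few lines. This is a minimal, generic illustration, not the authors' actual architecture: the feature dimensions, the linear classifiers, and the equal 0.5 weighting in late fusion are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N_CLASSES = 4  # illustrative emotion classes, e.g. neutral/happy/sad/angry

def softmax(z):
    """Turn raw scores into class probabilities."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy utterance-level features for each modality (dimensions are assumed).
acoustic_feat = rng.standard_normal(128)    # e.g. pooled acoustic embedding
linguistic_feat = rng.standard_normal(768)  # e.g. pooled text embedding

# Toy stand-ins for trained classifiers: one linear layer per setting.
W_early = rng.standard_normal((N_CLASSES, 128 + 768))
W_ac = rng.standard_normal((N_CLASSES, 128))
W_lx = rng.standard_normal((N_CLASSES, 768))

# Early fusion: concatenate the modality features, then classify once.
early_probs = softmax(W_early @ np.concatenate([acoustic_feat, linguistic_feat]))

# Late fusion: classify each modality separately, then combine the
# predicted probabilities (equal weights here, purely as an assumption).
late_probs = 0.5 * softmax(W_ac @ acoustic_feat) + 0.5 * softmax(W_lx @ linguistic_feat)

print("early-fusion prediction:", int(early_probs.argmax()))
print("late-fusion prediction:", int(late_probs.argmax()))
```

The design difference is where the modalities meet: early fusion lets one classifier learn cross-modal interactions in feature space, while late fusion keeps the uni-modal models independent and only merges their output distributions.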
Keyword(in English) Speech Emotion Recognition / Text Emotion Recognition / Multi-Modal / early-fusion / late-fusion
Paper # NLC2020-14,SP2020-17
Date of Issue 2020-11-25 (NLC, SP)

Conference Information
Committee NLC / IPSJ-NL / SP / IPSJ-SLP
Conference Date 2020/12/2 (2 days)
Place (in English) Online
Topics (in English)
Chair Kazutaka Shimada(Kyushu Inst. of Tech.) / Satoshi Sekine(RIKEN) / Hisashi Kawai(NICT) / Norihide Kitaoka(Toyohashi Univ. of Tech.)
Vice Chair Mitsuo Yoshida(Toyohashi Univ. of Tech.) / Takeshi Kobayakawa(NHK)
Secretary Mitsuo Yoshida(Univ. of Tokyo) / Takeshi Kobayakawa(Hiroshima Univ. of Economics) / (Denso IT Lab.) / (Otaru Univ. of Commerce) / (Ibaraki Univ.)
Assistant Kanjin Takahashi(Sansan) / Ko Mitsuda(NTT) / / Yusuke Ijima(NTT)

Paper Information
Registration To Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language JPN
Title (in English) Multi-Modal Emotion Recognition by Integrating of Acoustic and Linguistic Features
Sub Title (in English)
Keyword(1) Speech Emotion Recognition
Keyword(2) Text Emotion Recognition
Keyword(3) Multi-Modal
Keyword(4) early-fusion
Keyword(5) late-fusion
1st Author's Name Ryotaro Nagase
1st Author's Affiliation Ritsumeikan University(Ritsumeikan Univ.)
2nd Author's Name Takahiro Fukumori
2nd Author's Affiliation Ritsumeikan University(Ritsumeikan Univ.)
3rd Author's Name Yoichi Yamashita
3rd Author's Affiliation Ritsumeikan University(Ritsumeikan Univ.)
Date 2020-12-02
Paper # NLC2020-14,SP2020-17
Volume (vol) vol.120
Number (no) NLC-270,SP-271
Page pp.7-12(NLC), pp.7-12(SP)
#Pages 6
Date of Issue 2020-11-25 (NLC, SP)