(English) |
Since research in speech emotion recognition (SER) is conducted mostly on English data, applying the resulting models to Japanese SER is difficult. Therefore, in this paper, we create a new Japanese data set (XperDES) labelled with eight emotions: “neutral, gentle, happy, sad, angry, horrible, disgust, and surprise”. We show that a convolutional neural network trained on XperDES is superior to a model trained on an existing English SER data set (RAVDESS). Moreover, we find a further improvement in accuracy (+4.2%) when leveraging both RAVDESS and XperDES. |