[Short Paper] Comparison of End-to-End Models for Joint Speaker and Speech Recognition

Kak Soky; Sheng Li; Masato Mimura; Chenhui Chu; Tatsuya Kawahara

Presentation	2021-03-03 [Short Paper] Comparison of End-to-End Models for Joint Speaker and Speech Recognition Kak Soky, Sheng Li, Masato Mimura, Chenhui Chu, Tatsuya Kawahara
Abstract(in English)	In this paper, we investigate how effectively speaker information improves speaker-imbalanced automatic speech recognition (ASR). We identify the major speakers, group the remaining speakers who have only a small amount of speech, and systematically compare three methods that use speaker information for ASR: speaker attribute augmentation (SAug), multi-task learning (MTL), and adversarial learning (AL). We conduct experiments on a large spontaneous speech corpus from the Extraordinary Chambers in the Courts of Cambodia (ECCC) and an open Khmer text-to-speech corpus. We find that using speaker clustering information improves ASR performance, including for new speakers. Moreover, AL achieves better performance and greater robustness in the speaker-independent setting than the other methods. It reduces the errors of the baseline model by 4.32%, 5.46%, and 16.10% on the closed test, open test, and out-of-domain test, respectively.
Keyword(in English)	End-to-End / Speech Recognition / Speaker Recognition / Khmer language / Low-resource / Speech attribute / Multi-task / Adversarial learning
Paper #	EA2020-78,SIP2020-109,SP2020-43
Date of Issue	2021-02-24 (EA, SIP, SP)

Conference Information
Committee	EA / US / SP / SIP / IPSJ-SLP
Conference Date	2021/3/3 (2 days)
Place (in English)	Online
Topics (in English)	Speech, Engineering/Electro Acoustics, Signal Processing, Ultrasonics, and Related Topics
Chair	Kenichi Furuya(Oita Univ.) / Hikaru Miura(Nihon Univ.) / Hisashi Kawai(NICT) / Kazunori Hayashi(Kyoto Univ.) / Norihide Kitaoka(Toyohashi Univ. of Tech.)
Vice Chair	Yoshinobu Kajikawa(Kansai Univ.) / Kentaro Matsui(NHK) / Jun Kondo(Shizuoka Univ.) / Yoshikazu Koike(Shibaura Inst. of Tech.) / / Yukihiro Bandou(NTT) / Toshihisa Tanaka(Tokyo Univ. Agri.&Tech.)
Secretary	Yoshinobu Kajikawa(Univ. of Tokyo) / Kentaro Matsui(NTT) / Jun Kondo(Doshisha Univ.) / Yoshikazu Koike(Tohoku Univ.) / (Univ. of Tokyo) / Yukihiro Bandou(Waseda Univ.) / Toshihisa Tanaka(Hosei Univ.) / (Waseda Univ.)
Assistant	Yukou Wakabayashi(Tokyo Metropolitan Univ.) / Tatsuya Komatsu(LINE) / Shinnosuke Hirata(Tokyo Inst. of Tech.) / Yusuke Ijima(NTT) / Yuichi Tanaka(Tokyo Univ. Agri.&Tech.)

Paper Information
Registration To	Technical Committee on Engineering Acoustics / Technical Committee on Ultrasonics / Technical Committee on Speech / Technical Committee on Signal Processing / Special Interest Group on Spoken Language Processing
Language	ENG
Title (in English)	[Short Paper] Comparison of End-to-End Models for Joint Speaker and Speech Recognition
Sub Title (in English)
Keyword(1)	End-to-End
Keyword(2)	Speech Recognition
Keyword(3)	Speaker Recognition
Keyword(4)	Khmer language
Keyword(5)	Low-resource
Keyword(6)	Speech attribute
Keyword(7)	Multi-task
Keyword(8)	Adversarial learning
1st Author's Name	Kak Soky
1st Author's Affiliation	Kyoto University(Kyoto Univ.)
2nd Author's Name	Sheng Li
2nd Author's Affiliation	National Institute of Information and Communications Technology(NICT)
3rd Author's Name	Masato Mimura
3rd Author's Affiliation	Kyoto University(Kyoto Univ.)
4th Author's Name	Chenhui Chu
4th Author's Affiliation	Kyoto University(Kyoto Univ.)
5th Author's Name	Tatsuya Kawahara
5th Author's Affiliation	Kyoto University(Kyoto Univ.)
Date	2021-03-03
Paper #	EA2020-78,SIP2020-109,SP2020-43
Volume (vol)	vol.120
Number (no)	EA-397,SIP-398,SP-399
Page	pp.109-113(EA), pp.109-113(SIP), pp.109-113(SP)
#Pages	5
Date of Issue	2021-02-24 (EA, SIP, SP)