Presentation 2021-03-03
[Short Paper] Comparison of End-to-End Models for Joint Speaker and Speech Recognition
Kak Soky, Sheng Li, Masato Mimura, Chenhui Chu, Tatsuya Kawahara
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In this paper, we investigate how speaker information affects the performance of automatic speech recognition (ASR) on speaker-imbalanced data. We identify the major speakers, combine the other speakers who have only a small amount of speech, and systematically compare three methods that use speaker information for ASR: speaker attribute augmentation (SAug), multi-task learning (MTL), and adversarial learning (AL). We conduct experiments on a large spontaneous speech corpus from the Extraordinary Chambers in the Courts of Cambodia (ECCC) and an open Khmer text-to-speech corpus. We find that using speaker clustering information improves ASR performance, including on new speakers. Moreover, AL achieves better performance and greater robustness in the speaker-independent setting than the other methods. It reduces the errors of the baseline model by 4.32%, 5.46%, and 16.10% on the closed test, open test, and out-of-domain test, respectively.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) End-to-End / Speech Recognition / Speaker Recognition / Khmer language / Low-resource / Speech attribute / Multi-task / Adversarial learning
Paper # EA2020-78,SIP2020-109,SP2020-43
Date of Issue 2021-02-24 (EA, SIP, SP)

Conference Information
Committee EA / US / SP / SIP / IPSJ-SLP
Conference Date 2021/3/3 (2 days)
Place (in Japanese) (See Japanese page)
Place (in English) Online
Topics (in Japanese) (See Japanese page)
Topics (in English) Speech, Engineering/Electro Acoustics, Signal Processing, Ultrasonics, and Related Topics
Chair Kenichi Furuya(Oita Univ.) / Hikaru Miura(Nihon Univ.) / Hisashi Kawai(NICT) / Kazunori Hayashi(Kyoto Univ.) / Norihide Kitaoka(Toyohashi Univ. of Tech.)
Vice Chair Yoshinobu Kajikawa(Kansai Univ.) / Kentaro Matsui(NHK) / Jun Kondo(Shizuoka Univ.) / Yoshikazu Koike(Shibaura Inst. of Tech.) / / Yukihiro Bandou(NTT) / Toshihisa Tanaka(Tokyo Univ. Agri.&Tech.)
Secretary Yoshinobu Kajikawa(Univ. of Tokyo) / Kentaro Matsui(NTT) / Jun Kondo(Doshisha Univ.) / Yoshikazu Koike(Tohoku Univ.) / (Univ. of Tokyo) / Yukihiro Bandou(Waseda Univ.) / Toshihisa Tanaka(Hosei Univ.) / (Waseda Univ.)
Assistant Yukou Wakabayashi(Tokyo Metropolitan Univ.) / Tatsuya Komatsu(LINE) / Shinnosuke Hirata(Tokyo Inst. of Tech.) / Yusuke Ijima(NTT) / Yuichi Tanaka(Tokyo Univ. Agri.&Tech.)

Paper Information
Registration To Technical Committee on Engineering Acoustics / Technical Committee on Ultrasonics / Technical Committee on Speech / Technical Committee on Signal Processing / Special Interest Group on Spoken Language Processing
Language ENG
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) [Short Paper] Comparison of End-to-End Models for Joint Speaker and Speech Recognition
Sub Title (in English)
Keyword(1) End-to-End
Keyword(2) Speech Recognition
Keyword(3) Speaker Recognition
Keyword(4) Khmer language
Keyword(5) Low-resource
Keyword(6) Speech attribute
Keyword(7) Multi-task
Keyword(8) Adversarial learning
1st Author's Name Kak Soky
1st Author's Affiliation Kyoto University(Kyoto Univ.)
2nd Author's Name Sheng Li
2nd Author's Affiliation National Institute of Information and Communications Technology(NICT)
3rd Author's Name Masato Mimura
3rd Author's Affiliation Kyoto University(Kyoto Univ.)
4th Author's Name Chenhui Chu
4th Author's Affiliation Kyoto University(Kyoto Univ.)
5th Author's Name Tatsuya Kawahara
5th Author's Affiliation Kyoto University(Kyoto Univ.)
Date 2021-03-03
Paper # EA2020-78,SIP2020-109,SP2020-43
Volume (vol) vol.120
Number (no) EA-397,SIP-398,SP-399
Page pp.109-113(EA), pp.109-113(SIP), pp.109-113(SP)
#Pages 5
Date of Issue 2021-02-24 (EA, SIP, SP)