Presentation | 2021-03-03 [Short Paper] Comparison of End-to-End Models for Joint Speaker and Speech Recognition Kak Soky, Sheng Li, Masato Mimura, Chenhui Chu, Tatsuya Kawahara |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | In this paper, we investigate how speaker information affects the performance of automatic speech recognition (ASR) on speaker-imbalanced data. We identify the major speakers, group the other speakers, who each have only a small amount of speech, and systematically compare three methods that use speaker information for ASR: speaker attribute augmentation (SAug), multi-task learning (MTL), and adversarial learning (AL). We conduct experiments on a large spontaneous speech corpus of the Extraordinary Chambers in the Courts of Cambodia (ECCC) and an open Khmer text-to-speech corpus. We find that using speaker clustering information improves ASR performance, including for new speakers. Moreover, AL achieves better performance and is more robust in the speaker-independent setting than the other methods. It reduces the errors of the baseline model by 4.32%, 5.46%, and 16.10% on the closed test, open test, and out-of-domain test, respectively. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | End-to-End / Speech Recognition / Speaker Recognition / Khmer language / Low-resource / Speech attribute / Multi-task / Adversarial learning |
Paper # | EA2020-78,SIP2020-109,SP2020-43 |
Date of Issue | 2021-02-24 (EA, SIP, SP) |
Conference Information | |
Committee | EA / US / SP / SIP / IPSJ-SLP |
---|---|
Conference Date | 2021/3/3(2days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Online |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | Speech, Engineering/Electro Acoustics, Signal Processing, Ultrasonics, and Related Topics |
Chair | Kenichi Furuya(Oita Univ.) / Hikaru Miura(Nihon Univ.) / Hisashi Kawai(NICT) / Kazunori Hayashi(Kyoto Univ.) / Norihide Kitaoka(Toyohashi Univ. of Tech.) |
Vice Chair | Yoshinobu Kajikawa(Kansai Univ.) / Kentaro Matsui(NHK) / Jun Kondo(Shizuoka Univ.) / Yoshikazu Koike(Shibaura Inst. of Tech.) / / Yukihiro Bandou(NTT) / Toshihisa Tanaka(Tokyo Univ. Agri.&Tech.) |
Secretary | Yoshinobu Kajikawa(Univ. of Tokyo) / Kentaro Matsui(NTT) / Jun Kondo(Doshisha Univ.) / Yoshikazu Koike(Tohoku Univ.) / (Univ. of Tokyo) / Yukihiro Bandou(Waseda Univ.) / Toshihisa Tanaka(Hosei Univ.) / (Waseda Univ.) |
Assistant | Yukou Wakabayashi(Tokyo Metropolitan Univ.) / Tatsuya Komatsu(LINE) / Shinnosuke Hirata(Tokyo Inst. of Tech.) / Yusuke Ijima(NTT) / Yuichi Tanaka(Tokyo Univ. Agri.&Tech.) |
Paper Information | |
Registration To | Technical Committee on Engineering Acoustics / Technical Committee on Ultrasonics / Technical Committee on Speech / Technical Committee on Signal Processing / Special Interest Group on Spoken Language Processing |
---|---|
Language | ENG |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | [Short Paper] Comparison of End-to-End Models for Joint Speaker and Speech Recognition |
Sub Title (in English) | |
Keyword(1) | End-to-End |
Keyword(2) | Speech Recognition |
Keyword(3) | Speaker Recognition |
Keyword(4) | Khmer language |
Keyword(5) | Low-resource |
Keyword(6) | Speech attribute |
Keyword(7) | Multi-task |
Keyword(8) | Adversarial learning |
1st Author's Name | Kak Soky |
1st Author's Affiliation | Kyoto University(Kyoto Univ.) |
2nd Author's Name | Sheng Li |
2nd Author's Affiliation | National Institute of Information and Communications Technology(NICT) |
3rd Author's Name | Masato Mimura |
3rd Author's Affiliation | Kyoto University(Kyoto Univ.) |
4th Author's Name | Chenhui Chu |
4th Author's Affiliation | Kyoto University(Kyoto Univ.) |
5th Author's Name | Tatsuya Kawahara |
5th Author's Affiliation | Kyoto University(Kyoto Univ.) |
Date | 2021-03-03 |
Paper # | EA2020-78,SIP2020-109,SP2020-43 |
Volume (vol) | vol.120 |
Number (no) | EA-397,SIP-398,SP-399 |
Page | pp.109-113(EA), pp.109-113(SIP), pp.109-113(SP) |
#Pages | 5 |
Date of Issue | 2021-02-24 (EA, SIP, SP) |