Presentation | 2018-06-28 Language model utilizing image features for automatic speech recognition Aiko Hagiwara, Hitoshi Ito, Manon Ichiki, Takeshi Mishima, Shoei Sato |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | NHK is developing a speech recognition system for producing closed captions for live broadcasts and for transcribing interview video footage. In many cases, images as well as audio can be obtained from video footage. The images are expected to provide information, such as domain identification, that improves language model accuracy. We therefore propose two methods for incorporating image features into language models. The first extracts the hidden layer of an image recognition model, and the second incorporates automatically generated image description captions. Compared with the baseline recurrent neural network language model, perplexity decreased with the first method. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Speech recognition / Language model / Image recognition / Image captioning |
Paper # | PRMU2018-22,SP2018-2 |
Date of Issue | 2018-06-21 (PRMU, SP) |
Conference Information | |
Committee | PRMU / SP |
---|---|
Conference Date | 2018/6/28(2days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | Shinichi Sato(NII) / Yoichi Yamashita(Ritsumeikan Univ.) |
Vice Chair | Yoshihisa Ijiri(Omron) / Toru Tamaki(Hiroshima Univ.) / Akinobu Ri(Nagoya Inst. of Tech.) |
Secretary | Yoshihisa Ijiri(NEC) / Toru Tamaki(Osaka Univ.) / Akinobu Ri(Kyoto Univ.) |
Assistant | Go Irie(NTT) / Yoshitaka Ushiku(Univ. of Tokyo) / Tomoki Koriyama(Tokyo Inst. of Tech.) / Satoshi Kobashikawa(NTT) |
Paper Information | |
Registration To | Technical Committee on Pattern Recognition and Media Understanding / Technical Committee on Speech |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Language model utilizing image features for automatic speech recognition |
Sub Title (in English) | |
Keyword(1) | Speech recognition |
Keyword(2) | Language model |
Keyword(3) | Image recognition |
Keyword(4) | Image captioning |
Keyword(5) | |
1st Author's Name | Aiko Hagiwara |
1st Author's Affiliation | Japan Broadcasting Corporation(NHK) |
2nd Author's Name | Hitoshi Ito |
2nd Author's Affiliation | Japan Broadcasting Corporation(NHK) |
3rd Author's Name | Manon Ichiki |
3rd Author's Affiliation | Japan Broadcasting Corporation(NHK) |
4th Author's Name | Takeshi Mishima |
4th Author's Affiliation | Japan Broadcasting Corporation(NHK) |
5th Author's Name | Shoei Sato |
5th Author's Affiliation | Japan Broadcasting Corporation(NHK) |
Date | 2018-06-28 |
Paper # | PRMU2018-22,SP2018-2 |
Volume (vol) | vol.118 |
Number (no) | PRMU-111,SP-112 |
Page | pp.3-6(PRMU), pp.3-6(SP) |
#Pages | 4 |
Date of Issue | 2018-06-21 (PRMU, SP) |