Presentation 2018-06-28
Language model utilizing image features for automatic speech recognition
Aiko Hagiwara, Hitoshi Ito, Manon Ichiki, Takeshi Mishima, Shoei Sato
Abstract(in Japanese) (See Japanese page)
Abstract(in English) NHK is developing a speech recognition system for producing closed captions for live broadcasts and for transcribing interview video footage. In many cases, images as well as audio can be obtained from video footage. The images are expected to provide information, such as domain identification, that improves language model accuracy. We therefore propose two methods for incorporating image features into language models. The first extracts features from a hidden layer of an image recognition model, and the second incorporates automatically generated image description captions. Compared with the baseline recurrent neural network language model, the first method reduced perplexity.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Speech recognition / Language model / Image recognition / Image captioning
Paper # PRMU2018-22,SP2018-2
Date of Issue 2018-06-21 (PRMU, SP)
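
Note on the evaluation metric: the abstract compares language models by perplexity, which is the exponential of the average negative log-probability that a model assigns to each token (lower is better). The following is a minimal sketch of that calculation. The function name `perplexity` and the probability values are hypothetical and for illustration only. They are not taken from the paper and do not reproduce its models or results.

```python
import math


def perplexity(token_probs):
    """Perplexity of a token sequence, given the model probability of each token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token, then exponentiate.
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


# Hypothetical per-token probabilities (illustration only, not results from the paper).
baseline_probs = [0.10, 0.05, 0.20, 0.10]
# Assumed effect of extra context: one token becomes more predictable.
with_image_probs = [0.20, 0.05, 0.20, 0.10]

print(perplexity(baseline_probs))
print(perplexity(with_image_probs))
```

A model that assigns higher probability to the words actually spoken yields a lower value, which is the sense in which the abstract reports that perplexity decreased for the first method.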

Conference Information
Committee PRMU / SP
Conference Date 2018/6/28(2days)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair Shinichi Sato(NII) / Yoichi Yamashita(Ritsumeikan Univ.)
Vice Chair Yoshihisa Ijiri(Omron) / Toru Tamaki(Hiroshima Univ.) / Akinobu Ri(Nagoya Inst. of Tech.)
Secretary Yoshihisa Ijiri(NEC) / Toru Tamaki(Osaka Univ.) / Akinobu Ri(Kyoto Univ.)
Assistant Go Irie(NTT) / Yoshitaka Ushiku(Univ. of Tokyo) / Tomoki Koriyama(Tokyo Inst. of Tech.) / Satoshi Kobashikawa(NTT)

Paper Information
Registration To Technical Committee on Pattern Recognition and Media Understanding / Technical Committee on Speech
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Language model utilizing image features for automatic speech recognition
Sub Title (in English)
Keyword(1) Speech recognition
Keyword(2) Language model
Keyword(3) Image recognition
Keyword(4) Image captioning
Keyword(5)
1st Author's Name Aiko Hagiwara
1st Author's Affiliation Japan Broadcasting Corporation(NHK)
2nd Author's Name Hitoshi Ito
2nd Author's Affiliation Japan Broadcasting Corporation(NHK)
3rd Author's Name Manon Ichiki
3rd Author's Affiliation Japan Broadcasting Corporation(NHK)
4th Author's Name Takeshi Mishima
4th Author's Affiliation Japan Broadcasting Corporation(NHK)
5th Author's Name Shoei Sato
5th Author's Affiliation Japan Broadcasting Corporation(NHK)
Date 2018-06-28
Paper # PRMU2018-22,SP2018-2
Volume (vol) vol.118
Number (no) PRMU-111,SP-112
Page pp.3-6(PRMU), pp.3-6(SP)
#Pages 4
Date of Issue 2018-06-21 (PRMU, SP)