Language model utilizing image features for automatic speech recognition

Presentation	2018-06-28 Language model utilizing image features for automatic speech recognition Aiko Hagiwara, Hitoshi Ito, Manon Ichiki, Takeshi Mishima, Shoei Sato
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	NHK is developing a speech recognition system for producing closed captions for live broadcasts and for transcribing interview video footage. In many cases, images as well as audio can be obtained from video footage. The images are expected to provide information, such as domain identification, that helps improve language model accuracy. We therefore propose two methods for incorporating image features into language models. The first uses features extracted from a hidden layer of an image recognition model, and the second uses automatically generated image captions. Compared with the baseline recurrent neural network language model, the first method reduced perplexity.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	Speech recognition / Language model / Image recognition / Image captioning
Paper #	PRMU2018-22,SP2018-2
Date of Issue	2018-06-21 (PRMU, SP)

Conference Information
Committee	PRMU / SP
Conference Date	2018/6/28 (2 days)
Place (in Japanese)	(See Japanese page)
Place (in English)
Topics (in Japanese)	(See Japanese page)
Topics (in English)
Chair	Shinichi Sato(NII) / Yoichi Yamashita(Ritsumeikan Univ.)
Vice Chair	Yoshihisa Ijiri(Omron) / Toru Tamaki(Hiroshima Univ.) / Akinobu Ri(Nagoya Inst. of Tech.)
Secretary	Yoshihisa Ijiri(NEC) / Toru Tamaki(Osaka Univ.) / Akinobu Ri(Kyoto Univ.)
Assistant	Go Irie(NTT) / Yoshitaka Ushiku(Univ. of Tokyo) / Tomoki Koriyama(Tokyo Inst. of Tech.) / Satoshi Kobashikawa(NTT)

Paper Information
Registration To	Technical Committee on Pattern Recognition and Media Understanding / Technical Committee on Speech
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Language model utilizing image features for automatic speech recognition
Sub Title (in English)
Keyword(1)	Speech recognition
Keyword(2)	Language model
Keyword(3)	Image recognition
Keyword(4)	Image captioning
Keyword(5)
1st Author's Name	Aiko Hagiwara
1st Author's Affiliation	Japan Broadcasting Corporation(NHK)
2nd Author's Name	Hitoshi Ito
2nd Author's Affiliation	Japan Broadcasting Corporation(NHK)
3rd Author's Name	Manon Ichiki
3rd Author's Affiliation	Japan Broadcasting Corporation(NHK)
4th Author's Name	Takeshi Mishima
4th Author's Affiliation	Japan Broadcasting Corporation(NHK)
5th Author's Name	Shoei Sato
5th Author's Affiliation	Japan Broadcasting Corporation(NHK)
Date	2018-06-28
Paper #	PRMU2018-22,SP2018-2
Volume (vol)	vol.118
Number (no)	PRMU-111,SP-112
Page	pp.3-6(PRMU), pp.3-6(SP)
#Pages	4
Date of Issue	2018-06-21 (PRMU, SP)