Presentation 2019-05-31
Cross-modal Search using Visually Grounded Multilingual Speech Signal
Yasunori Ohishi, Akisato Kimura, Takahito Kawanishi, Kunio Kashino, David Harwath, James Glass
Abstract(in Japanese) (See Japanese page)
Abstract(in English) We evaluate, on crossmodal search (image and speech retrieval), a deep neural network model that learns to associate images with audio captions describing their content. We show that training a trilingual model simultaneously on English, Hindi, and newly recorded Japanese audio caption data improves performance over the monolingual models. Further, we demonstrate that the trilingual model implicitly learns meaningful word-level translations grounded in images.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Vision and spoken language / Shared latent space / Crossmodal search / Convolutional neural network
Paper # PRMU2019-11
Date of Issue 2019-05-23 (PRMU)

Conference Information
Committee PRMU / IPSJ-CVIM
Conference Date 2019-05-30 (2 days)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair Shinichi Sato(NII)
Vice Chair Yoshihisa Ijiri(Omron) / Toru Tamaki(Hiroshima Univ.)
Secretary Yoshihisa Ijiri(NEC) / Toru Tamaki(Osaka Univ.)
Assistant Go Irie(NTT) / Yoshitaka Ushiku(Univ. of Tokyo)

Paper Information
Registration To Technical Committee on Pattern Recognition and Media Understanding / Special Interest Group on Computer Vision and Image Media
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Cross-modal Search using Visually Grounded Multilingual Speech Signal
Sub Title (in English)
Keyword(1) Vision and spoken language
Keyword(2) Shared latent space
Keyword(3) Crossmodal search
Keyword(4) Convolutional neural network
1st Author's Name Yasunori Ohishi
1st Author's Affiliation NIPPON TELEGRAPH AND TELEPHONE CORPORATION(NTT)
2nd Author's Name Akisato Kimura
2nd Author's Affiliation NIPPON TELEGRAPH AND TELEPHONE CORPORATION(NTT)
3rd Author's Name Takahito Kawanishi
3rd Author's Affiliation NIPPON TELEGRAPH AND TELEPHONE CORPORATION(NTT)
4th Author's Name Kunio Kashino
4th Author's Affiliation NIPPON TELEGRAPH AND TELEPHONE CORPORATION(NTT)
5th Author's Name David Harwath
5th Author's Affiliation Massachusetts Institute of Technology(MIT)
6th Author's Name James Glass
6th Author's Affiliation Massachusetts Institute of Technology(MIT)
Date 2019-05-31
Volume (vol) vol.119
Number (no) PRMU-64
Page pp.283-288 (PRMU)
#Pages 6