Presentation | 2019-05-31 Cross-modal Search using Visually Grounded Multilingual Speech Signal Yasunori Ohishi, Akisato Kimura, Takahito Kawanishi, Kunio Kashino, David Harwath, James Glass |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | We evaluate a deep neural network model that learns to associate images with audio captions describing their content, using crossmodal search (image and speech retrieval) as the task. We show that training a trilingual model simultaneously on English, Hindi, and newly recorded Japanese audio caption data improves performance over monolingual models. Further, we demonstrate that the trilingual model implicitly learns meaningful word-level translations based on images. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Vision and spoken language / Shared latent space / Crossmodal search / Convolutional neural network |
Paper # | PRMU2019-11 |
Date of Issue | 2019-05-23 (PRMU) |
Conference Information | |
Committee | PRMU / IPSJ-CVIM |
---|---|
Conference Date | 2019/5/30 (2 days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | Shinichi Sato(NII) |
Vice Chair | Yoshihisa Ijiri(Omron) / Toru Tamaki(Hiroshima Univ.) |
Secretary | Yoshihisa Ijiri(NEC) / Toru Tamaki(Osaka Univ.) |
Assistant | Go Irie(NTT) / Yoshitaka Ushiku(Univ. of Tokyo) |
Paper Information | |
Registration To | Technical Committee on Pattern Recognition and Media Understanding / Special Interest Group on Computer Vision and Image Media |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Cross-modal Search using Visually Grounded Multilingual Speech Signal |
Sub Title (in English) | |
Keyword(1) | Vision and spoken language |
Keyword(2) | Shared latent space |
Keyword(3) | Crossmodal search |
Keyword(4) | Convolutional neural network |
1st Author's Name | Yasunori Ohishi |
1st Author's Affiliation | NIPPON TELEGRAPH AND TELEPHONE CORPORATION(NTT) |
2nd Author's Name | Akisato Kimura |
2nd Author's Affiliation | NIPPON TELEGRAPH AND TELEPHONE CORPORATION(NTT) |
3rd Author's Name | Takahito Kawanishi |
3rd Author's Affiliation | NIPPON TELEGRAPH AND TELEPHONE CORPORATION(NTT) |
4th Author's Name | Kunio Kashino |
4th Author's Affiliation | NIPPON TELEGRAPH AND TELEPHONE CORPORATION(NTT) |
5th Author's Name | David Harwath |
5th Author's Affiliation | Massachusetts Institute of Technology(MIT) |
6th Author's Name | James Glass |
6th Author's Affiliation | Massachusetts Institute of Technology(MIT) |
Date | 2019-05-31 |
Paper # | PRMU2019-11 |
Volume (vol) | vol.119 |
Number (no) | PRMU-64 |
Page | pp.283-288 (PRMU) |
#Pages | 6 |
Date of Issue | 2019-05-23 (PRMU) |