Presentation | 2018-12-11: Automatic Extraction of Bilingual Sentences by Similarity based on Earth Mover's Distance using Word Embeddings and Difference of Sentence Length. Ryo Tanoue, Hiroshi Echizen'ya, Kenji Araki |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | In this paper, we propose a new method for automatically extracting bilingual sentences from a comparable corpus without high-quality bilingual knowledge such as a bilingual dictionary. In the proposed method, bilingual sentences are extracted using a translation matrix and the similarity between sentences in the two languages, computed from word embeddings. The Earth Mover's Distance (EMD) is used to calculate this similarity. In addition, a weight based on the difference in length between the two sentences is applied to the EMD-based similarity. Evaluation experiments on a comparable corpus of news articles show that the average F-measure of the proposed method was 0.49; the corresponding values for the proposed method without the sentence-length weight and for the method based only on EMD were 0.42, and that of the baseline method using sentence length alone was 0.13. These results confirm the effectiveness of the proposed method with the weight based on sentence length. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Bilingual Sentences / Automatic Extraction / word2vec / Earth Mover’s Distance / Sentence Length |
Paper # | NLC2018-30 |
Date of Issue | 2018-12-04 (NLC) |

Conference Information | |
---|---|
Committee | NLC / IPSJ-NL / SP / IPSJ-SLP |
Conference Date | 2018-12-10 (3 days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Waseda Univ. Nishiwaseda Campus |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | The 5th Natural Language Processing Symposium & The 20th Spoken Language Symposium |
Chair | Takeshi Sakaki(Hottolink) / / Yoichi Yamashita(Ritsumeikan Univ.) |
Vice Chair | Mitsuo Yoshida(Toyohashi Univ. of Tech.) / Kazutaka Shimada(Kyushu Inst. of Tech.) / / Akinobu Ri(Nagoya Inst. of Tech.) |
Secretary | Mitsuo Yoshida(Ryukoku Univ.) / Kazutaka Shimada(NTT) / / Akinobu Ri(Kyoto Univ.) / (Meijo Univ.) |
Assistant | Takeshi Kobayakawa(NHK) / Hiroki Sakaji(Univ. of Tokyo) / / Tomoki Koriyama(Tokyo Inst. of Tech.) / Satoshi Kobashikawa(NTT) |

Paper Information | |
---|---|
Registration To | Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing |
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Automatic Extraction of Bilingual Sentences by Similarity based on Earth Mover's Distance using Word Embeddings and Difference of Sentence Length |
Sub Title (in English) | |
Keyword(1) | Bilingual Sentences |
Keyword(2) | Automatic Extraction |
Keyword(3) | word2vec |
Keyword(4) | Earth Mover’s Distance |
Keyword(5) | Sentence Length |
1st Author's Name | Ryo Tanoue |
1st Author's Affiliation | Hokkai-Gakuen University(Hokkai-Gakuen Univ.) |
2nd Author's Name | Hiroshi Echizen'ya |
2nd Author's Affiliation | Hokkai-Gakuen University(Hokkai-Gakuen Univ.) |
3rd Author's Name | Kenji Araki |
3rd Author's Affiliation | Hokkaido University(Hokkaido Univ.) |
Date | 2018-12-11 |
Paper # | NLC2018-30 |
Volume (vol) | vol.118 |
Number (no) | NLC-355 |
Page | pp.3-8 (NLC) |
#Pages | 6 |
Date of Issue | 2018-12-04 (NLC) |