Presentation 2018-12-11
Automatic Extraction of Bilingual Sentences by Similarity based on Earth Mover's Distance using Word Embeddings and Difference of Sentence Length
Ryo Tanoue, Hiroshi Echizen'ya, Kenji Araki
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In this paper, we propose a new method for automatically extracting bilingual sentences from a comparable corpus without high-quality bilingual knowledge such as a bilingual dictionary. The proposed method extracts bilingual sentences using a translation matrix and the similarity between sentences in the two languages, computed from word embeddings. The Earth Mover's Distance (EMD) is used to calculate this similarity. In addition, a weight based on the difference in length between the two sentences is applied to the EMD-based similarity. In evaluation experiments on a comparable corpus of news articles, the average F-measure of the proposed method was 0.49; the values for the proposed method without the sentence-length weight and for the method based only on EMD were 0.42; and that of the baseline method using sentence length was 0.13. These results confirm the effectiveness of the proposed method with the sentence-length weight.
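The abstract does not give the formulas, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes uniform word weights, Euclidean distance between embeddings, and a linear translation matrix applied to source-language vectors. To stay dependency-free it uses the relaxed lower bound on EMD (each word sends all of its mass to the nearest word on the other side) rather than the exact transport solution. The function names, the `length_weight` ratio, and the `1 / (1 + distance)` conversion to a similarity are all hypothetical choices made for illustration.

```python
import math

def project(matrix, vec):
    """Map a source-language embedding into the target space (matrix @ vec)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def euclidean(u, v):
    """Euclidean distance between two embeddings of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def relaxed_emd(src_vecs, tgt_vecs):
    """Relaxed lower bound on EMD with uniform word weights.

    This is NOT the exact EMD: each word moves all of its mass to its
    nearest neighbour on the other side, and the larger of the two
    one-directional costs is returned.
    """
    def one_way(xs, ys):
        return sum(min(euclidean(x, y) for y in ys) for x in xs) / len(xs)
    return max(one_way(src_vecs, tgt_vecs), one_way(tgt_vecs, src_vecs))

def length_weight(n_src, n_tgt):
    """Hypothetical length weight: shorter length divided by longer length."""
    return min(n_src, n_tgt) / max(n_src, n_tgt)

def similarity(src_vecs, tgt_vecs, matrix=None):
    """Length-weighted similarity; higher means more likely a translation pair."""
    if matrix is not None:
        src_vecs = [project(matrix, v) for v in src_vecs]
    distance = relaxed_emd(src_vecs, tgt_vecs)
    # Hypothetical conversion from distance to similarity.
    return length_weight(len(src_vecs), len(tgt_vecs)) / (1.0 + distance)
```

In such a setup, candidate sentence pairs from the comparable corpus would be scored with `similarity` and pairs above some threshold kept; an exact EMD solver could replace `relaxed_emd` without changing the rest of the sketch.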
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Bilingual Sentences / Automatic Extraction / word2vec / Earth Mover’s Distance / Sentence Length
Paper # NLC2018-30
Date of Issue 2018-12-04 (NLC)

Conference Information
Committee NLC / IPSJ-NL / SP / IPSJ-SLP
Conference Date 2018/12/10(3days)
Place (in Japanese) (See Japanese page)
Place (in English) Waseda Univ. Nishiwaseda Campus
Topics (in Japanese) (See Japanese page)
Topics (in English) The 5th Natural Language Processing Symposium & The 20th Spoken Language Symposium
Chair Takeshi Sakaki(Hottolink) / / Yoichi Yamashita(Ritsumeikan Univ.)
Vice Chair Mitsuo Yoshida(Toyohashi Univ. of Tech.) / Kazutaka Shimada(Kyushu Inst. of Tech.) / / Akinobu Ri(Nagoya Inst. of Tech.)
Secretary Mitsuo Yoshida(Ryukoku Univ.) / Kazutaka Shimada(NTT) / / Akinobu Ri(Kyoto Univ.) / (Meijo Univ.)
Assistant Takeshi Kobayakawa(NHK) / Hiroki Sakaji(Univ. of Tokyo) / / Tomoki Koriyama(Tokyo Inst. of Tech.) / Satoshi Kobashikawa(NTT)

Paper Information
Registration To Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Automatic Extraction of Bilingual Sentences by Similarity based on Earth Mover's Distance using Word Embeddings and Difference of Sentence Length
Sub Title (in English)
Keyword(1) Bilingual Sentences
Keyword(2) Automatic Extraction
Keyword(3) word2vec
Keyword(4) Earth Mover’s Distance
Keyword(5) Sentence Length
1st Author's Name Ryo Tanoue
1st Author's Affiliation Hokkai-Gakuen University(Hokkai-Gakuen Univ.)
2nd Author's Name Hiroshi Echizen'ya
2nd Author's Affiliation Hokkai-Gakuen University(Hokkai-Gakuen Univ.)
3rd Author's Name Kenji Araki
3rd Author's Affiliation Hokkaido University(Hokkaido Univ.)
Date 2018-12-11
Paper # NLC2018-30
Volume (vol) vol.118
Number (no) NLC-355
Page pp.3-8(NLC)
#Pages 6
Date of Issue 2018-12-04 (NLC)