Presentation 2019-09-28
Estimating Distributed Expressions of Unknown Compound Word Using Distributed Expressions of Known Words
Ryota Takagi, Kazuhiro Kazama, Takeshi Sakaki
Abstract(in English) In recent years, distributed representations (also called distributed expressions), which represent the meaning of a word as a low-dimensional vector, have been widely used. There are various methods for obtaining the distributed representation of a word. For example, Mikolov et al. proposed word2vec, which trains a neural network on the words surrounding each word and outputs the learned weight vector of the middle (hidden) layer as that word's distributed representation. However, word2vec provides distributed representations only for the words that appeared in its training data, so an unknown compound word must be learned again even if it is a sequence of known words, which incurs a large additional cost. We propose a method that estimates the distributed representations of unknown compound words with relatively high accuracy, using already-learned distributed representations and a simple statistical indicator. Specifically, we focus on the modification relations between the nouns that constitute a Japanese compound word, and weight their distributed representation vectors by the compound noun frequency of noun 2-grams. We then compare the similarity and the MRR (mean reciprocal rank) of the proposed method with those of other methods that obtain the distributed representation of a sentence from the distributed representations of its words, such as simple averaging and the method proposed by Arora et al. The results show the effectiveness of the proposed method.
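The abstract compares ways of composing a compound-word vector from known word vectors and scores them with similarity and MRR. The sketch below illustrates those pieces under stated assumptions; it is not the authors' implementation. `simple_average` is the plain mean; `sif_average` uses the Arora et al. weight a / (a + p(w)) but omits their common-component (principal component) removal step; `compound_freq_average` is a HYPOTHETICAL weighting by how often each noun occurs inside noun 2-grams of compound nouns, since the abstract does not give the paper's exact formula. All function names, the toy vectors, and the frequency tables are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def weighted_average(vectors: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Weighted mean of word vectors (weights are normalized to sum to 1)."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors)) / total for d in range(dim)]


def simple_average(words: List[str], emb: Dict[str, Vector]) -> Vector:
    """Baseline: unweighted mean of the constituent word vectors."""
    return weighted_average([emb[w] for w in words], [1.0] * len(words))


def sif_average(words: List[str], emb: Dict[str, Vector],
                word_prob: Dict[str, float], a: float = 1e-3) -> Vector:
    """SIF-style weights a / (a + p(w)) after Arora et al.
    The common-component removal step of the original method is omitted."""
    weights = [a / (a + word_prob[w]) for w in words]
    return weighted_average([emb[w] for w in words], weights)


def compound_freq_average(words: List[str], emb: Dict[str, Vector],
                          compound_freq: Dict[str, int]) -> Vector:
    """HYPOTHETICAL weighting (not the paper's formula): nouns that occur more
    often inside noun 2-grams of compound nouns get a larger, log-damped weight."""
    weights = [1.0 + math.log1p(compound_freq.get(w, 0)) for w in words]
    return weighted_average([emb[w] for w in words], weights)


def mean_reciprocal_rank(queries: List[Tuple[Vector, str]],
                         candidates: Dict[str, Vector]) -> float:
    """MRR: for each (estimated vector, gold key), rank all candidates by
    cosine similarity and average 1 / rank of the gold key."""
    total = 0.0
    for vec, gold in queries:
        ranked = sorted(candidates, key=lambda k: cosine(vec, candidates[k]), reverse=True)
        total += 1.0 / (ranked.index(gold) + 1)
    return total / len(queries)
```

For example, with toy vectors `emb = {"natural": [1.0, 0.0], "language": [0.0, 1.0]}`, `simple_average(["natural", "language"], emb)` gives `[0.5, 0.5]`, while a larger compound frequency for `"language"` pulls the hypothetical estimate toward that noun's vector. Only the standard library is used so the sketch runs without word2vec installed; in practice the embeddings would come from a pretrained model.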
Keyword(in English) compound words / distributed representation / word2vec / compound noun frequency / modification relation
Paper # NLC2019-27
Date of Issue 2019-09-20 (NLC)

Conference Information
Committee NLC / IPSJ-DC
Conference Date 2019/9/27 (2 days)
Place (in English) Future Corporation
Topics (in English) The Thirteenth Text Analytics Symposium
Chair Takeshi Sakaki(Hottolink) / Ryoji Akimoto(Toppan Printing)
Vice Chair Mitsuo Yoshida(Toyohashi Univ. of Tech.) / Kazutaka Shimada(Kyushu Inst. of Tech.)
Secretary Mitsuo Yoshida(Ryukoku Univ.) / Kazutaka Shimada(NTT) / (Future Univ. Hakodate)
Assistant Takeshi Kobayakawa(NHK) / Hiroki Sakaji(Univ. of Tokyo)

Paper Information
Registration To Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Document Communication
Language JPN
Title (in English) Estimating Distributed Expressions of Unknown Compound Word Using Distributed Expressions of Known Words
Sub Title (in English)
Keyword(1) compound words
Keyword(2) distributed representation
Keyword(3) word2vec
Keyword(4) compound noun frequency
Keyword(5) modification relation
1st Author's Name Ryota Takagi
1st Author's Affiliation Wakayama University(Wakayama Univ)
2nd Author's Name Kazuhiro Kazama
2nd Author's Affiliation Wakayama University(Wakayama Univ)
3rd Author's Name Takeshi Sakaki
3rd Author's Affiliation Hotto Link Inc.(Hotto Link)
Date 2019-09-28
Volume (vol) vol.119
Number (no) NLC-212
Page pp.103-108 (NLC)
#Pages 6