Presentation | 2019-09-28 / Estimating Distributed Expressions of Unknown Compound Word Using Distributed Expressions of Known Words / Ryota Takagi, Kazuhiro Kazama, Takeshi Sakaki |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | In recent years, distributed representations (also called distributed expressions), which encode the meaning of a word as a low-dimensional vector, have come into wide use. There are various methods for obtaining the distributed representation of a word. For example, Mikolov et al. proposed word2vec, which uses a neural network to learn from the words surrounding a word and outputs the learned weight vector of the middle layer as that word's distributed representation. However, word2vec provides distributed representations only for the words used in training, so an unknown compound word must be learned again even if it is a sequence of known words, which incurs substantial additional cost. We propose a method that estimates the distributed representation of an unknown compound word with relatively high accuracy, using already-learned distributed representations and a simple statistical indicator. Specifically, we focus on the modification relation between the nouns that constitute a Japanese compound word and weight their distributed representation vectors by the compound noun frequency of noun 2-grams. We also compare the similarity and the mean reciprocal rank (MRR) of the proposed method with those of other methods that obtain the distributed representation of a sentence from the distributed representations of its words, such as the simple average method and the method proposed by Arora et al., and show the effectiveness of the proposed method. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | compound words / distributed representation / word2vec / compound noun frequency / modification relation |
Paper # | NLC2019-27 |
Date of Issue | 2019-09-20 (NLC) |
Conference Information | |
Committee | NLC / IPSJ-DC |
---|---|
Conference Date | 2019/9/27(2days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Future Corporation |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | The Thirteenth Text Analytics Symposium |
Chair | Takeshi Sakaki(Hottolink) / Ryoji Akimoto(Toppan Printing) |
Vice Chair | Mitsuo Yoshida(Toyohashi Univ. of Tech.) / Kazutaka Shimada(Kyushu Inst. of Tech.) |
Secretary | Mitsuo Yoshida(Ryukoku Univ.) / Kazutaka Shimada(NTT) / (Future Univ. Hakodate) |
Assistant | Takeshi Kobayakawa(NHK) / Hiroki Sakaji(Univ. of Tokyo) |
Paper Information | |
Registration To | Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Document Communication |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Estimating Distributed Expressions of Unknown Compound Word Using Distributed Expressions of Known Words |
Sub Title (in English) | |
Keyword(1) | compound words |
Keyword(2) | distributed representation |
Keyword(3) | word2vec |
Keyword(4) | compound noun frequency |
Keyword(5) | modification relation |
1st Author's Name | Ryota Takagi |
1st Author's Affiliation | Wakayama University(Wakayama Univ) |
2nd Author's Name | Kazuhiro Kazama |
2nd Author's Affiliation | Wakayama University(Wakayama Univ) |
3rd Author's Name | Takeshi Sakaki |
3rd Author's Affiliation | Hotto Link Inc.(Hotto Link) |
Date | 2019-09-28 |
Paper # | NLC2019-27 |
Volume (vol) | vol.119 |
Number (no) | NLC-212 |
Page | pp.103-108 (NLC) |
#Pages | 6 |
Date of Issue | 2019-09-20 (NLC) |
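
The abstract describes weighting the vectors of a compound's constituent nouns by a frequency statistic and comparing the result with a simple average. The abstract does not give the exact weighting formula, so the following Python sketch is only one possible reading: the toy two-dimensional vectors, the counts in `hypothetical_counts`, and the choice to use those counts directly as normalized weights are all assumptions made for illustration, not the authors' method.

```python
import math


def weighted_average(vectors, weights):
    """Weighted mean of equal-length vectors; weights are normalized by their sum."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(vectors[0])
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(dim)
    ]


def simple_average(vectors):
    """Baseline: every constituent word contributes equally."""
    return weighted_average(vectors, [1.0] * len(vectors))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for already-learned word vectors (assumption, not real word2vec output).
embeddings = {
    "language": [1.0, 0.0],
    "model": [0.0, 1.0],
}

# Hypothetical per-noun statistic standing in for a compound noun frequency
# derived from noun 2-grams; the real indicator is defined in the paper.
hypothetical_counts = {"language": 1, "model": 3}

compound = ["language", "model"]
vectors = [embeddings[w] for w in compound]
weights = [hypothetical_counts[w] for w in compound]

baseline_vector = simple_average(vectors)
weighted_vector = weighted_average(vectors, weights)
```

With these toy inputs the weighted estimate leans toward the noun with the larger count, whereas the baseline sits midway between the two. `cosine_similarity` can then compare either estimate against a reference vector for the compound, when one is available.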