Presentation | 2020-03-11 Sentence Visualization Based on Relative Sentence Embeddings Haruya Ishizuka, Daichi Mochihashi |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Sentence visualization is important for organizations such as companies and governments because it makes the underlying semantics of an accumulated text collection easier to understand. The SIF vector (Arora et al., ICLR 2017), estimated from pre-trained word vectors, is a sentence vector that reflects sentence-specific information, and leveraging this representation is one effective approach to the task. In this paper, we propose Relative Sentence Embeddings (RSEs), a novel sentence representation computed from SIF vectors, together with a visualization method based on this representation. RSEs are the logarithmic transformation of mixing rates estimated by applying Gaussian mixture models to a set of SIF vectors. Visual coordinates are obtained by applying dimensionality reduction to the RSEs via t-SNE. Using properties of high-dimensional Gaussian distributions, we prove that these coordinates have higher cluster separability than those obtained by naive dimensionality reduction of SIF vectors. Experimental results show that our theoretical result holds on a real-world dataset. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Text Visualization / Semantic Visualization / pre-trained word vectors / Sentence Embeddings |
Paper # | IBISML2019-42 |
Date of Issue | 2020-03-03 (IBISML) |
Conference Information | |
Committee | IBISML |
---|---|
Conference Date | 2020/3/10 (2 days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Kyoto University |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | Machine learning, etc. |
Chair | Hisashi Kashima(Kyoto Univ.) |
Vice Chair | Masashi Sugiyama(Univ. of Tokyo) / Koji Tsuda(Univ. of Tokyo) |
Secretary | Masashi Sugiyama(Nagoya Inst. of Tech.) / Koji Tsuda(AIST) |
Assistant | Tomoharu Iwata(NTT) / Shigeyuki Oba(Kyoto Univ.) |
Paper Information | |
Registration To | Technical Committee on Information-Based Induction Sciences and Machine Learning |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Sentence Visualization Based on Relative Sentence Embeddings |
Sub Title (in English) | |
Keyword(1) | Text Visualization |
Keyword(2) | Semantic Visualization |
Keyword(3) | pre-trained word vectors |
Keyword(4) | Sentence Embeddings |
1st Author's Name | Haruya Ishizuka |
1st Author's Affiliation | Bridgestone Corporation (Bridgestone Corp.) |
2nd Author's Name | Daichi Mochihashi |
2nd Author's Affiliation | The Institute of Statistical Mathematics(ISM) |
Date | 2020-03-11 |
Paper # | IBISML2019-42 |
Volume (vol) | vol.119 |
Number (no) | IBISML-476 |
Page | pp.63-70 (IBISML) |
#Pages | 8 |
Date of Issue | 2020-03-03 (IBISML) |
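
The abstract describes RSEs as log-transformed mixing rates obtained by fitting a Gaussian mixture model to SIF vectors. The following is a minimal sketch of that one step, not the authors' implementation. It assumes that "mixing rates" means the per-sentence posterior responsibilities of each mixture component, and it uses a spherical-covariance GMM fitted with plain EM for brevity. The function names, the toy input, and all hyperparameters are hypothetical; the toy vectors are synthetic placeholders, not real SIF vectors.

```python
import numpy as np


def log_responsibilities(X, weights, means, variances):
    """Log posterior probability of each component for each row of X."""
    d = X.shape[1]
    # Squared distance from every point to every component mean: shape (n, K).
    sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    # Log density of a spherical Gaussian with per-component variance.
    log_pdf = -0.5 * (d * np.log(2.0 * np.pi * variances) + sq / variances)
    log_joint = np.log(weights) + log_pdf
    # Normalize with log-sum-exp for numerical stability.
    m = log_joint.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
    return log_joint - log_norm


def relative_sentence_embeddings(X, n_components, n_iter=50, seed=0):
    """Fit a spherical GMM by EM and return log responsibilities (n, K).

    Assumption: this log-responsibility matrix stands in for the RSEs
    described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialize means at distinct data points, with shared variance and weights.
    means = X[rng.choice(n, size=n_components, replace=False)]
    variances = np.full(n_components, X.var() + 1e-6)
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities under the current parameters.
        resp = np.exp(log_responsibilities(X, weights, means, variances))
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        variances = (resp * sq).sum(axis=0) / (nk * d) + 1e-6
    return log_responsibilities(X, weights, means, variances)


# Synthetic stand-in for SIF vectors: three loose groups in 5 dimensions.
rng = np.random.default_rng(0)
toy_vectors = np.vstack(
    [rng.normal(loc=c, scale=0.5, size=(20, 5)) for c in (-2.0, 0.0, 2.0)]
)
rse = relative_sentence_embeddings(toy_vectors, n_components=3)
```

Under these assumptions, `rse` has one row per sentence and one column per mixture component. To obtain visual coordinates as the abstract describes, this matrix would then be passed to a t-SNE implementation, for example scikit-learn's `sklearn.manifold.TSNE`.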