Presentation 2020-03-11
Sentence Visualization Based on Relative Sentence Embeddings
Haruya Ishizuka, Daichi Mochihashi
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Sentence visualization is important for organizations such as companies and government agencies, since it helps them understand the semantics underlying their accumulated text collections. The SIF vector (Arora et al., ICLR, 2017), estimated from pre-trained word vectors, is a sentence vector that reflects sentence-specific information, and leveraging this representation is one effective approach to the task. In this paper, we propose Relative Sentence Embeddings (RSEs), a novel sentence representation computed from SIF vectors, together with a visualization method based on this representation. RSEs are the logarithms of the mixing rates estimated by fitting a Gaussian mixture model to a set of SIF vectors, and visual coordinates are obtained by reducing the dimensionality of the RSEs with t-SNE. Using properties of high-dimensional Gaussian distributions, we prove that these coordinates have higher cluster separability than those obtained by naive dimension reduction of SIF vectors. Experimental results on a real-world dataset confirm that this theoretical result holds in practice.
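The pipeline described in the abstract (SIF vectors, then a Gaussian mixture model, then log mixing rates) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes NumPy, uses the standard SIF recipe (weights a/(a + p(w)) with a = 1e-3, then removal of the first singular component), fits a diagonal-covariance GMM by plain EM (the paper's covariance structure and number of components are not given here), and interprets "mixing rates" as each sentence's posterior component probabilities. All function names are hypothetical.

```python
import numpy as np


def sif_embeddings(sentences, word_vecs, word_prob, a=1e-3):
    """SIF sentence vectors (Arora et al., 2017): weighted average, then
    removal of the common component. Hypothetical helper; tokens missing
    from word_vecs or word_prob are skipped (an assumption)."""
    dim = len(next(iter(word_vecs.values())))
    X = np.zeros((len(sentences), dim))
    for i, sent in enumerate(sentences):
        toks = [w for w in sent if w in word_vecs and w in word_prob]
        if toks:
            weights = np.array([a / (a + word_prob[w]) for w in toks])
            X[i] = weights @ np.stack([word_vecs[w] for w in toks]) / len(toks)
    # Project out the first right singular vector of X (the "common component").
    u = np.linalg.svd(X, full_matrices=False)[2][0]
    return X - np.outer(X @ u, u)


def _logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))


def _log_joint(X, weights, means, var):
    """(n, k) matrix of log pi_k + log N(x_i | mu_k, diag(var_k))."""
    diff2 = (X[:, None, :] - means[None, :, :]) ** 2 / var[None, :, :]
    log_norm = np.log(2 * np.pi * var).sum(-1)[None, :]
    return np.log(weights)[None, :] - 0.5 * (diff2.sum(-1) + log_norm)


def fit_gmm_diag(X, k, n_iter=100, seed=0, reg=1e-6):
    """Plain EM for a diagonal-covariance GMM (assumed model choice)."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, size=k, replace=False)]
    var = np.tile(X.var(axis=0) + reg, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibilities in log space.
        log_r = _log_joint(X, weights, means, var)
        r = np.exp(log_r - _logsumexp(log_r, axis=1))
        # M-step: update mixture weights, means, and per-dimension variances.
        nk = r.sum(axis=0) + 1e-12
        weights = nk / n
        means = (r.T @ X) / nk[:, None]
        var = np.maximum((r.T @ X**2) / nk[:, None] - means**2, 0.0) + reg
    return weights, means, var


def relative_sentence_embeddings(X, k, **gmm_kwargs):
    """Hypothetical RSE sketch: log posterior component probabilities per sentence."""
    weights, means, var = fit_gmm_diag(X, k, **gmm_kwargs)
    log_r = _log_joint(X, weights, means, var)
    return log_r - _logsumexp(log_r, axis=1)


if __name__ == "__main__":
    # Toy data only: random word vectors and uniform word probabilities.
    rng = np.random.default_rng(0)
    vocab = [f"w{i}" for i in range(50)]
    word_vecs = {w: rng.normal(size=20) for w in vocab}
    word_prob = {w: 1.0 / len(vocab) for w in vocab}
    sentences = [list(rng.choice(vocab, size=8)) for _ in range(60)]
    X = sif_embeddings(sentences, word_vecs, word_prob)
    R = relative_sentence_embeddings(X, k=4)
    print(X.shape, R.shape)  # → (60, 20) (60, 4)
```

The rows of `R` would then be passed to t-SNE to obtain 2-D visual coordinates, for example with scikit-learn's `sklearn.manifold.TSNE(n_components=2).fit_transform(R)`; that step is left out of the block so it depends only on NumPy. Working in log space keeps the log posteriors finite even when, as is typical for high-dimensional data, the responsibilities are close to one-hot.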
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Text Visualization / Semantic Visualization / pre-trained word vectors / Sentence Embeddings
Paper # IBISML2019-42
Date of Issue 2020-03-03 (IBISML)

Conference Information
Committee IBISML
Conference Date 2020/3/10 (2 days)
Place (in Japanese) (See Japanese page)
Place (in English) Kyoto University
Topics (in Japanese) (See Japanese page)
Topics (in English) Machine learning, etc.
Chair Hisashi Kashima(Kyoto Univ.)
Vice Chair Masashi Sugiyama(Univ. of Tokyo) / Koji Tsuda(Univ. of Tokyo)
Secretary Masashi Sugiyama(Nagoya Inst. of Tech.) / Koji Tsuda(AIST)
Assistant Tomoharu Iwata(NTT) / Shigeyuki Oba(Kyoto Univ.)

Paper Information
Registration To Technical Committee on Information-Based Induction Sciences and Machine Learning
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Sentence Visualization Based on Relative Sentence Embeddings
Sub Title (in English)
Keyword(1) Text Visualization
Keyword(2) Semantic Visualization
Keyword(3) pre-trained word vectors
Keyword(4) Sentence Embeddings
1st Author's Name Haruya Ishizuka
1st Author's Affiliation Bridgestone Corporation (Bridgestone Corp.)
2nd Author's Name Daichi Mochihashi
2nd Author's Affiliation The Institute of Statistical Mathematics(ISM)
Date 2020-03-11
Volume (vol) vol.119
Number (no) IBISML-476
Page pp. 63-70 (IBISML)
#Pages 8