Presentation 2001/7/9
Dimensionality Reduction of Vector Space Model for Information Retrieval using Simple Principal Component Analysis
Shingo Kuroiwa, Satoru Tsuge, Hironori Tani, Tai Xiaoying, Masami Shishibori, Kenji Kita
Abstract(in Japanese) (See Japanese page)
Abstract(in English) The Vector Space Model (VSM) is a popular information retrieval model that represents a document collection as a term-by-document matrix. Because term-by-document matrices are usually high-dimensional and sparse, they are susceptible to noise, and it is difficult to capture the underlying semantic structure from them. Additionally, the computing resources necessary for the storage and processing of such data are enormous. Dimensionality reduction is a way to overcome these problems. Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) are popular techniques for dimensionality reduction based on matrix decomposition. However, such methods consume a large amount of computational resources. In the work described here, we use Simple Principal Component Analysis (SPCA), a fast data-oriented method, for dimensionality reduction of the vector space model. Experiments based on the MEDLINE collection showed that SPCA achieved a significant improvement over the conventional vector space model.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Simple PCA / Information retrieval / LSI / VSM / Dimensionality reduction
Paper # NLC2001-17
Date of Issue
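
The pipeline outlined in the abstract (build a term-by-document matrix, reduce its dimensionality, retrieve in the reduced space) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy documents, the raw term-count weighting, and the function names `build_term_document_matrix`, `iterative_components`, and `cosine_rank` do not come from the paper. The component estimation uses plain power iteration with deflation, which avoids forming the covariance matrix explicitly; whether this matches the SPCA variant the authors used is not stated in this record.

```python
import numpy as np

def build_term_document_matrix(docs):
    """Return (vocab, A) where A[i, j] is the raw count of term i in document j."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1.0
    return vocab, A

def iterative_components(X, k, n_iter=200, seed=0):
    """Estimate k leading principal directions of mean-centered X (samples x features).

    Assumption: power iteration with deflation, used here as a stand-in for an
    iterative PCA scheme; not necessarily the procedure used in the paper.
    """
    rng = np.random.default_rng(seed)
    X = X.copy()
    comps = []
    for _ in range(k):
        a = rng.standard_normal(X.shape[1])
        a /= np.linalg.norm(a)
        for _ in range(n_iter):
            a_new = X.T @ (X @ a)        # covariance-times-vector without forming the covariance
            norm = np.linalg.norm(a_new)
            if norm == 0.0:              # no variance left to explain
                break
            a = a_new / norm
        comps.append(a)
        X = X - np.outer(X @ a, a)       # deflation: remove the found direction from the data
    return np.array(comps)

def cosine_rank(reduced_docs, reduced_query):
    """Return document indices sorted by decreasing cosine similarity to the query."""
    num = reduced_docs @ reduced_query
    den = np.linalg.norm(reduced_docs, axis=1) * np.linalg.norm(reduced_query)
    scores = num / np.where(den == 0.0, 1.0, den)
    return np.argsort(-scores)

# Hypothetical toy collection (not the MEDLINE data mentioned in the abstract).
docs = [
    "information retrieval with vector space model",
    "vector space model represents documents as term vectors",
    "principal component analysis reduces dimensionality",
    "dimensionality reduction of term document matrix",
]
vocab, A = build_term_document_matrix(docs)

X = A.T                                  # documents as rows, terms as columns
mean = X.mean(axis=0)
comps = iterative_components(X - mean, k=2)
reduced = (X - mean) @ comps.T           # each document as a 2-dimensional vector

query = "vector space retrieval"
q = np.array([query.split().count(w) for w in vocab], dtype=float)
ranking = cosine_rank(reduced, (q - mean) @ comps.T)
```

Documents are treated as samples and terms as features, so the reduced representation has one short vector per document, and a query is folded into the same space by applying the same centering and projection.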

Conference Information
Committee NLC
Conference Date 2001/7/9 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Dimensionality Reduction of Vector Space Model for Information Retrieval using Simple Principal Component Analysis
Sub Title (in English)
Keyword(1) Simple PCA
Keyword(2) Information retrieval
Keyword(3) LSI
Keyword(4) VSM
Keyword(5) Dimensionality reduction
1st Author's Name Shingo Kuroiwa
1st Author's Affiliation The University of Tokushima
2nd Author's Name Satoru Tsuge
2nd Author's Affiliation The University of Tokushima
3rd Author's Name Hironori Tani
3rd Author's Affiliation The University of Tokushima
4th Author's Name Tai Xiaoying
4th Author's Affiliation The University of Tokushima
5th Author's Name Masami Shishibori
5th Author's Affiliation The University of Tokushima
6th Author's Name Kenji Kita
6th Author's Affiliation The University of Tokushima
Date 2001/7/9
Paper # NLC2001-17
Volume (vol) vol.101
Number (no) 189
Page pp.-
#Pages 6