Presentation 2015-02-05
Extracting Similar Documents by Eigenvector Algorithm
Shoko KATO, Kazumi SAITO, Kazuhiko KAZAMA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In this paper, we extract similar documents from a large collection of text documents by computing an eigenvector of a document-term similarity matrix. Specifically, we propose a Weighted-SR (WSR) method based on the Spectral-Relaxation (SR) method, which is one of the core extraction methods for complex networks. We also consider the LSA-WSR and MDS-WSR methods, which are based on LSA and MDS. In experiments using a text document dataset from Yahoo! News, we demonstrate that these methods extract documents that consist of mixed topics and split a single topic into several core portions. We also show that increasing η, an arbitrary parameter, decreases the number of extracted documents and narrows them down to similar documents.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Documents Extraction / Core Analysis / Eigenvector / Topic Extraction
Paper # NLC2014-46
Date of Issue

Conference Information
Committee NLC
Conference Date 2015/1/29 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Extracting Similar Documents by Eigenvector Algorithm
Sub Title (in English)
Keyword(1) Documents Extraction
Keyword(2) Core Analysis
Keyword(3) Eigenvector
Keyword(4) Topic Extraction
1st Author's Name Shoko KATO
1st Author's Affiliation Graduate School of Management and Information of Innovation, University of Shizuoka
2nd Author's Name Kazumi SAITO
2nd Author's Affiliation Graduate School of Management and Information of Innovation, University of Shizuoka
3rd Author's Name Kazuhiko KAZAMA
3rd Author's Affiliation Faculty of Systems Engineering, Wakayama University
Date 2015-02-05
Paper # NLC2014-46
Volume (vol) vol.114
Number (no) 444
Page pp.-
#Pages 6
Date of Issue