Presentation 2008-06-19
Extraction of representative words from documents using concept-vectors of words
Toshio UCHIYAMA, Katsuji BESSHO, Tadasu UCHIYAMA, Masahiro OKU
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Concept-based methods represent the features of words as vectors. Because a document is composed of a set of words, it has a corresponding set of word concept vectors. The center of gravity of the concept vectors in a document represents a feature of that document and can be used for search and classification problems. However, a single vector such as the center of gravity may not be enough to represent the whole feature of a document, given the large number of words it contains. A further problem is that a vector representation is not easy for humans to interpret directly. This paper therefore proposes a novel method that represents the features of documents by representative words of the documents. It also presents a method that extracts prototype vectors from a set of concept vectors and derives representative words from the prototype vectors.
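The pipeline outlined in the abstract (center of gravity, prototype vectors, representative words) can be illustrated with a small sketch. The abstract does not say which clustering algorithm or distance measure the paper uses, so the sketch below assumes plain k-means with Euclidean distance and a farthest-point initialization. The function names, the toy words, and the 2-D vectors are all hypothetical and are not taken from the paper.

```python
import numpy as np


def centroid(vectors):
    """Center of gravity of a document's word concept vectors."""
    return vectors.mean(axis=0)


def init_farthest(vectors, k):
    """Deterministic start: take the first vector, then repeatedly add
    the vector farthest from all prototypes chosen so far."""
    chosen = [0]
    while len(chosen) < k:
        diffs = vectors[:, None, :] - vectors[chosen][None, :, :]
        nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
        chosen.append(int(nearest.argmax()))
    return vectors[chosen].copy()


def prototype_vectors(vectors, k, n_iter=20):
    """Plain k-means, an assumed stand-in for the paper's clustering step."""
    protos = init_farthest(vectors, k)
    for _ in range(n_iter):
        # Assign each word vector to its nearest prototype.
        diffs = vectors[:, None, :] - protos[None, :, :]
        labels = np.linalg.norm(diffs, axis=2).argmin(axis=1)
        # Move each prototype to the mean of its assigned vectors.
        for j in range(k):
            members = vectors[labels == j]
            if len(members) > 0:
                protos[j] = members.mean(axis=0)
    return protos


def representative_words(words, vectors, protos):
    """For each prototype, pick the word whose vector lies closest to it."""
    diffs = vectors[:, None, :] - protos[None, :, :]
    closest = np.linalg.norm(diffs, axis=2).argmin(axis=0)
    return [words[i] for i in closest]


# Hypothetical toy document: six words with made-up 2-D concept vectors.
words = ["soccer", "goal", "match", "stock", "market", "bond"]
vectors = np.array([
    [1.0, 0.0], [0.9, 0.1], [0.8, 0.0],
    [0.0, 1.0], [0.1, 0.9], [0.0, 0.8],
])

doc_center = centroid(vectors)
protos = prototype_vectors(vectors, k=2)
reps = representative_words(words, vectors, protos)
print(doc_center, reps)
```

In this toy case the single center of gravity falls between the two topics and resembles neither, while the two prototypes each map back to a word a reader can interpret, which is the motivation the abstract gives for using representative words.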
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Document feature / Concept vector / Clustering / Representative words
Paper # DE2008-9,PRMU2008-27
Date of Issue

Conference Information
Committee DE
Conference Date 2008/6/12 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Data Engineering (DE)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Extraction of representative words from documents using concept-vectors of words
Sub Title (in English)
Keyword(1) Document feature
Keyword(2) Concept vector
Keyword(3) Clustering
Keyword(4) Representative words
1st Author's Name Toshio UCHIYAMA
1st Author's Affiliation NTT Cyber Solution Laboratories, NTT Corporation
2nd Author's Name Katsuji BESSHO
2nd Author's Affiliation NTT Cyber Solution Laboratories, NTT Corporation
3rd Author's Name Tadasu UCHIYAMA
3rd Author's Affiliation NTT Cyber Solution Laboratories, NTT Corporation
4th Author's Name Masahiro OKU
4th Author's Affiliation NTT Cyber Solution Laboratories, NTT Corporation
Date 2008-06-19
Paper # DE2008-9,PRMU2008-27
Volume (vol) vol.108
Number (no) 93
Page pp.-
#Pages 6