Presentation 2011-03-11
Filtering using Term Frequency and Structure of Blog Entries for Retrieving Relevant Entries
Kouki NAKATANI, Shinichi YOSHIDA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) A method for classifying blog articles using word vectors is proposed and its precision is investigated. The algorithm classifies blog articles into two categories: useful and useless. From 500 blog articles, 19,003 word-vector components are extracted, and their number is reduced to 100 based on idf value. The number of features is further reduced to 3 to 5 using principal component analysis. The blog articles and their usefulness labels are input to a neural network (multi-layer perceptron), which is trained on 100 articles and classifies 100 articles. The average output precision is 70% using word vectors and 60% using tf-idf-valued word histograms. The results show that precision decreases with tf-idf-valued word histograms, but the number of useful articles misclassified as useless also decreases. The recall rate is therefore thought to increase when tf-idf-valued word histograms are used.
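The feature-construction steps named in the abstract (word vectors, idf-based reduction of the vocabulary, tf-idf-valued word histograms) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that a "word vector" is a binary presence vector, that idf is computed as log(N / df), and that the terms with the highest idf are kept; the abstract states none of these details. All function names are hypothetical, and the principal component analysis and multi-layer perceptron steps are omitted.

```python
import math
from collections import Counter


def idf_values(docs):
    """Return idf(t) = log(N / df(t)) for every term (assumed idf variant)."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {term: math.log(n / count) for term, count in df.items()}


def select_terms(idf, k, highest=True):
    """Keep k terms ranked by idf (assumption: highest idf first).

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(idf, key=lambda t: (-idf[t] if highest else idf[t], t))
    return sorted(ranked[:k])


def word_vector(tokens, vocab):
    """Binary presence vector over the reduced vocabulary (assumed form)."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocab]


def tfidf_histogram(tokens, vocab, idf):
    """Word histogram whose bins hold term count times idf."""
    counts = Counter(tokens)
    return [counts[term] * idf[term] for term in vocab]
```

With tokenized articles in a list `docs`, `select_terms(idf_values(docs), 100)` would give a 100-term vocabulary as in the abstract, and either `word_vector` or `tfidf_histogram` would produce the per-article features that are then passed to dimensionality reduction and the classifier.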
Keyword(in Japanese) (See Japanese page)
Keyword(in English) blog / classification / word vector / tf-idf / neural-network
Paper # PRMU2010-292
Date of Issue

Conference Information
Committee PRMU
Conference Date 2011/3/3 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Pattern Recognition and Media Understanding (PRMU)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Filtering using Term Frequency and Structure of Blog Entries for Retrieving Relevant Entries
Sub Title (in English)
Keyword(1) blog
Keyword(2) classification
Keyword(3) word vector
Keyword(4) tf-idf
Keyword(5) neural-network
1st Author's Name Kouki NAKATANI
1st Author's Affiliation Graduate School of Engineering, Kochi University of Technology
2nd Author's Name Shinichi YOSHIDA
2nd Author's Affiliation School of Information, Kochi University of Technology
Date 2011-03-11
Paper # PRMU2010-292
Volume (vol) vol.110
Number (no) 467
Page pp.-
#Pages 6
Date of Issue