Presentation 2007/7/17
Hierarchical Classification of Web Sites to Web Directory
Minoru SASAKI, Hiroyuki SHINNOU,
Abstract(in English) A web directory is a categorized catalog of links to sites on the World Wide Web; Yahoo! Directory and Dmoz are well-known examples. Their categories hold large numbers of web site links covering an extensive range of topics, so users browse down through the category hierarchy to find helpful resources and information. These directories are created and maintained by human volunteer editors who are experts in particular categories. As a result, many URL submissions are delayed because the most specific category for each site has not been selected. In our research, we construct a system that automatically classifies web sites into a human-maintained web directory. Our earlier experiments showed that the keywords and description values of the meta tags in HTML documents are very effective for web site classification, and that common words cause web sites to be misclassified. In this paper, we describe a classification system for a hierarchical web directory structure. Because the system uses the whole directory hierarchy, we expect it to make the construction of a practical and useful web directory possible. To evaluate the effectiveness of this meta-tag-based system, we conduct an experiment in which web sites registered in the Yahoo! directory are classified into the Dmoz directory. In these experiments, the average precision is about 62.7% using meta tags and about 42.3% using the text of the HTML documents. Meta tags therefore give higher precision than text, which shows that they are effective for hierarchical classification as well as for classification into flat categories.
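The abstract does not spell out the classifier itself, so the following is only a minimal Python sketch under stated assumptions: features are taken from the keywords and description meta tags, common words are dropped with a small stop-word list, and a site is placed by descending the hierarchy top-down, choosing at each level the child category whose word profile is most cosine-similar. The category paths, word profiles, stop-word list, and helper names (`meta_terms`, `classify_top_down`) are hypothetical illustrations, not the authors' method.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical stop-word list: the abstract reports that common words cause misclassification.
STOP_WORDS = {"the", "for", "and", "of", "a", "to", "in"}


class MetaTagExtractor(HTMLParser):
    """Collects the content of <meta name="keywords"> and <meta name="description">."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if name in ("keywords", "description"):
            self.meta[name] = attrs.get("content") or ""


def meta_terms(html):
    """Return a bag of lowercase words from the keywords/description meta tags."""
    parser = MetaTagExtractor()
    parser.feed(html)
    words = re.findall(r"[a-z0-9]+", " ".join(parser.meta.values()).lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify_top_down(terms, tree, profiles, root="Top"):
    """Descend from the root, taking the most similar child at each level.

    Stops at a leaf, or when no child shares any term with the site.
    """
    path, node = [root], root
    while tree.get(node):
        best_score, best = max((cosine(terms, profiles[c]), c) for c in tree[node])
        if best_score == 0.0:
            break
        path.append(best)
        node = best
    return path


# Hypothetical two-level hierarchy with hand-written word profiles per category.
tree = {
    "Top": ["Top/Computers", "Top/Sports"],
    "Top/Computers": ["Top/Computers/Programming", "Top/Computers/Hardware"],
}
profiles = {
    "Top/Computers": Counter("computer software programming hardware language".split()),
    "Top/Sports": Counter("sports soccer baseball team".split()),
    "Top/Computers/Programming": Counter("programming language python compiler software".split()),
    "Top/Computers/Hardware": Counter("hardware cpu memory motherboard".split()),
}
html = ('<html><head><meta name="keywords" content="Python, programming language">'
        '<meta name="description" content="Tutorials for the Python programming language.">'
        '</head></html>')
print(classify_top_down(meta_terms(html), tree, profiles))
# prints ['Top', 'Top/Computers', 'Top/Computers/Programming']
```

Greedy top-down descent is one simple way to use the whole hierarchy; in a real system the category profiles would be built from sites already registered in each category rather than written by hand.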
Paper # NLC2007-19

Conference Information
Committee NLC
Conference Date 2007/7/17(1days)

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in English) Hierarchical Classification of Web Sites to Web Directory
1st Author's Name Minoru SASAKI
1st Author's Affiliation Department of Computer and Information Sciences, Ibaraki University
2nd Author's Name Hiroyuki SHINNOU
2nd Author's Affiliation Department of Computer and Information Sciences, Ibaraki University
Date 2007/7/17
Paper # NLC2007-19
Volume (vol) vol.107
Number (no) 158
Page pp.-
#Pages 6