Presentation | 2007/7/17 Hierarchical Classification of Web Sites to Web Directory. Minoru SASAKI, Hiroyuki SHINNOU |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | A web directory is a directory on the World Wide Web; Yahoo! Directory and Dmoz are well-known examples. Some categories contain many web site links across an extensive range of topics, and users browse the categories to find helpful resources and information. Web directories are created and maintained by human volunteers who are experts in particular categories. As a result, many URL registration submissions are delayed because the most specific category was not selected for them. In our research, we construct a system for automatic classification into a human-maintained web directory. Our earlier experiments showed that the keywords and description values of the meta tag in HTML documents are very effective for web site classification, and that common words cause misclassification of web sites. In this paper, we describe a classification system for a hierarchical web directory structure. We consider that using the whole directory hierarchy enables the system to construct a practical and useful web directory. To evaluate the effectiveness of this system based on meta tag values, we conduct an experiment on classifying web sites into the Dmoz directory using web sites registered in the Yahoo! directory. In these experiments, the average precision is about 62.7% using the meta tag and about 42.3% using the text of the HTML document. The precision using the meta tag is higher than that using the text, which shows that the meta tag is effective in hierarchical classification as well as in classification into flat categories. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | |
Paper # | NLC2007-19 |
Date of Issue |
Conference Information | |
Committee | NLC |
---|---|
Conference Date | 2007/7/17 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Natural Language Understanding and Models of Communication (NLC) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Hierarchical Classification of Web Sites to Web Directory |
Sub Title (in English) | |
Keyword(1) | |
1st Author's Name | Minoru SASAKI |
1st Author's Affiliation | Department of Computer and Information Sciences, Ibaraki University |
2nd Author's Name | Hiroyuki SHINNOU |
2nd Author's Affiliation | Department of Computer and Information Sciences, Ibaraki University |
Date | 2007/7/17 |
Paper # | NLC2007-19 |
Volume (vol) | vol.107 |
Number (no) | 158 |
Page | pp.- |
#Pages | 6 |
Date of Issue |