Presentation 2005-07-14
A Selective Web Crawling Method Based on User Examples
Jianwei ZHANG, Yoshiharu ISHIKAWA, Sayumi KUROKAWA, Hiroyuki KITAGAWA,
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In this paper, we propose a selective web crawling method that collects web pages based on example records provided by a user. One feature of our method is that the example records are expanded dynamically with additional records extracted from the crawled HTML pages. Moreover, our system integrates the web with the database composed of the example and additional records to achieve efficient and selective crawling. Information extraction and crawling are performed adaptively according to feedback from the user. Our method combines the techniques of content analysis, link analysis, and topic-focused crawling. As a result, the method enables efficient collection of web pages that contain information related to the example records.
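The crawling loop outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the in-memory `TOY_WEB`, the substring-based `score`, the regex-based `extract_records`, and the rule that a link inherits its parent page's score are all assumptions made for illustration, and user feedback and link analysis are omitted. The sketch only shows how example records can drive a best-first crawl and grow as pages are collected.

```python
import heapq
import re

# Hypothetical toy "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a": ("Index of database researchers", ["b", "c", "d"]),
    "b": ("Alice Smith works on query optimization with Bob Jones", ["e"]),
    "c": ("Cooking recipes and travel tips", ["f"]),
    "d": ("Sports news of the week", []),
    "e": ("Bob Jones and Carol White study stream processing", []),
    "f": ("More recipes", []),
}


def score(text, examples):
    """Content score: fraction of example records mentioned in the text."""
    lowered = text.lower()
    hits = sum(1 for ex in examples if ex.lower() in lowered)
    return hits / len(examples) if examples else 0.0


def extract_records(text):
    """Stand-in extractor (assumption): capitalized word pairs are records."""
    return set(re.findall(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", text))


def selective_crawl(seed, examples, web, max_pages=10):
    """Best-first crawl that collects relevant pages and expands the examples."""
    examples = set(examples)
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    seen = {seed}
    collected = []
    while frontier and len(collected) < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = web[url]
        s = score(text, examples)
        if s > 0:
            collected.append(url)
            # Dynamic expansion: records found on a relevant page join the examples.
            examples |= extract_records(text)
        for link in links:
            if link not in seen:
                seen.add(link)
                # Assumption: a link is as promising as the page it was found on.
                heapq.heappush(frontier, (-s, link))
    return collected, examples


collected, examples = selective_crawl("a", ["Alice Smith"], TOY_WEB)
print(collected)
print(sorted(examples))
```

In this toy run, page "e" mentions none of the records the user supplied; it is collected only because "Bob Jones" was extracted from page "b" and added to the example set beforehand.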
Keyword(in Japanese) (See Japanese page)
Keyword(in English) crawling / crawler / integration of web and databases / link analysis / information extraction
Paper # DE2005-74
Date of Issue

Conference Information
Committee DE
Conference Date 2005/7/7 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Data Engineering (DE)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) A Selective Web Crawling Method Based on User Examples
Sub Title (in English)
Keyword(1) crawling
Keyword(2) crawler
Keyword(3) integration of web and databases
Keyword(4) link analysis
Keyword(5) information extraction
1st Author's Name Jianwei ZHANG
1st Author's Affiliation Department of Computer Science, Graduate School of Systems and Information Engineering
2nd Author's Name Yoshiharu ISHIKAWA
2nd Author's Affiliation Department of Computer Science, Graduate School of Systems and Information Engineering:Center for Computational Sciences University of Tsukuba
3rd Author's Name Sayumi KUROKAWA
3rd Author's Affiliation Department of Computer Science, Graduate School of Systems and Information Engineering
4th Author's Name Hiroyuki KITAGAWA
4th Author's Affiliation Department of Computer Science, Graduate School of Systems and Information Engineering:Center for Computational Sciences University of Tsukuba
Date 2005-07-14
Paper # DE2005-74
Volume (vol) vol.105
Number (no) 172
Page pp.-
#Pages 6
Date of Issue