Presentation | 2005-07-14 A Selective Web Crawling Method Based on User Examples Jianwei ZHANG, Yoshiharu ISHIKAWA, Sayumi KUROKAWA, Hiroyuki KITAGAWA |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | In this paper, we propose a selective web crawling method that collects web pages based on example records provided by a user. One feature of our method is that the example records are expanded dynamically with additional records extracted from the crawled HTML pages. Moreover, our system integrates the web with the database composed of the example and additional records to achieve efficient and selective crawling. Information extraction and crawling are performed adaptively according to feedback from the user. Our method combines the techniques of content analysis, link analysis, and topic-focused crawling. The method therefore leads to an efficient collection of web pages that contain information related to the example records. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | crawling / crawler / integration of web and databases / link analysis / information extraction |
Paper # | DE2005-74 |
Date of Issue |
Conference Information | |
Committee | DE |
---|---|
Conference Date | 2005/7/7 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Data Engineering (DE) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | A Selective Web Crawling Method Based on User Examples |
Sub Title (in English) | |
Keyword(1) | crawling |
Keyword(2) | crawler |
Keyword(3) | integration of web and databases |
Keyword(4) | link analysis |
Keyword(5) | information extraction |
1st Author's Name | Jianwei ZHANG |
1st Author's Affiliation | Department of Computer Science, Graduate School of Systems and Information Engineering |
2nd Author's Name | Yoshiharu ISHIKAWA |
2nd Author's Affiliation | Department of Computer Science, Graduate School of Systems and Information Engineering / Center for Computational Sciences, University of Tsukuba |
3rd Author's Name | Sayumi KUROKAWA |
3rd Author's Affiliation | Department of Computer Science, Graduate School of Systems and Information Engineering |
4th Author's Name | Hiroyuki KITAGAWA |
4th Author's Affiliation | Department of Computer Science, Graduate School of Systems and Information Engineering / Center for Computational Sciences, University of Tsukuba |
Date | 2005-07-14 |
Paper # | DE2005-74 |
Volume (vol) | vol.105 |
Number (no) | 172 |
Page | pp.- |
#Pages | 6 |
Date of Issue |