Presentation 2006-07-14
Proposal of a Crawling Method for Finding Moved Web Pages
Natsumi SAWA, Toshinari IIDA, Atsuyuki MORISHIMA, Shigeo SUGIMOTO, Hiroyuki KITAGAWA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) While the World Wide Web has become an indispensable medium in our society, the integrity of its contents is not always maintained because of its distributed architecture. We have been tackling the problem of fixing broken Web links, which is one example of lost integrity of Web contents. In particular, we have been focusing on how to find moved Web pages when the move causes broken Web links. Our previous experiments suggested that many moved Web pages can be found on the same Web site where they were originally located. Therefore, crawling that Web site is an effective way to find moved Web pages. Exhaustive crawling, however, is very costly when the Web site is large. This paper proposes a crawling algorithm that visits Web pages in an efficient order. We compared our algorithm with depth-first crawling and found that our algorithm is effective.
Keyword(in Japanese) (See Japanese page)
Keyword(in English)
Paper # DE2006-107
Date of Issue

Conference Information
Committee DE
Conference Date 2006/7/7 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Data Engineering (DE)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Proposal of a Crawling Method for Finding Moved Web Pages
Sub Title (in English)
Keyword(1)
1st Author's Name Natsumi SAWA
1st Author's Affiliation Grad. Sch. of Info. and Media Studies, Univ. of Tsukuba
2nd Author's Name Toshinari IIDA
2nd Author's Affiliation Grad. Sch. of Info. and Media Studies, Univ. of Tsukuba
3rd Author's Name Atsuyuki MORISHIMA
3rd Author's Affiliation Grad. Sch. of Info. and Media Studies, Univ. of Tsukuba
4th Author's Name Shigeo SUGIMOTO
4th Author's Affiliation Grad. Sch. of Info. and Media Studies, Univ. of Tsukuba
5th Author's Name Hiroyuki KITAGAWA
5th Author's Affiliation Grad. Sch. of Sys. and Info. Eng., Univ. of Tsukuba
Date 2006-07-14
Paper # DE2006-107
Volume (vol) vol.106
Number (no) 150
Page pp.-
#Pages 5
Date of Issue