Presentation | 2006-07-14 Proposal of a Crawling Method for Finding Moved Web Pages Natsumi SAWA, Toshinari IIDA, Atsuyuki MORISHIMA, Shigeo SUGIMOTO, Hiroyuki KITAGAWA |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | While the World Wide Web has become an indispensable medium in our society, the integrity of its contents is not always maintained because of its distributed architecture. We have been tackling the problem of fixing broken Web links, one example of this loss of integrity. In particular, we have focused on how to find moved Web pages when the move causes broken links. Our previous experiments suggested that many moved Web pages can be found on the same Web site where they were originally located. Crawling that Web site is therefore an effective way to find moved Web pages. Exhaustive crawling, however, becomes very costly when the Web site is large. This paper proposes a crawling algorithm that visits Web pages in an efficient order. We compared our algorithm with depth-first crawling and found that our algorithm is effective. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | |
Paper # | DE2006-107 |
Date of Issue |
Conference Information | |
Committee | DE |
---|---|
Conference Date | 2006/7/7 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Data Engineering (DE) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Proposal of a Crawling Method for Finding Moved Web Pages |
Sub Title (in English) | |
Keyword(1) | |
1st Author's Name | Natsumi SAWA |
1st Author's Affiliation | Grad. Sch. of Info. and Media Studies, Univ. of Tsukuba |
2nd Author's Name | Toshinari IIDA |
2nd Author's Affiliation | Grad. Sch. of Info. and Media Studies, Univ. of Tsukuba |
3rd Author's Name | Atsuyuki MORISHIMA |
3rd Author's Affiliation | Grad. Sch. of Info. and Media Studies, Univ. of Tsukuba |
4th Author's Name | Shigeo SUGIMOTO |
4th Author's Affiliation | Grad. Sch. of Info. and Media Studies, Univ. of Tsukuba |
5th Author's Name | Hiroyuki KITAGAWA |
5th Author's Affiliation | Grad. Sch. of Sys. and Info. Eng., Univ. of Tsukuba |
Date | 2006-07-14 |
Paper # | DE2006-107 |
Volume (vol) | vol.106 |
Number (no) | 150 |
Page | pp.- |
#Pages | 5 |
Date of Issue |