Presentation | 2006-07-13 Copyright violation detection system for Web texts Takashi TASHIRO, Takanori UEDA, Taisuke HORI, Yu HIRATE, Hayato YAMANA |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | With the explosive growth in the number of web pages, the number of pages that violate copyright, such as pages reproducing lyrics or news articles without permission, has also increased. To address this problem, we propose a system for detecting copyright-violating web pages. The proposed system works in three steps. First, it generates search keywords from the phrasal units, called "bunsetsu", contained in a "seed page". Second, using the keywords generated in the first step, it gathers candidate copyright-violating web pages through the Google or Yahoo! web services. Finally, it re-ranks the candidate pages by their similarity to the seed page, using the "Longest Common Subsequence" of phrasal units as the similarity measure. Our evaluation confirmed that the proposed system can correctly extract copyright-violating web pages. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Information Retrieval / Copyright Violation Detection / Document Similarity |
Paper # | DE2006-54 |
Date of Issue |
Conference Information | |
Committee | DE |
---|---|
Conference Date | 2006/7/6 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Data Engineering (DE) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Copyright violation detection system for Web texts |
Sub Title (in English) | |
Keyword(1) | Information Retrieval |
Keyword(2) | Copyright Violation Detection |
Keyword(3) | Document Similarity |
1st Author's Name | Takashi TASHIRO |
1st Author's Affiliation | Science and Engineering, Waseda University |
2nd Author's Name | Takanori UEDA |
2nd Author's Affiliation | Science and Engineering, Waseda University |
3rd Author's Name | Taisuke HORI |
3rd Author's Affiliation | Science and Engineering, Waseda University |
4th Author's Name | Yu HIRATE |
4th Author's Affiliation | Graduate School of Science and Engineering, Waseda University |
5th Author's Name | Hayato YAMANA |
5th Author's Affiliation | Science and Engineering, Waseda University:National Institute of Informatics |
Date | 2006-07-13 |
Paper # | DE2006-54 |
Volume (vol) | vol.106 |
Number (no) | 149 |
Page | pp.- |
#Pages | 6 |
Date of Issue |
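
The re-ranking step described in the abstract (scoring candidate pages by the Longest Common Subsequence of phrasal units shared with the seed page) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each page has already been segmented into a list of bunsetsu strings by some external tool, and the normalization by seed length and the names `lcs_length`, `similarity`, and `rerank` are hypothetical choices made only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two unit sequences.

    Standard dynamic programming, keeping only one previous row.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(seed: Sequence[str], candidate: Sequence[str]) -> float:
    """LCS length divided by the seed length.

    Assumption: the abstract does not state how the score is normalized;
    dividing by the seed length is an illustrative choice.
    """
    if not seed:
        return 0.0
    return lcs_length(seed, candidate) / len(seed)


def rerank(
    seed: Sequence[str], candidates: Dict[str, List[str]]
) -> List[Tuple[str, float]]:
    """Return (page name, score) pairs sorted by descending similarity."""
    scored = [(name, similarity(seed, units)) for name, units in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical, pre-segmented phrasal units standing in for bunsetsu.
    seed_units = ["the system", "generates", "search keywords", "from phrases"]
    pages = {
        "page_a": ["the system", "generates", "search keywords", "from phrases"],
        "page_b": ["the system", "creates", "search keywords", "quickly"],
        "page_c": ["an unrelated", "page"],
    }
    for name, score in rerank(seed_units, pages):
        print(name, score)
```

Because a subsequence need not be contiguous, this measure tolerates insertions and deletions between matched phrasal units, which suits pages that copy a text with small modifications.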