Presentation 2006-07-13
Copyright violation detection system for Web texts
Takashi TASHIRO, Takanori UEDA, Taisuke HORI, Yu HIRATE, Hayato YAMANA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) With the explosive growth in the number of web pages, the number of pages that violate copyright, such as pages reproducing lyrics or news articles without permission, has also increased. To address this problem, we propose a system for detecting copyright-violating web pages. The proposed system works in three steps. First, it generates search keywords from the phrasal units, called "bunsetsu", contained in a "seed page." Second, using the keywords generated in the first step, it gathers candidate copyright-violating web pages through the Google or Yahoo! web services. Finally, it re-ranks the candidate pages by their similarity to the seed page. As the similarity measure, we adopt the "Longest Common Subsequence" of phrasal units. Our evaluation confirmed that the proposed system is able to extract copyright-violating web pages correctly.
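The re-ranking step described in the abstract can be illustrated with a minimal sketch. It assumes that pages have already been segmented into phrasal units (the abstract does not say which segmenter is used), and that similarity is the LCS length divided by the number of units in the seed page; that normalization is an assumption made here for illustration, not a detail taken from the paper. The function names `lcs_length`, `similarity`, and `rerank` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two unit sequences."""
    # Row-by-row dynamic programming; only the previous row is kept.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(seed: Sequence[str], candidate: Sequence[str]) -> float:
    """LCS length normalized by seed length (assumed normalization)."""
    if not seed:
        return 0.0
    return lcs_length(seed, candidate) / len(seed)


def rerank(seed: Sequence[str],
           candidates: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Sort candidate pages by descending similarity to the seed page."""
    scored = [(url, similarity(seed, units)) for url, units in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because the LCS preserves order but tolerates gaps, a page that copies the seed text with a few phrasal units inserted or dropped would still score highly under this sketch, while a page that merely shares vocabulary in a different order would not.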
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Information Retrieval / Copyright Violation Detection / Document Similarity
Paper # DE2006-54
Date of Issue

Conference Information
Committee DE
Conference Date 2006/7/6 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Data Engineering (DE)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Copyright violation detection system for Web texts
Sub Title (in English)
Keyword(1) Information Retrieval
Keyword(2) Copyright Violation Detection
Keyword(3) Document Similarity
1st Author's Name Takashi TASHIRO
1st Author's Affiliation Science and Engineering, Waseda University
2nd Author's Name Takanori UEDA
2nd Author's Affiliation Science and Engineering, Waseda University
3rd Author's Name Taisuke HORI
3rd Author's Affiliation Science and Engineering, Waseda University
4th Author's Name Yu HIRATE
4th Author's Affiliation Graduate School of Science and Engineering, Waseda University
5th Author's Name Hayato YAMANA
5th Author's Affiliation Science and Engineering, Waseda University / National Institute of Informatics
Date 2006-07-13
Paper # DE2006-54
Volume (vol) vol.106
Number (no) 149
Page pp.-
#Pages 6
Date of Issue