Presentation 2011-08-02
Wikipedia version tree reconstruction by clustering revisions through keywords
Zhe Cao, Mizuho Iwaihara
Abstract(in Japanese) (See Japanese page)
Abstract(in English) With the widespread diffusion of user-generated content, the number of documents that have past versions is growing rapidly, especially among wiki content and office documents. Wikipedia, for example, has become the world's largest collaboratively edited source of encyclopedic knowledge. Anybody can edit an article using a wiki markup language that offers a simplified alternative to HTML. For each article, Wikipedia provides a way to export the edit history as an XML file with timestamps, which is essential for evaluating the trustworthiness and provenance of the article. The problem is that even with an edit history, it is still hard to know how an article has evolved. A tree structure is embedded in the linear sequence of timestamps. To overcome this problem, we propose a version tree reconstruction method that clusters versions through keywords. A version tree can explain how a document has evolved through collaborative editing and can illuminate dependencies among documents. In this paper, we present an experimental evaluation on a number of edit histories from Wikipedia to validate how our proposed method works.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) version tree / Wikipedia / keyword / clustering
Paper # DE2011-32
Date of Issue

Conference Information
Committee DE
Conference Date 2011/7/26 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Data Engineering (DE)
Language ENG
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Wikipedia version tree reconstruction by clustering revisions through keywords
Sub Title (in English)
Keyword(1) version tree
Keyword(2) Wikipedia
Keyword(3) keyword
Keyword(4) clustering
1st Author's Name Zhe Cao
1st Author's Affiliation ()
2nd Author's Name Mizuho Iwaihara
2nd Author's Affiliation
Date 2011-08-02
Paper # DE2011-32
Volume (vol) vol.111
Number (no) 173
Page pp.-
#Pages 6
Date of Issue