Presentation 2017-12-23
Detection of mergeable Wikipedia articles based on multiple embedding results
Renzhi Wang, Mizuho Iwaihara
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Wikipedia is the largest online encyclopedia, and its articles are edited by many volunteers with different viewpoints and writing styles. Sometimes two or more articles have different titles but cover exactly the same or strongly similar themes. Administrators and editors are expected to detect such article pairs and decide whether they should be merged. In this paper, we propose a method to automatically determine whether an article pair should be merged. We consider both a duplicate case and an overlap case. In the duplicate case, the two articles cover exactly the same content. In the overlap case, the articles cover related subjects with significant overlap. Because the overlapping content is similar but is often expressed in different words, methods that exploit semantic relatedness are necessary. To address this problem, we propose combining multiple embedding results and rebuilding word vectors to detect mergeable article pairs. We also handle various mergeable cases by combining distinct text fragments. Our experiments show that our method outperforms existing embedding methods.
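The abstract does not specify how the embedding results are combined. As a rough illustration only, the following Python sketch assumes one simple scheme: concatenate the L2-normalized vectors of each word from several embedding models, average them into an article vector, and flag a pair as mergeable when the cosine similarity reaches a threshold. The toy embedding tables, all function names, and the 0.8 threshold are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List

# Hypothetical: each "model" is a plain word -> vector table standing in
# for an independently trained embedding model.
Model = Dict[str, List[float]]


def _normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combined_word_vector(word: str, models: List[Model]) -> List[float]:
    """Concatenate the normalized vectors of `word` from every model.
    A word missing from a model contributes a zero block of that model's size."""
    parts: List[float] = []
    for model in models:
        dim = len(next(iter(model.values())))
        vec = model.get(word)
        parts.extend(_normalize(vec) if vec is not None else [0.0] * dim)
    return parts


def article_vector(tokens: List[str], models: List[Model]) -> List[float]:
    """Average the combined word vectors of an article's tokens."""
    vecs = [combined_word_vector(t, models) for t in tokens]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def is_mergeable(tokens_a: List[str], tokens_b: List[str],
                 models: List[Model], threshold: float = 0.8) -> bool:
    """Hypothetical decision rule: similarity at or above `threshold`."""
    sim = cosine(article_vector(tokens_a, models), article_vector(tokens_b, models))
    return sim >= threshold


# Toy usage with two made-up embedding tables of different dimensions.
model_a: Model = {"car": [1.0, 0.0], "automobile": [0.9, 0.1], "banana": [0.0, 1.0]}
model_b: Model = {"car": [0.0, 1.0, 0.0], "automobile": [0.1, 0.9, 0.0],
                  "banana": [0.0, 0.0, 1.0]}
models = [model_a, model_b]

print(is_mergeable(["car"], ["automobile"], models))
print(is_mergeable(["car"], ["banana"], models))
```

Normalizing each model's block before concatenation keeps one model with larger vector magnitudes from dominating the combined representation; this is a design assumption of the sketch, not a detail reported in the abstract.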
Keyword(in Japanese) (See Japanese page)
Keyword(in English) word embedding / mergeable article / Wikipedia / text mining
Paper # DE2017-35
Date of Issue 2017-12-15 (DE)

Conference Information
Committee DE / IPSJ-DBS
Conference Date 2017/12/22 (2 days)
Place (in Japanese) (See Japanese page)
Place (in English) National Institute of Informatics
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair Akiyo Nadamoto (Konan Univ.) / Atsuyuki Morishima (Univ. of Tsukuba)
Vice Chair Koji Eguchi (Kobe Univ.) / Shingo Otsuka (Kanagawa Inst. of Tech.)
Secretary Koji Eguchi (Kogakuin Univ.) / Shingo Otsuka (Univ. of Marketing and Distribution Science)
Assistant Kazuo Goda (Univ. of Tokyo) / Hiroaki Shiokawa (Univ. of Tsukuba)

Paper Information
Registration To Technical Committee on Data Engineering / Special Interest Group on Database System
Language ENG
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Detection of mergeable Wikipedia articles based on multiple embedding results
Sub Title (in English)
Keyword(1) word embedding / mergeable article / Wikipedia / text mining
1st Author's Name Renzhi Wang
1st Author's Affiliation Waseda University (Waseda U.)
2nd Author's Name Mizuho Iwaihara
2nd Author's Affiliation Waseda University (Waseda U.)
Date 2017-12-23
Volume (vol) vol.117
Number (no) DE-374
Page pp. 79-83 (DE)
#Pages 5