Presentation | 2017-12-23; Detection of mergeable Wikipedia articles based on multiple embedding results; Renzhi Wang, Mizuho Iwaihara |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Wikipedia is the largest online encyclopedia, and its articles are edited by many volunteers with different viewpoints and writing styles. Sometimes two or more articles have different titles but cover exactly the same or strongly similar themes. Administrators and editors are expected to detect such article pairs and decide whether they should be merged. In this paper, we propose a method that automatically determines whether an article pair should be merged. We consider both a duplicate case and an overlap case. In the duplicate case, the two articles cover exactly the same content. In the overlap case, the articles cover related subjects that overlap significantly. Because the overlapping content is similar while the wording of the two articles is likely to differ, methods that exploit semantic relatedness are necessary. To address this problem, we propose combining multiple embedding results and rebuilding word vectors to detect mergeable article pairs. We also handle various mergeable cases by combining distinct text fragments. Our experiments show that our method outperforms existing embedding methods. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | word embedding, mergeable article, Wikipedia, text mining |
Paper # | DE2017-35 |
Date of Issue | 2017-12-15 (DE) |
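
The abstract describes combining several embedding results into rebuilt word vectors and using them to decide whether an article pair should be merged. The paper's exact procedure is not given on this page, so the following is only a minimal sketch under stated assumptions: each embedding model is a plain word-to-vector dictionary, word vectors are rebuilt by L2-normalizing and concatenating each model's vector, an article is represented by the mean of its rebuilt word vectors, and a pair is flagged as mergeable when the cosine similarity reaches a hypothetical threshold of 0.8. The function names (`combine_embeddings`, `doc_vector`, `is_mergeable`) and the toy vectors are illustrative and do not come from the paper. The sketch also does not model the duplicate/overlap distinction or the combination of text fragments.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
Embedding = Dict[str, Vector]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def combine_embeddings(models: Sequence[Embedding]) -> Embedding:
    """Rebuild word vectors by concatenating each model's normalized vector.

    Assumption: only words present in every model are kept, so all
    rebuilt vectors share the same dimensionality.
    """
    shared = set(models[0])
    for model in models[1:]:
        shared &= set(model)
    return {
        word: [x for model in models for x in l2_normalize(model[word])]
        for word in shared
    }


def doc_vector(tokens: Sequence[str], emb: Embedding) -> Vector:
    """Average the rebuilt vectors of in-vocabulary tokens ([] if none)."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return []
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 when either vector is empty or all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def is_mergeable(tokens_a: Sequence[str], tokens_b: Sequence[str],
                 emb: Embedding, threshold: float = 0.8) -> bool:
    """Flag a pair as mergeable if its document vectors are similar enough.

    The 0.8 threshold is a hypothetical value, not one from the paper.
    """
    return cosine(doc_vector(tokens_a, emb), doc_vector(tokens_b, emb)) >= threshold


if __name__ == "__main__":
    # Toy vectors from two hypothetical models with different dimensions.
    model_a = {"car": [1.0, 0.0], "automobile": [0.9, 0.1], "banana": [0.0, 1.0]}
    model_b = {"car": [0.0, 2.0, 0.0], "automobile": [0.0, 1.8, 0.3],
               "banana": [3.0, 0.0, 0.0]}
    emb = combine_embeddings([model_a, model_b])
    print(is_mergeable(["car"], ["automobile"], emb))  # True
    print(is_mergeable(["car"], ["banana"], emb))      # False
```

Concatenating normalized vectors gives each model equal weight in the cosine score. Averaging or weighting the models are other plausible choices that this sketch does not explore.
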

Conference Information | |
---|---|
Committee | DE / IPSJ-DBS |
Conference Date | 2017-12-22 (2 days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | National Institute of Informatics |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | Akiyo Nadamoto (Konan Univ.) / Atsuyuki Morishima (Univ. of Tsukuba) |
Vice Chair | Koji Eguchi (Kobe Univ.) / Shingo Otsuka (Kanagawa Inst. of Tech.) |
Secretary | Koji Eguchi (Kogakuin Univ.) / Shingo Otsuka (Univ. of Marketing and Distribution Sciences) |
Assistant | Kazuo Goda (Univ. of Tokyo) / Hiroaki Shiokawa (Univ. of Tsukuba) |

Paper Information | |
---|---|
Registration To | Technical Committee on Data Engineering / Special Interest Group on Database Systems |
Language | ENG |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Detection of mergeable Wikipedia articles based on multiple embedding results |
Sub Title (in English) | |
Keyword(1) | word embedding, mergeable article, Wikipedia, text mining |
1st Author's Name | Renzhi Wang |
1st Author's Affiliation | Waseda University (Waseda U.) |
2nd Author's Name | Mizuho Iwaihara |
2nd Author's Affiliation | Waseda University (Waseda U.) |
Date | 2017-12-23 |
Volume (vol) | vol.117 |
Number (no) | DE-374 |
Page | pp. 79-83 (DE) |
#Pages | 5 |