Presentation 2013/7/15
XML Documents Searching Combining Structure and Keywords Similarities
APICHAYA AUVATTANASOMBAT, YOUSUKE WATANABE, HARUO YOKOTA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In recent years, XML has become an emerging standard that is widely used in many applications. For example, office documents, which are increasingly popular, are stored as archives of multiple XML parts. The structure and content of XML files play different roles depending on the kind of document. Therefore, similarity search over XML files should be based on both structure and content. In previous work, LAX+ was proposed as an algorithm for computing a similarity value from the structure and contents of the XML files in office documents. However, because LAX+ uses exact matching between corresponding leaves, similar words in leaf nodes are treated as different. To solve this problem, we propose combining LAX+ with keyword similarity in leaf nodes. We use the docx, xlsx, and pptx file formats as the experimental data set. The evaluation shows that our approach can improve precision and recall.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) XML Similarity / OOXML / Keyword Similarity / Document Search
Paper # Vol.2013-DBS-157 No.14,Vol.2013-IFAT-111 No.14
Date of Issue

Conference Information
Committee DE
Conference Date 2013/7/15 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Data Engineering (DE)
Language ENG
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) XML Documents Searching Combining Structure and Keywords Similarities
Sub Title (in English)
Keyword(1) XML Similarity
Keyword(2) OOXML
Keyword(3) Keyword Similarity
Keyword(4) Document Search
1st Author's Name APICHAYA AUVATTANASOMBAT
1st Author's Affiliation Tokyo Institute of Technology / Chulalongkorn University
2nd Author's Name YOUSUKE WATANABE
2nd Author's Affiliation Tokyo Institute of Technology
3rd Author's Name HARUO YOKOTA
3rd Author's Affiliation Tokyo Institute of Technology
Date 2013/7/15
Paper # Vol.2013-DBS-157 No.14,Vol.2013-IFAT-111 No.14
Volume (vol) vol.113
Number (no) 150
Page pp.-
#Pages 6
Date of Issue