Presentation 2001/7/11
An Indexing Method for Table Structures of HTML Format
Masami SHISHIBORI, Yoshihiro IWAGUCHI, Minsoo JUNG, Jun-ichi AOE,
Abstract(in Japanese) (See Japanese page)
Abstract(in English) HTML documents on the WWW frequently include table structures, which carry very useful information such as the meanings of the words in a table and the relations among them. In this paper, we propose a method for constructing an index that preserves the relations in HTML table structures. The method represents the position of each item in a table structure as a compact bit stream. Moreover, since the odd bits of this bit stream show the row relation of each item and the even bits show the column relation, the positional relations of items in the table can be compared easily and quickly. Experiments on 200 HTML table structures collected by hand from the WWW showed that this method generates an index 87% smaller and compares positional relations 5.4 times faster than an indexing method that stores the row and column coordinates of each item.
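The abstract does not give the exact bit layout, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes a simple interleaving in which the bits of the row index occupy the odd positions (1st, 3rd, ..., counting from the least significant bit) and the bits of the column index occupy the even positions. The names `interleave`, `same_row`, `same_column`, and the 16-bit coordinate width are hypothetical and chosen for illustration.

```python
BITS = 16  # assumed width of each coordinate; not stated in the abstract


def _mask(offset: int, bits: int = BITS) -> int:
    """Build a mask selecting every second bit, starting at bit index `offset`."""
    m = 0
    for i in range(bits):
        m |= 1 << (2 * i + offset)
    return m


ROW_MASK = _mask(0)  # odd positions (1st, 3rd, ...) when counting from 1
COL_MASK = _mask(1)  # even positions (2nd, 4th, ...) when counting from 1


def interleave(row: int, col: int, bits: int = BITS) -> int:
    """Pack a (row, col) position into one integer by interleaving its bits."""
    code = 0
    for i in range(bits):
        code |= ((row >> i) & 1) << (2 * i)      # row bit -> odd position
        code |= ((col >> i) & 1) << (2 * i + 1)  # column bit -> even position
    return code


def same_row(a: int, b: int) -> bool:
    """Two items share a row if their codes agree on every odd-position bit."""
    return ((a ^ b) & ROW_MASK) == 0


def same_column(a: int, b: int) -> bool:
    """Two items share a column if their codes agree on every even-position bit."""
    return ((a ^ b) & COL_MASK) == 0


header = interleave(0, 2)  # a header cell in row 0, column 2
cell = interleave(3, 2)    # a data cell in row 3, column 2
print(same_column(header, cell), same_row(header, cell))
```

Under this assumed layout, a row or column comparison is one XOR and one AND on a single integer per item, which is in the spirit of the fast comparison described in the abstract. The measured speed and size figures reported there apply to the authors' method, not to this sketch.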
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Table structure analysis / Indexing / HTML document / Internet searching engine / Information extraction
Paper # DE2001-54
Date of Issue

Conference Information
Committee DE
Conference Date 2001/7/11 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Data Engineering (DE)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) An Indexing Method for Table Structures of HTML Format
Sub Title (in English)
Keyword(1) Table structure analysis
Keyword(2) Indexing
Keyword(3) HTML document
Keyword(4) Internet searching engine
Keyword(5) Information extraction
1st Author's Name Masami SHISHIBORI
1st Author's Affiliation Dpt.of Information Science & Intelligent Systems, Faculty of Engineering, Tokushima University
2nd Author's Name Yoshihiro IWAGUCHI
2nd Author's Affiliation Dpt.of Information Science & Intelligent Systems, Faculty of Engineering, Tokushima University
3rd Author's Name Minsoo JUNG
3rd Author's Affiliation Dpt.of Information Science & Intelligent Systems, Faculty of Engineering, Tokushima University
4th Author's Name Jun-ichi AOE
4th Author's Affiliation Dpt.of Information Science & Intelligent Systems, Faculty of Engineering, Tokushima University
Date 2001/7/11
Paper # DE2001-54
Volume (vol) vol.101
Number (no) 192
Page pp.-
#Pages 8
Date of Issue