Presentation 2002/7/11
The Integration of multiple HTML Table information into one XML List
Kumi ITAI, Atsuhiro TAKASU, Jun ADACHI
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In this paper, we propose a method for transforming HTML tables, which have various kinds of structure, into a common XML list structure. This makes it possible to browse and compare information that is spread across separate HTML pages. The paper focuses on two tasks: extracting information from tables and categorizing the data. For this purpose, we applied two approaches, (I) data classification using a Support Vector Machine and (II) table structure estimation and data categorization using a Hidden Markov Model, and we report the experimental results.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Information Extraction / Support Vector Machine(SVM) / Hidden Markov Model(HMM) / XML
Paper # DE2002-29
Date of Issue

Conference Information
Committee DE
Conference Date 2002/7/11 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Data Engineering (DE)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) The Integration of multiple HTML Table information into one XML List
Sub Title (in English)
Keyword(1) Information Extraction
Keyword(2) Support Vector Machine(SVM)
Keyword(3) Hidden Markov Model(HMM)
Keyword(4) XML
1st Author's Name Kumi ITAI
1st Author's Affiliation Graduate School of Information Science and Technology, University of Tokyo
2nd Author's Name Atsuhiro TAKASU
2nd Author's Affiliation National Institute of Informatics
3rd Author's Name Jun ADACHI
3rd Author's Affiliation Graduate School of Information Science and Technology, University of Tokyo
Date 2002/7/11
Paper # DE2002-29
Volume (vol) vol.102
Number (no) 208
Page pp.-
#Pages 6
Date of Issue