Presentation 2005-03-18
Table structure analysis based on cell classification and cell modification for XML document transformation
Yasuto ISHITANI, Kosei FUME, Kazuo SUMITA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) This paper proposes a new method of table structure analysis based on cell classification and cell modification, intended as the basis of an OCR system that can convert a variety of tables into XML documents conforming to a specified DTD. The method proceeds as follows. First, cell features defined by ruled lines, which correspond to data fields, are extracted from the input image of a table. Each cell is then classified to identify irregular tables, and irregular cells are modified to form a regular cell arrangement. Next, the hierarchical table structure, consisting of a regular row structure of cells, is extracted from the modified table and described as a DOM tree. At this stage, logical objects within each cell are extracted and converted into sub-trees of the DOM tree. Finally, the DOM tree is transformed into the target XML document by an XML parser with XSLT scripts or special programs. Experimental results show that the method is effective in transforming various printed tables into various XML documents.
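The following is a minimal sketch of the idea outlined in the abstract, not the authors' implementation. It assumes the cells have already been extracted from the image and are given as dictionaries with hypothetical keys (row, col, rowspan, colspan, text). Cell modification is illustrated here by the simplest possible rule, copying a spanning cell into every grid position it covers, and the first row is assumed to hold field names. The element names table, record and field are also illustrative. The XSLT step is omitted because it needs a library outside the Python standard library.

```python
from xml.dom.minidom import Document


def is_irregular(cell):
    """Classify a cell as irregular if it spans more than one row or column."""
    return cell.get("rowspan", 1) > 1 or cell.get("colspan", 1) > 1


def regularize(cells, n_rows, n_cols):
    """Modify irregular cells: copy each cell's text into every grid slot it covers."""
    grid = [[None] * n_cols for _ in range(n_rows)]
    for cell in cells:
        for r in range(cell["row"], cell["row"] + cell.get("rowspan", 1)):
            for c in range(cell["col"], cell["col"] + cell.get("colspan", 1)):
                grid[r][c] = cell["text"]
    return grid


def grid_to_dom(grid):
    """Describe the regular row structure as a DOM tree (first row = field names)."""
    doc = Document()
    table = doc.createElement("table")
    doc.appendChild(table)
    headers = grid[0]
    for row in grid[1:]:
        record = doc.createElement("record")
        table.appendChild(record)
        for name, value in zip(headers, row):
            field = doc.createElement("field")
            field.setAttribute("name", name or "")
            field.appendChild(doc.createTextNode(value or ""))
            record.appendChild(field)
    return doc


# Hypothetical input: a 3x2 table whose "Pen" cell spans two rows.
example_cells = [
    {"row": 0, "col": 0, "text": "Item"},
    {"row": 0, "col": 1, "text": "Price"},
    {"row": 1, "col": 0, "rowspan": 2, "text": "Pen"},
    {"row": 1, "col": 1, "text": "100"},
    {"row": 2, "col": 1, "text": "120"},
]
example_grid = regularize(example_cells, n_rows=3, n_cols=2)
example_dom = grid_to_dom(example_grid)
```

Calling example_dom.toprettyxml() prints the resulting tree. A real system would then pass this tree to an XSLT processor to obtain a document conforming to the target DTD.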
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Table structure analysis / XML document transformation / Document image analysis / Document structure analysis / OCR / Logical structure analysis
Paper # TL2004-90,PRMU2004-258
Date of Issue

Conference Information
Committee PRMU
Conference Date 2005/3/11 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Pattern Recognition and Media Understanding (PRMU)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Table structure analysis based on cell classification and cell modification for XML document transformation
Sub Title (in English)
Keyword(1) Table structure analysis
Keyword(2) XML document transformation
Keyword(3) Document image analysis
Keyword(4) Document structure analysis
Keyword(5) OCR
Keyword(6) Logical structure analysis
1st Author's Name Yasuto ISHITANI
1st Author's Affiliation Knowledge Media Laboratory, Corporate Research & Development Center, Toshiba Corporation
2nd Author's Name Kosei FUME
2nd Author's Affiliation Knowledge Media Laboratory, Corporate Research & Development Center, Toshiba Corporation
3rd Author's Name Kazuo SUMITA
3rd Author's Affiliation Knowledge Media Laboratory, Corporate Research & Development Center, Toshiba Corporation
Date 2005-03-18
Paper # TL2004-90,PRMU2004-258
Volume (vol) vol.104
Number (no) 742
Page pp.-
#Pages 6
Date of Issue