Presentation 2004/2/12
Document Transformation System from Papers to XML Data Based on Pivot XML Document Method (Thought and Language)
Yasuto ISHITANI, Kazuo SUMITA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) This paper proposes a new method for document transformation using OCR to generate various XML documents from printed documents. The proposed method adopts a hierarchical transformation strategy based on a pivot XML document. Firstly, document elements such as title, authors, abstract, headings, paragraphs, lists, captions, tables and figures are extracted from document images. Secondly, the hierarchical structure of document elements is extracted and is described using a DOM tree. Thirdly, this document structure is converted into a pivot XML document described as an XHTML document by an XML parser. Finally, this pivot XML document is transformed into the target XML document by the XML parser with XSLT scripts or specific programs. Experimental results show the method is effective in transforming printed documents to various XML documents.
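The last three stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the OCR and layout analysis have already produced a flat list of labelled elements, and the element labels, the XHTML `class` names, the target vocabulary (`article`, `section`, `para`) and the function names `build_pivot` and `pivot_to_target` are all hypothetical. The final step uses a small "specific program" rather than an XSLT script, which the abstract names as one of the two options.

```python
import xml.etree.ElementTree as ET
from xml.dom.minidom import getDOMImplementation

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical output of the extraction stage: (kind, text) pairs in reading order.
ELEMENTS = [
    ("title", "A Sample Paper"),
    ("authors", "A. Author, B. Author"),
    ("abstract", "This is the abstract."),
    ("heading", "1. Introduction"),
    ("paragraph", "First paragraph."),
    ("heading", "2. Method"),
    ("paragraph", "Second paragraph."),
]

# Assumed mapping from element kind to (XHTML tag, class attribute) in the pivot.
PIVOT_TAGS = {
    "title": ("h1", "title"),
    "authors": ("p", "authors"),
    "abstract": ("p", "abstract"),
    "heading": ("h2", "heading"),
    "paragraph": ("p", "paragraph"),
}


def build_pivot(elements):
    """Build a DOM tree from the elements and serialize it as a pivot XHTML string."""
    doc = getDOMImplementation().createDocument(None, "html", None)
    root = doc.documentElement
    root.setAttribute("xmlns", XHTML_NS)
    body = doc.createElement("body")
    root.appendChild(body)
    container = body
    for kind, text in elements:
        tag, cls = PIVOT_TAGS[kind]
        if kind == "heading":
            # Each heading opens a new section that groups the elements after it.
            container = doc.createElement("div")
            container.setAttribute("class", "section")
            body.appendChild(container)
        node = doc.createElement(tag)
        node.setAttribute("class", cls)
        node.appendChild(doc.createTextNode(text))
        container.appendChild(node)
    return doc.toxml()


def pivot_to_target(pivot_xml):
    """Convert the pivot XHTML into a hypothetical target vocabulary."""
    pivot = ET.fromstring(pivot_xml)
    body = pivot.find("x:body", {"x": XHTML_NS})
    article = ET.Element("article")
    for child in body:
        cls = child.get("class")
        if cls == "section":
            section = ET.SubElement(article, "section")
            for item in child:
                name = "heading" if item.get("class") == "heading" else "para"
                ET.SubElement(section, name).text = item.text
        else:
            # Front matter (title, authors, abstract) maps one-to-one by class name.
            ET.SubElement(article, cls).text = child.text
    return ET.tostring(article, encoding="unicode")


if __name__ == "__main__":
    print(pivot_to_target(build_pivot(ELEMENTS)))
```

Because every target format is derived from the same pivot document, supporting another output vocabulary in this sketch would only require another function like `pivot_to_target`; the extraction and structuring steps stay unchanged.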
Keyword(in Japanese) (See Japanese page)
Keyword(in English) XML document transformation / Document image analysis / Document structure analysis / OCR / Layout analysis / Logical structure analysis
Paper # TL2003-30,PRMU2003-216
Date of Issue

Conference Information
Committee TL
Conference Date 2004/2/12 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Thought and Language (TL)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Document Transformation System from Papers to XML Data Based on Pivot XML Document Method (Thought and Language)
Sub Title (in English)
Keyword(1) XML document transformation
Keyword(2) Document image analysis
Keyword(3) Document structure analysis
Keyword(4) OCR
Keyword(5) Layout analysis
Keyword(6) Logical structure analysis
1st Author's Name Yasuto ISHITANI
1st Author's Affiliation Knowledge Media Laboratory, Corporate Research & Development Center, Toshiba Corporation
2nd Author's Name Kazuo SUMITA
2nd Author's Affiliation Knowledge Media Laboratory, Corporate Research & Development Center, Toshiba Corporation
Date 2004/2/12
Paper # TL2003-30,PRMU2003-216
Volume (vol) vol.103
Number (no) 656
Page pp.-
#Pages 6