Presentation | 2004/2/12 Document Transformation System from Papers to XML Data Based on Pivot XML Document Method (Thought and Language) Yasuto ISHITANI, Kazuo SUMITA |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | This paper proposes a new method for document transformation using OCR to generate various XML documents from printed documents. The proposed method adopts a hierarchical transformation strategy based on a pivot XML document. Firstly, document elements such as title, authors, abstract, headings, paragraphs, lists, captions, tables and figures are extracted from document images. Secondly, the hierarchical structure of document elements is extracted and is described using a DOM tree. Thirdly, this document structure is converted into a pivot XML document described as an XHTML document by an XML parser. Finally, this pivot XML document is transformed into the target XML document by the XML parser with XSLT scripts or specific programs. Experimental results show the method is effective in transforming printed documents to various XML documents. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | XML document transformation / Document image analysis / Document structure analysis / OCR / Layout analysis / Logical structure analysis |
Paper # | TL2003-30,PRMU2003-216 |
Date of Issue |
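
The last two stages described in the abstract (structure to pivot document, pivot document to target XML) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the document elements have already been extracted as flat (kind, text) pairs, uses Python's standard `xml.etree.ElementTree`, omits the XHTML namespace for brevity, and stands in a small "specific program" for the XSLT route. The names `ELEMENTS`, `build_pivot`, `pivot_to_target`, and the target vocabulary (`article`, `sec`, `para`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical output of element extraction: flat (kind, text) pairs.
ELEMENTS = [
    ("title", "Sample Title"),
    ("author", "A. Author"),
    ("abstract", "Short abstract."),
    ("heading", "1. Introduction"),
    ("paragraph", "First paragraph."),
    ("paragraph", "Second paragraph."),
    ("heading", "2. Method"),
    ("paragraph", "Third paragraph."),
]


def build_pivot(elements):
    """Build an XHTML-like pivot tree; each heading opens a new section div."""
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    body = ET.SubElement(html, "body")
    section = body  # elements before the first heading attach to body
    for kind, text in elements:
        if kind == "title":
            ET.SubElement(head, "title").text = text
        elif kind == "heading":
            section = ET.SubElement(body, "div", {"class": "section"})
            ET.SubElement(section, "h1").text = text
        else:
            # Keep the logical role in the class attribute of the pivot.
            ET.SubElement(section, "p", {"class": kind}).text = text
    return html


def pivot_to_target(pivot):
    """Map the pivot tree to a hypothetical target vocabulary."""
    article = ET.Element("article")
    ET.SubElement(article, "title").text = pivot.findtext("head/title")
    # Front matter: paragraphs directly under body, named by their role.
    for p in pivot.findall("body/p"):
        ET.SubElement(article, p.get("class")).text = p.text
    # Sections: one <sec> per div, named after its heading.
    for div in pivot.findall("body/div"):
        sec = ET.SubElement(article, "sec", {"name": div.findtext("h1")})
        for p in div.findall("p"):
            ET.SubElement(sec, "para").text = p.text
    return article


pivot = build_pivot(ELEMENTS)
target = pivot_to_target(pivot)
print(ET.tostring(target, encoding="unicode"))
```

Because only `pivot_to_target` depends on the output vocabulary, supporting another target format in this sketch would mean writing another such function (or an XSLT stylesheet) against the same pivot tree, which is the benefit the pivot approach aims for.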
Conference Information | |
Committee | TL |
---|---|
Conference Date | 2004/2/12 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Thought and Language (TL) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Document Transformation System from Papers to XML Data Based on Pivot XML Document Method (Thought and Language) |
Sub Title (in English) | |
Keyword(1) | XML document transformation |
Keyword(2) | Document image analysis |
Keyword(3) | Document structure analysis |
Keyword(4) | OCR |
Keyword(5) | Layout analysis |
Keyword(6) | Logical structure analysis |
1st Author's Name | Yasuto ISHITANI |
1st Author's Affiliation | Knowledge Media Laboratory, Corporate Research & Development Center, Toshiba Corporation |
2nd Author's Name | Kazuo SUMITA |
2nd Author's Affiliation | Knowledge Media Laboratory, Corporate Research & Development Center, Toshiba Corporation |
Date | 2004/2/12 |
Paper # | TL2003-30,PRMU2003-216 |
Volume (vol) | vol.103 |
Number (no) | 656 |
Page | pp.- |
#Pages | 6 |
Date of Issue |