Presentation 2004/11/12
Artistic Line Extraction from Indian Documents
Umapada Pal, Partha Pratim Roy, N. Tripathy, Hiroyuki Hase
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Some printed artistic documents contain text lines on a single page that are not parallel to each other. These text lines may have different orientations or may be curved. For Optical Character Recognition (OCR) of such documents, the lines must be extracted properly. Because the lines are multi-oriented and curved, it is very difficult to extract the individual text lines from the document. In this paper, we propose a water reservoir principle based scheme to extract individual text lines from printed Indian artistic documents. In the proposed scheme, we first compute the mode (portrait, landscape, reverse portrait, or reverse landscape) of each component by analyzing the area of the reservoirs obtained in it. Next, based on the mode and water reservoir features such as the number of reservoirs, the height of the reservoirs, and the overlapping portion of two reservoirs, the components are grouped into an isolated class or a touching class. Then, depending on the reservoir base-area and the loops of a component, some candidate envelope points are detected. Each touching component is then classified as either straight or curved, depending on its candidate envelope points. Based on the type of a component, two boundary points are computed from each touching component. Finally, candidate regions (neighborhoods) of the boundary points of each component are detected, and individual text lines are segmented by analyzing these candidate regions.
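As a rough illustration of the reservoir-area idea mentioned in the abstract, the sketch below pours "water" onto a binary component from each of its four sides and counts the cells that would be retained. It is a simplified reading and not the authors' implementation: it assumes the component is a small binary grid (1 = ink), approximates each side by its outer profile, and all function names are hypothetical. How the paper maps these areas to a mode is not stated in the abstract, so the sketch stops at reporting the areas.

```python
from typing import Dict, List

Grid = List[List[int]]  # binary component, 1 = ink, row 0 is the top


def top_profile(component: Grid) -> List[int]:
    """Height of the first ink pixel in each column, seen from the top."""
    rows = len(component)
    profile = []
    for col in range(len(component[0])):
        height = 0
        for row in range(rows):
            if component[row][col]:
                height = rows - row
                break
        profile.append(height)
    return profile


def reservoir_area(profile: List[int]) -> int:
    """Cells of water retained by a profile; water runs off the open ends."""
    n = len(profile)
    left_max = [0] * n
    running = 0
    for i, h in enumerate(profile):
        running = max(running, h)
        left_max[i] = running
    area = 0
    running = 0
    for i in range(n - 1, -1, -1):
        running = max(running, profile[i])
        area += min(left_max[i], running) - profile[i]
    return area


def rotate_clockwise(grid: Grid) -> Grid:
    """Rotate 90 degrees clockwise so the original left side faces up."""
    return [list(row) for row in zip(*grid[::-1])]


def reservoir_areas_by_side(component: Grid) -> Dict[str, int]:
    """Assumed simplification: one profile-based reservoir area per side."""
    areas = {}
    grid = component
    for side in ("top", "left", "bottom", "right"):
        areas[side] = reservoir_area(top_profile(grid))
        grid = rotate_clockwise(grid)
    return areas


# A "U"-shaped component holds water only when filled from the top.
u_shape = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
]
print(reservoir_areas_by_side(u_shape))
```

The side with the dominant reservoir area hints at how a character is oriented, which is the kind of cue the abstract describes for choosing among the four modes.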
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Text line extraction / Artistic document analysis / Multi-oriented document recognition / Indian document analysis
Paper # PRMU2004-116,HIP2004-56
Date of Issue

Conference Information
Committee PRMU
Conference Date 2004/11/12 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Pattern Recognition and Media Understanding (PRMU)
Language ENG
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Artistic Line Extraction from Indian Documents
Sub Title (in English)
Keyword(1) Text line extraction
Keyword(2) Artistic document analysis
Keyword(3) Multi-oriented document recognition
Keyword(4) Indian document analysis
1st Author's Name Umapada Pal
1st Author's Affiliation Computer Vision and Pattern Recognition Unit, Indian Statistical Institute
2nd Author's Name Partha Pratim Roy
2nd Author's Affiliation Computer Vision and Pattern Recognition Unit, Indian Statistical Institute
3rd Author's Name N. Tripathy
3rd Author's Affiliation Computer Vision and Pattern Recognition Unit, Indian Statistical Institute
4th Author's Name Hiroyuki Hase
4th Author's Affiliation Computer Vision and Pattern Recognition Unit, Indian Statistical Institute
Date 2004/11/12
Paper # PRMU2004-116,HIP2004-56
Volume (vol) vol.104
Number (no) 448
Page pp.-
#Pages 6
Date of Issue