Presentation 2004/2/12
Development of Document Retrieval System Tolerant of Segmentation Errors of Document Images (Thought and Language)
Takeshi NAGASAKI, Katsumi MARUKAWA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) This paper describes a new document retrieval method that tolerates OCR segmentation errors in document images. OCR-based document retrieval systems suffer from segmentation and recognition errors. The proposed method addresses these problems in two processing phases. First, the OCR engine outputs multiple hypotheses of character segmentation and recognition. Second, the retrieval engine extracts keywords from these hypotheses using lexicon-driven DP matching. We applied the method to handwritten and printed document images and demonstrated its effectiveness in reducing false drops and false alarms in retrieval.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Segmentation Error / OCR / Document Retrieval / Lexicon Driven Dynamic Programming
Paper # TL2003-29, PRMU2003-215
Date of Issue
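
The two phases named in the abstract can be illustrated with a small sketch. The report's actual data structures and cost model are not given on this page, so everything below is an assumption made for illustration: the OCR hypotheses are modeled as a lattice of edges between cut points, each edge carries hypothetical candidate characters with costs (lower is better), and a keyword is accepted when some path through the lattice spells it cheaply enough. The names `match_keyword`, `extract_keywords`, `EXAMPLE_EDGES`, and the threshold `max_cost_per_char` are hypothetical.

```python
from math import inf

# Hypothetical lattice edge: (start cut point, end cut point,
# {candidate character: recognition cost}). Not taken from the paper.
Edge = tuple[int, int, dict[str, float]]

# Toy lattice for an image that may read "dog" or "clog":
# the first stroke group is either one segment "d" or two segments "c" + "l".
EXAMPLE_EDGES: list[Edge] = [
    (0, 2, {"d": 0.2}),
    (0, 1, {"c": 0.3}),
    (1, 2, {"l": 0.4}),
    (2, 3, {"o": 0.1}),
    (3, 4, {"g": 0.2, "q": 0.5}),
]


def match_keyword(edges, n_points, keyword):
    """Lowest total cost of spelling `keyword` along consecutive edges.

    The match may start and end at any cut point. Returns inf when no
    path through the lattice spells the keyword.
    """
    # best[p]: minimum cost of matching the characters consumed so far
    # with the match ending at cut point p. An empty match costs 0 anywhere.
    best = [0.0] * n_points
    for ch in keyword:
        nxt = [inf] * n_points
        for start, end, candidates in edges:
            if best[start] == inf or ch not in candidates:
                continue
            total = best[start] + candidates[ch]
            if total < nxt[end]:
                nxt[end] = total
        best = nxt
    return min(best)


def extract_keywords(edges, n_points, lexicon, max_cost_per_char=0.5):
    """Return (word, cost) pairs for lexicon words found in the lattice,
    cheapest first. The per-character threshold is an assumed heuristic."""
    hits = []
    for word in lexicon:
        cost = match_keyword(edges, n_points, word)
        if cost <= max_cost_per_char * len(word):
            hits.append((word, cost))
    return sorted(hits, key=lambda hit: hit[1])
```

Calling `extract_keywords(EXAMPLE_EDGES, 5, ["cat", "clog", "dog"])` keeps both "dog" and "clog", because each corresponds to a different segmentation path, and rejects "cat", which no path spells. Keeping the alternative segmentations until the lexicon is consulted is the point of the sketch: a single best-guess OCR string would have committed to one segmentation before retrieval.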

Conference Information
Committee TL
Conference Date 2004/2/12 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Thought and Language (TL)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Development of Document Retrieval System Tolerant of Segmentation Errors of Document Images (Thought and Language)
Sub Title (in English)
Keyword(1) Segmentation Error
Keyword(2) OCR
Keyword(3) Document Retrieval
Keyword(4) Lexicon Driven Dynamic Programming
1st Author's Name Takeshi NAGASAKI
1st Author's Affiliation Hitachi, Ltd., Central Research Laboratory
2nd Author's Name Katsumi MARUKAWA
2nd Author's Affiliation Hitachi, Ltd., Central Research Laboratory
Date 2004/2/12
Paper # TL2003-29, PRMU2003-215
Volume (vol) vol.103
Number (no) 656
Page pp.-
#Pages 6
Date of Issue