Presentation 1998/5/13
Reduction of Expanded Search Terms for Fuzzy English-text Retrieval
Manabu OHTA, Atsuhiro TAKASU, Jun ADACHI
Abstract(in Japanese) (See Japanese page)
Abstract(in English) OCR misrecognition is a serious problem when OCR-recognized text is used for retrieval in digital libraries. We have proposed fuzzy retrieval methods that assume errors remain in the recognized text, since correcting them manually is too costly. These methods generate multiple search terms for an input query term by referring to confusion matrices, which store all characters likely to be misrecognized and the probability of each misrecognition. The methods can improve recall without decreasing precision, but in English fuzzy retrieval they occasionally generate a few million search terms, which is a bottleneck for retrieval speed. This paper therefore presents a method that reduces the number of generated search terms while keeping sufficient retrieval effectiveness, by restricting the number of errors included in the expanded search terms.
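The expansion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the confusion table `CONFUSIONS`, its probabilities, the function name `expand_term`, and the parameters `max_errors` and `min_prob` are all hypothetical, and the score is simply the product of the assumed misrecognition probabilities. The sketch only shows how capping the number of errors per variant limits the number of generated search terms.

```python
from itertools import combinations, product

# Hypothetical confusion table (illustration only): for each correct
# character, the strings an OCR engine might output instead, each with an
# assumed misrecognition probability.
CONFUSIONS = {
    "l": {"1": 0.05, "i": 0.03},
    "o": {"0": 0.04},
    "m": {"rn": 0.02},
}


def expand_term(term, confusions, max_errors=1, min_prob=0.0):
    """Return {variant: score} for variants with at most max_errors substitutions.

    The score is the product of the assumed misrecognition probabilities of
    the substituted characters; the unmodified term gets a score of 1.0.
    """
    variants = {term: 1.0}
    # Positions whose character has at least one known confusion.
    positions = [i for i, ch in enumerate(term) if ch in confusions]
    for k in range(1, max_errors + 1):
        # Choose which k positions are misrecognized ...
        for chosen in combinations(positions, k):
            options = [confusions[term[i]].items() for i in chosen]
            # ... and which wrong string appears at each chosen position.
            for combo in product(*options):
                chars = list(term)
                score = 1.0
                for i, (wrong, p) in zip(chosen, combo):
                    chars[i] = wrong
                    score *= p
                if score >= min_prob:
                    variant = "".join(chars)
                    variants[variant] = max(score, variants.get(variant, 0.0))
    return variants


if __name__ == "__main__":
    for limit in (1, 2, 3):
        terms = expand_term("room", CONFUSIONS, max_errors=limit)
        print(limit, sorted(terms))
```

In this sketch, raising `max_errors` grows the variant set combinatorially, which is the effect the abstract identifies as the bottleneck; a small cap keeps the set manageable at the cost of missing heavily corrupted occurrences.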
Keyword(in Japanese) (See Japanese page)
Keyword(in English) fuzzy retrieval / query term expansion / retrieval speed / confusion matrix / OCR
Paper #
Date of Issue

Conference Information
Committee DE
Conference Date 1998/5/13 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Data Engineering (DE)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Reduction of Expanded Search Terms for Fuzzy English-text Retrieval
Sub Title (in English)
Keyword(1) fuzzy retrieval
Keyword(2) query term expansion
Keyword(3) retrieval speed
Keyword(4) confusion matrix
Keyword(5) OCR
1st Author's Name Manabu OHTA
1st Author's Affiliation Graduate School of Engineering, University of Tokyo
2nd Author's Name Atsuhiro TAKASU
2nd Author's Affiliation R & D Department, NACSIS(National Center for Science Information Systems)
3rd Author's Name Jun ADACHI
3rd Author's Affiliation R & D Department, NACSIS(National Center for Science Information Systems)
Date 1998/5/13
Paper #
Volume (vol) vol.98
Number (no) 42
Page pp.-
#Pages 8
Date of Issue