Presentation | 1998/5/13 Reduction of Expanded Search Terms for Fuzzy English-text Retrieval Manabu OHTA, Atsuhiro TAKASU, Jun ADACHI |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | OCR misrecognition is a serious problem when OCR-recognized text is used for retrieval in digital libraries. We have proposed fuzzy retrieval methods that assume errors remain in the recognized text, because correcting them manually is too costly. The proposed methods generate multiple search terms for an input query term by referring to confusion matrices, which store all characters likely to be misrecognized and the probability of each misrecognition. The methods can improve recall without decreasing precision, but in English fuzzy retrieval they occasionally generate a few million search terms, which is a bottleneck for retrieval speed. This paper therefore presents a method that reduces the number of generated search terms while keeping sufficient retrieval effectiveness, by restricting the number of errors included in the expanded search terms. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | fuzzy retrieval / query term expansion / retrieval speed / confusion matrix / OCR |
Paper # | |
Date of Issue |
Conference Information | |
Committee | DE |
---|---|
Conference Date | 1998/5/13 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Data Engineering (DE) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Reduction of Expanded Search Terms for Fuzzy English-text Retrieval |
Sub Title (in English) | |
Keyword(1) | fuzzy retrieval |
Keyword(2) | query term expansion |
Keyword(3) | retrieval speed |
Keyword(4) | confusion matrix |
Keyword(5) | OCR |
1st Author's Name | Manabu OHTA |
1st Author's Affiliation | Graduate School of Engineering, University of Tokyo |
2nd Author's Name | Atsuhiro TAKASU |
2nd Author's Affiliation | R & D Department, NACSIS (National Center for Science Information Systems) |
3rd Author's Name | Jun ADACHI |
3rd Author's Affiliation | R & D Department, NACSIS (National Center for Science Information Systems) |
Date | 1998/5/13 |
Paper # | |
Volume (vol) | vol.98 |
Number (no) | 42 |
Page | pp.- |
#Pages | 8 |
Date of Issue |
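
The English abstract describes expanding a query term through a confusion matrix and then limiting how many errors an expanded term may contain. The following is a minimal sketch of that idea, not the authors' implementation. The confusion entries and probabilities in `CONFUSIONS`, the function name `expand`, and the scoring rule (the product of the substitution probabilities only) are all assumptions made for illustration.

```python
from itertools import combinations, product

# Hypothetical confusion matrix (illustrative values, not from the paper):
# for each correct character, the strings an OCR engine might output
# instead, each with an assumed misrecognition probability.
CONFUSIONS = {
    "l": {"1": 0.05, "i": 0.03},
    "o": {"0": 0.04},
    "m": {"rn": 0.02},
}


def expand(term, confusions, max_errors):
    """Return {variant: score} for every variant of `term` that contains
    at most `max_errors` misrecognized characters.

    The score is the product of the probabilities of the substitutions
    applied (an assumed, simplified scoring rule).
    """
    variants = {}
    # Only positions whose character has a confusion entry can be altered.
    positions = [i for i, ch in enumerate(term) if ch in confusions]
    for k in range(min(max_errors, len(positions)) + 1):
        # Choose which k positions are misrecognized.
        for chosen in combinations(positions, k):
            options = [list(confusions[term[i]].items()) for i in chosen]
            # Choose one substitute per misrecognized position.
            for picks in product(*options):
                chars = list(term)
                score = 1.0
                for i, (substitute, prob) in zip(chosen, picks):
                    chars[i] = substitute
                    score *= prob
                variant = "".join(chars)
                # Keep the best score if two paths yield the same string.
                variants[variant] = max(variants.get(variant, 0.0), score)
    return variants


if __name__ == "__main__":
    unrestricted = expand("loom", CONFUSIONS, max_errors=4)
    restricted = expand("loom", CONFUSIONS, max_errors=1)
    print(len(unrestricted), len(restricted))
```

With this toy matrix, the unrestricted expansion grows multiplicatively with the number of confusable characters, whereas capping the number of errors keeps only the original term and its single-substitution variants. How the cap trades retrieval effectiveness against speed is the question the paper addresses; this sketch does not measure it.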