Presentation 2004/2/12
Classifiers That Improve with Use (Thought and Language)
George Nagy
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Training on non-representative data causes any classifier to make many mistakes on new data. Retraining an OCR engine with labeled characters, obtained from routine post-editing, can reduce both the bias and the variance of the classifier, and therefore its error rate. In the absence of post-edits, the imperfect labels assigned by the classifier can be used instead. Although the theoretical foundations of decision-directed adaptation are meager, adaptation has proved successful in diverse experiments. When the operational data can be partitioned into isogenous subsets, the classifier parameters should be adapted independently on each subset. However, if the same-source subsets are small, as in postal-code or bank-check reading, it is advantageous to classify more than one character at a time. Style-constrained classification allows training the classifier on fields shorter than the classification field. Systematic methods still remain to be developed for adapting language context to the operational data stream, particularly for semi-structured business forms. Only dynamic classifiers can hope to rival human performance on imperfectly printed, written, copied, or scanned documents.
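The decision-directed adaptation described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: it assumes a nearest-mean classifier on a single scalar feature, and the names `classify` and `adapt` and all sample values are hypothetical. The classifier labels the operational data itself, re-estimates each class mean from the samples it assigned to that class, and repeats until the means stop changing.

```python
from statistics import mean


def classify(x, means):
    """Return the label whose class mean is closest to the scalar feature x."""
    return min(means, key=lambda label: abs(x - means[label]))


def adapt(means, samples, rounds=10):
    """Decision-directed adaptation (illustrative assumption, not the paper's code).

    The current classifier assigns (possibly wrong) labels to the unlabeled
    operational samples; each class mean is then re-estimated from the samples
    assigned to it. A class that receives no samples keeps its previous mean.
    """
    means = dict(means)  # do not modify the caller's training-set means
    for _ in range(rounds):
        assigned = {label: [] for label in means}
        for x in samples:
            assigned[classify(x, means)].append(x)
        new_means = {
            label: (mean(xs) if xs else means[label])
            for label, xs in assigned.items()
        }
        if new_means == means:  # assignments have stabilized
            break
        means = new_means
    return means


# Hypothetical data: class means learned from non-representative training data,
# and operational samples from one source shifted relative to that training data.
trained = {"a": 0.0, "b": 4.0}
operational = [1.6, 1.9, 2.4, 2.9, 5.5, 6.0, 6.5]
truth = ["a", "a", "a", "a", "b", "b", "b"]

adapted = adapt(trained, operational)
errors_before = sum(classify(x, trained) != t for x, t in zip(operational, truth))
errors_after = sum(classify(x, adapted) != t for x, t in zip(operational, truth))
print(errors_before, errors_after)
```

In this toy setting the adapted means follow the shifted data without any true labels being supplied. Adaptation of this kind can also reinforce early mistakes when the initial classifier is poor, which is consistent with the abstract's remark that the theoretical foundations remain meager.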
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Dynamic classifier / Semi-supervised or unsupervised learning / Decision-directed adaptation / Self-correcting classifier / Style-constrained field classification / Weakly-constrained data / Non-representative training set / Language context
Paper # TL2003-42,PRMU2003-228
Date of Issue

Conference Information
Committee TL
Conference Date 2004/2/12 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)

Paper Information
Registration To Thought and Language (TL)
Language ENG
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Classifiers That Improve with Use (Thought and Language)
Sub Title (in English)
Keyword(1) Dynamic classifier
Keyword(2) Semi-supervised or unsupervised learning
Keyword(3) Decision-directed adaptation
Keyword(4) Self-correcting classifier
Keyword(5) Style-constrained field classification
Keyword(6) Weakly-constrained data
Keyword(7) Non-representative training set
Keyword(8) Language context
1st Author's Name George Nagy
1st Author's Affiliation Rensselaer Polytechnic Institute
Date 2004/2/12
Paper # TL2003-42,PRMU2003-228
Volume (vol) vol.103
Number (no) 656
Page pp.-
#Pages 8