Presentation | 2004/2/12 Classifiers That Improve with Use (Thought and Language) George Nagy |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Training on non-representative data causes any classifier to make many mistakes on new data. Retraining an OCR engine with labeled characters, obtained from routine post-editing, can reduce both the bias and the variance of the classifier, and therefore its error rate. In the absence of post-edits, the imperfect labels assigned by the classifier can be used instead. Although the theoretical foundations of decision-directed adaptation are meager, adaptation has proved successful in diverse experiments. When the operational data can be partitioned into isogenous subsets, the classifier parameters should be adapted independently on each subset. However, if the same-source subsets are small, as in postal-code or bank-check reading, it is advantageous to classify more than one character at a time. Style-constrained classification allows training the classifier on fields shorter than the classification field. Systematic methods still remain to be developed for adapting language context to the operational data stream, particularly for semi-structured business forms. Only dynamic classifiers can hope to rival human performance on imperfectly printed, written, copied, or scanned documents. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Dynamic classifier / Semi-supervised or unsupervised learning / Decision-directed adaptation / Self-correcting classifier / Style-constrained field classification / Weakly-constrained data / Non-representative training set / Language context |
Paper # | TL2003-42,PRMU2003-228 |
Date of Issue | |

Conference Information | |
---|---|
Committee | TL |
Conference Date | 2004/2/12 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant | |

Paper Information | |
---|---|
Registration To | Thought and Language (TL) |
Language | ENG |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Classifiers That Improve with Use (Thought and Language) |
Sub Title (in English) | |
Keyword(1) | Dynamic classifier |
Keyword(2) | Semi-supervised or unsupervised learning |
Keyword(3) | Decision-directed adaptation |
Keyword(4) | Self-correcting classifier |
Keyword(5) | Style-constrained field classification |
Keyword(6) | Weakly-constrained data |
Keyword(7) | Non-representative training set |
Keyword(8) | Language context |
1st Author's Name | George Nagy |
1st Author's Affiliation | Rensselaer Polytechnic Institute |
Date | 2004/2/12 |
Paper # | TL2003-42,PRMU2003-228 |
Volume (vol) | vol.103 |
Number (no) | 656 |
Page | pp.- |
#Pages | 8 |
Date of Issue | |