Classifiers That Get Smarter the More They Are Used (Character and Document Recognition and Understanding)

Presentation	2004/2/12 Classifiers That Improve with Use (Thought and Language) George Nagy
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	Training on non-representative data causes any classifier to make many mistakes on new data. Retraining an OCR engine with labeled characters, obtained from routine post-editing, can reduce both the bias and the variance of the classifier, and therefore its error rate. In the absence of post-edits, the imperfect labels assigned by the classifier can be used instead. Although the theoretical foundations of decision-directed adaptation are meager, adaptation has proved successful in diverse experiments. When the operational data can be partitioned into isogenous subsets, the classifier parameters should be adapted independently on each subset. However, if the same-source subsets are small, as in postal-code or bank-check reading, it is advantageous to classify more than one character at a time. Style-constrained classification allows training the classifier on fields shorter than the classification field. Systematic methods still remain to be developed for adapting language context to the operational data stream, particularly for semi-structured business forms. Only dynamic classifiers can hope to rival human performance on imperfectly printed, written, copied, or scanned documents.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	Dynamic classifier / Semi-supervised or unsupervised learning / Decision-directed adaptation / Self-correcting classifier / Style-constrained field classification / Weakly-constrained data / Non-representative training set / Language context
Paper #	TL2003-42, PRMU2003-228
Date of Issue

Conference Information
Committee	TL
Conference Date	2004/2/12 (1 day)
Place (in Japanese)	(See Japanese page)
Place (in English)
Topics (in Japanese)	(See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To	Thought and Language (TL)
Language	ENG
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Classifiers That Improve with Use (Thought and Language)
Sub Title (in English)
Keyword(1)	Dynamic classifier
Keyword(2)	Semi-supervised or unsupervised learning
Keyword(3)	Decision-directed adaptation
Keyword(4)	Self-correcting classifier
Keyword(5)	Style-constrained field classification
Keyword(6)	Weakly-constrained data
Keyword(7)	Non-representative training set
Keyword(8)	Language context
1st Author's Name	George Nagy
1st Author's Affiliation	Rensselaer Polytechnic Institute
Date	2004/2/12
Paper #	TL2003-42, PRMU2003-228
Volume (vol)	vol.103
Number (no)	656
Page	pp.-
#Pages	8
Date of Issue