Improving the Accuracy of Morphological Analysis and Kana-Kanji Conversion with an Untagged Corpus

Presentation	2001/7/9 Improvement of POS tagger and KanaKanji Converter by an Untagged Corpus Shinsuke Mori, Nobuyasu Itoh
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	A tagged corpus plays an important role in natural language processing based on stochastic language models, and increasing the corpus size improves accuracy. A meaningful improvement, however, requires the corpus to grow more than exponentially, and the annotation cost this entails is not negligible. In this paper, we discuss the use of an untagged corpus. In our experiments, using an untagged corpus improved both the predictive power of a stochastic language model and the accuracy of a kana-kanji converter based on it. For a tagger, however, the improvement was slight.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	Kana-kanji converter / Stochastic Language Model / Corpus / Morphological analysis / Untagged
Paper #	NLC2001-15
Date of Issue

Paper Information
Registration To	Natural Language Understanding and Models of Communication (NLC)
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Improvement of POS tagger and KanaKanji Converter by an Untagged Corpus
Sub Title (in English)
Keyword(1)	Kana-kanji converter
Keyword(2)	Stochastic Language Model
Keyword(3)	Corpus
Keyword(4)	Morphological analysis
Keyword(5)	Untagged
1st Author's Name	Shinsuke Mori
1st Author's Affiliation	Tokyo Research Laboratory, IBM Japan
2nd Author's Name	Nobuyasu Itoh
2nd Author's Affiliation	Tokyo Research Laboratory, IBM Japan
Date	2001/7/9
Paper #	NLC2001-15
Volume (vol)	vol.101
Number (no)	189
Page	pp.-
#Pages	8
Date of Issue