Presentation 2001/7/9
Improvement of POS tagger and KanaKanji Converter by an Untagged Corpus
Shinsuke Mori, Nobuyasu Itoh
Abstract(in Japanese) (See Japanese page)
Abstract(in English) A tagged corpus plays an important role in natural language processing based on a stochastic language model, and increasing the corpus size improves accuracy. A meaningful improvement, however, requires increasing the corpus size more than exponentially, and the annotation cost this entails is not negligible. In this paper, we discuss the use of an untagged corpus. In the experiments, using an untagged corpus improved the predictive power of a stochastic language model and the accuracy of a kana-kanji converter based on it. For a tagger, however, the improvement was slight.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Kana-kanji converter / Stochastic Language Model / Corpus / Morphological analysis / Untagged
Paper # NLC2001-15
Date of Issue

Conference Information
Committee NLC
Conference Date 2001/7/9 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Improvement of POS tagger and KanaKanji Converter by an Untagged Corpus
Sub Title (in English)
Keyword(1) Kana-kanji converter
Keyword(2) Stochastic Language Model
Keyword(3) Corpus
Keyword(4) Morphological analysis
Keyword(5) Untagged
1st Author's Name Shinsuke Mori
1st Author's Affiliation Tokyo Research Laboratory, IBM Japan
2nd Author's Name Nobuyasu Itoh
2nd Author's Affiliation Tokyo Research Laboratory, IBM Japan
Date 2001/7/9
Paper # NLC2001-15
Volume (vol) vol.101
Number (no) 189
Page pp.-
#Pages 8
Date of Issue