Presentation 2002/12/13
A study on language model based on kana and kanji string
Hiroaki KINNO, Masaharu KATOH, Tetsuo KOSAKA, Masaki KOHDA, Akinori ITO
Abstract(in English) This paper describes a character-based n-gram language model. Instead of words or morphemes determined by morphemic analysis, the proposed model uses kanji and kana characters as its units. To exploit stronger constraints, character strings are used in addition to single characters as basic units of the model. We examined two methods for choosing these character strings: one based on frequency in the training corpus, and the other based on mutual information as well as frequency. We carried out experiments comparing the perplexity and character error rate (CER) of the proposed model with those of conventional word-based and character-based n-gram models. Of the two selection methods, the one based on mutual information gave better performance. Although the proposed model did not outperform the word-based model, it was better than the character-based one. The vocabulary of the proposed model was about 50% smaller than that of the word-based model.
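The abstract does not spell out the selection criteria, so the following Python sketch only illustrates, under stated assumptions, how frequency-based and mutual-information-based selection of character strings can rank candidates differently, and how CER is typically computed. Assumptions not taken from the paper: candidate strings are limited to adjacent character pairs; "mutual information" is read as pointwise mutual information, log2 P(xy) / (P(x) P(y)), ranked only over pairs seen at least `min_count` times; text is segmented greedily left to right; CER is the Levenshtein distance divided by the reference length. All function names are hypothetical.

```python
import math
from collections import Counter


def pair_counts(corpus):
    """Count single characters and adjacent character pairs (within each sentence)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(sentence[i:i + 2] for i in range(len(sentence) - 1))
    return unigrams, bigrams


def select_by_frequency(corpus, k):
    """Hypothetical criterion 1: keep the k most frequent adjacent pairs."""
    _, bigrams = pair_counts(corpus)
    return [pair for pair, _ in bigrams.most_common(k)]


def select_by_mutual_information(corpus, k, min_count=2):
    """Hypothetical criterion 2: among pairs seen >= min_count times,
    keep the k with the highest pointwise mutual information."""
    unigrams, bigrams = pair_counts(corpus)
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    def pmi(pair):
        p_xy = bigrams[pair] / n_bi
        p_x = unigrams[pair[0]] / n_uni
        p_y = unigrams[pair[1]] / n_uni
        return math.log2(p_xy / (p_x * p_y))

    candidates = [pair for pair, c in bigrams.items() if c >= min_count]
    # Higher PMI first; ties broken by higher count, then by the string itself.
    candidates.sort(key=lambda pair: (-pmi(pair), -bigrams[pair], pair))
    return candidates[:k]


def segment(sentence, strings):
    """Greedy left-to-right segmentation: take a selected pair when it matches,
    otherwise fall back to a single character."""
    chosen = set(strings)
    units, i = [], 0
    while i < len(sentence):
        if sentence[i:i + 2] in chosen:
            units.append(sentence[i:i + 2])
            i += 2
        else:
            units.append(sentence[i])
            i += 1
    return units


def character_error_rate(reference, hypothesis):
    """Levenshtein distance between the character sequences, divided by len(reference)."""
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, 1):
        cur = [i]
        for j, h in enumerate(hypothesis, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1] / len(reference)


if __name__ == "__main__":
    corpus = ["きょうはあめ", "きょうははれ", "あしたはあめ"]
    strings = select_by_mutual_information(corpus, 3)
    print(strings)                            # ['きょ', 'ょう', 'あめ']
    print(segment("きょうはあめ", strings))   # ['きょ', 'う', 'は', 'あめ']
    print(character_error_rate("きょうはあめ", "きょうははれ"))
```

On this toy corpus, five pairs tie on frequency, so a frequency-only criterion cannot separate them; the PMI ranking prefers pairs such as きょ whose characters rarely occur apart and pushes down pairs such as はあ that involve the very common character は. The paper's actual criteria and string lengths may differ.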
Keyword(in English) morphemic analysis / language model / character string / frequency / mutual information
Paper # NLC2002-71

Conference Information
Committee NLC
Conference Date 2002/12/13 (1 day)

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in English) A study on language model based on kana and kanji string
Keyword(1) morphemic analysis
Keyword(2) language model
Keyword(3) character string
Keyword(4) frequency
Keyword(5) mutual information
1st Author's Name Hiroaki KINNO
1st Author's Affiliation Faculty of Engineering, Yamagata University
2nd Author's Name Masaharu KATOH
2nd Author's Affiliation Faculty of Engineering, Yamagata University
3rd Author's Name Tetsuo KOSAKA
3rd Author's Affiliation Faculty of Engineering, Yamagata University
4th Author's Name Masaki KOHDA
4th Author's Affiliation Faculty of Engineering, Yamagata University
5th Author's Name Akinori ITO
5th Author's Affiliation Graduate School of Engineering, Tohoku University
Date 2002/12/13
Volume (vol) vol.102
Number (no) 528
#Pages 6