Presentation | 2002/12/13 A study on language model based on kana and kanji string Hiroaki KINNO, Masaharu KATOH, Tetsuo KOSAKA, Masaki KOHDA, Akinori ITO |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | This paper describes a character-based n-gram language model. The proposed model is based on kanji and kana characters rather than on words or morphemes determined by morphemic analysis. To exploit stronger constraints, character strings are used in addition to single characters as the basic units of the model. We examined two methods for choosing character strings: one is based on frequency in the training corpus, and the other is based on mutual information as well as frequency. We carried out experiments to compare perplexity and character error rate (CER) between the proposed model and conventional (word-based or character-based) n-gram models. The results showed that the mutual-information-based method gave better performance than the frequency-based one. Although the proposed model was not superior to the word-based model, it was better than the character-based one. The vocabulary size of the proposed model was about 50% smaller than that of the word-based model. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | morphemic analysis / language model / character string / frequency / mutual information |
Paper # | NLC2002-71 |
Date of Issue |
Conference Information | |
Committee | NLC |
---|---|
Conference Date | 2002/12/13 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Natural Language Understanding and Models of Communication (NLC) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | A study on language model based on kana and kanji string |
Sub Title (in English) | |
Keyword(1) | morphemic analysis |
Keyword(2) | language model |
Keyword(3) | character string |
Keyword(4) | frequency |
Keyword(5) | mutual information |
1st Author's Name | Hiroaki KINNO |
1st Author's Affiliation | Faculty of Engineering, Yamagata University |
2nd Author's Name | Masaharu KATOH |
2nd Author's Affiliation | Faculty of Engineering, Yamagata University |
3rd Author's Name | Tetsuo KOSAKA |
3rd Author's Affiliation | Faculty of Engineering, Yamagata University |
4th Author's Name | Masaki KOHDA |
4th Author's Affiliation | Faculty of Engineering, Yamagata University |
5th Author's Name | Akinori ITO |
5th Author's Affiliation | Graduate School of Engineering, Tohoku University |
Date | 2002/12/13 |
Paper # | NLC2002-71 |
Volume (vol) | vol.102 |
Number (no) | 528 |
Page | pp.- |
#Pages | 6 |
Date of Issue |