Presentation | 2013-01-30 A Preliminary Investigation on Improving Chinese Pinyin-to-Character Conversion Using MI Based Automatic Lexical Formation; Jinsong ZHANG, Wei LI, Xiaoyun WANG, Masafumi NISHIDA, Seiichi YAMAMOTO |
---|---|
Abstract (in English) | Pinyin is the official phonetic script of Putonghua (Mandarin) Chinese. Pinyin-to-character (P2C) conversion automatically converts Pinyin into Chinese characters and is the most natural way to enter Chinese characters into a computer from a keyboard. Because the mapping between Pinyin and characters is many-to-many, conversion in real applications frequently produces errors. This paper proposes using the mutual information (MI) between text and its Pinyin to segment the training corpus into words, then collecting a lexicon from the segmented corpus and building an n-gram language model (LM). After iterative optimization, the lexicon and LM are used to build a P2C conversion system. We developed a P2C system from a newspaper corpus, along with two baseline systems for comparison: one using a handcrafted lexicon and one using a perplexity-based optimized lexicon. All three systems used bigram LMs. In preliminary experiments on the test corpus, our system achieved relative error reductions of 19.7% and 10.3% over the two baselines, respectively. These results demonstrate the effectiveness of the proposed method. |
Keyword(in English) | Mutual information / Pinyin-to-Character Conversion / Language model |
Paper # | SP2012-98 |
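
The abstract does not spell out the exact MI formulation. The Python sketch below is a rough illustration of MI-driven lexicon formation and makes several assumptions. Each corpus unit is treated as a (character, Pinyin) pair. Adjacent units are scored by pointwise mutual information, PMI(a, b) = log2(P(a, b) / (P(a) P(b))), and neighbors whose score meets a threshold are joined into one word. The segmented corpus then yields a lexicon and word-bigram counts for a bigram LM. The names `pmi_table`, `segment`, `lexicon_and_bigrams`, `TOY_CORPUS` and the threshold value are all hypothetical. The authors' iterative optimization and the P2C decoder are not shown.

```python
import math
from collections import Counter

# Hypothetical toy data (not from the paper): each unit is a (character, pinyin) pair.
TOY_CORPUS = [
    [("北", "bei"), ("京", "jing"), ("大", "da"), ("学", "xue")],
    [("北", "bei"), ("京", "jing"), ("很", "hen"), ("大", "da")],
    [("大", "da"), ("学", "xue"), ("很", "hen"), ("好", "hao")],
]


def pmi_table(corpus):
    """Pointwise MI, log2(P(a,b) / (P(a)P(b))), for every adjacent unit pair."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    table = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        table[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return table


def segment(sent, table, threshold):
    """Join adjacent units whose PMI >= threshold; otherwise start a new word."""
    if not sent:
        return []
    words = [[sent[0]]]
    for a, b in zip(sent, sent[1:]):
        if table.get((a, b), float("-inf")) >= threshold:
            words[-1].append(b)  # strong association: extend the current word
        else:
            words.append([b])    # weak association: place a word boundary
    return [tuple(w) for w in words]


def lexicon_and_bigrams(corpus, table, threshold):
    """Collect word counts and word-bigram counts (with sentence markers)."""
    lexicon, word_bigrams = Counter(), Counter()
    for sent in corpus:
        words = segment(sent, table, threshold)
        lexicon.update(words)
        padded = ["<s>"] + words + ["</s>"]
        word_bigrams.update(zip(padded, padded[1:]))
    return lexicon, word_bigrams


if __name__ == "__main__":
    table = pmi_table(TOY_CORPUS)
    for word in segment(TOY_CORPUS[0], table, threshold=2.2):
        print("".join(ch for ch, _ in word), " ".join(py for _, py in word))
```

On this toy data with a threshold of 2.2, the first sentence is segmented into the two words 北京 and 大学. In a real system the threshold, or whatever MI criterion the authors actually use, would presumably be tuned as part of the iterative optimization the abstract mentions.
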
Conference Information | |
---|---|
Committee | SP |
Conference Date | 2013/1/23 (1 day) |
Paper Information | |
---|---|
Registration To | Speech (SP) |
Language | ENG |
Title (in English) | A Preliminary Investigation on Improving Chinese Pinyin-to-Character Conversion Using MI Based Automatic Lexical Formation |
Keyword(1) | Mutual information |
Keyword(2) | Pinyin-to-Character Conversion |
Keyword(3) | Language model |
1st Author's Name | Jinsong ZHANG |
1st Author's Affiliation | Beijing Language and Culture University |
2nd Author's Name | Wei LI |
2nd Author's Affiliation | NICT |
3rd Author's Name | Xiaoyun WANG |
3rd Author's Affiliation | Faculty of Science and Engineering, Doshisha University |
4th Author's Name | Masafumi NISHIDA |
4th Author's Affiliation | Faculty of Science and Engineering, Doshisha University |
5th Author's Name | Seiichi YAMAMOTO |
5th Author's Affiliation | Faculty of Science and Engineering, Doshisha University |
Date | 2013-01-30 |
Volume (vol) | vol.112 |
Number (no) | 422 |
#Pages | 5 |