Presentation 2013-01-30
A Preliminary Investigation on Improving Chinese Pinyin-to-Character Conversion Using MI Based Automatic Lexical Formation
Jinsong ZHANG, Wei LI, Xiaoyun WANG, Masafumi NISHIDA, Seiichi YAMAMOTO
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Pinyin is the official phonological script of Putonghua Chinese. Pinyin-to-character (P2C) conversion automatically converts Pinyin into Chinese characters, and is the most natural way to input Chinese characters into a computer through a keyboard. Because the mapping between Pinyin and characters is many-to-many, conversion errors are frequent in real applications. This paper presents a new idea: use the mutual information (MI) between a text and its Pinyin to obtain a word segmentation of the training corpus, then collect a lexicon and build an n-gram language model (LM) from the segmented text. After iterative optimization, the lexicon and LM can be used to build a P2C conversion system. We developed a P2C system using a newspaper corpus, along with two baseline systems for comparison, one using a handcrafted lexicon and the other a perplexity-based optimized lexicon. All three systems used bigram LMs. Preliminary experimental results showed that our system achieved relative error reductions of 19.7% and 10.3% over the two baseline systems, respectively, on the test corpus. These results support the effectiveness of our proposal.
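The abstract does not describe the decoder, so the following is only an illustrative sketch of how a bigram LM can resolve the many-to-many mapping between Pinyin and characters. It assumes character-level candidates and a Viterbi search; the candidate table, the probabilities, the `FLOOR` smoothing constant, and the names `log_p` and `convert` are all hypothetical and are not taken from the paper, which works with a word lexicon.

```python
import math

# Hypothetical toy candidate table: Pinyin syllable -> possible characters.
CANDIDATES = {
    "zhong": ["中", "种"],
    "wen": ["文", "问"],
}

# Hypothetical bigram probabilities P(cur | prev); "<s>" marks sentence start.
BIGRAM = {
    ("<s>", "中"): 0.6,
    ("<s>", "种"): 0.4,
    ("中", "文"): 0.7,
    ("中", "问"): 0.1,
    ("种", "文"): 0.1,
    ("种", "问"): 0.2,
}

FLOOR = 1e-6  # crude stand-in for smoothing of unseen pairs (an assumption)


def log_p(prev, cur):
    """Log bigram probability, falling back to FLOOR for unseen pairs."""
    return math.log(BIGRAM.get((prev, cur), FLOOR))


def convert(syllables):
    """Viterbi search for the most probable character sequence."""
    # best maps the last character of a partial path to (log score, path).
    best = {"<s>": (0.0, [])}
    for syl in syllables:
        step = {}
        for cur in CANDIDATES[syl]:
            # Keep only the best-scoring predecessor for each candidate.
            score, path = max(
                (prev_score + log_p(prev, cur), prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            step[cur] = (score, path + [cur])
        best = step
    return "".join(max(best.values())[1])
```

Calling `convert(["zhong", "wen"])` on this toy table selects the character pair with the highest product of bigram probabilities. A system of the kind described in the abstract would replace the character candidates with lexicon words and estimate the bigram table from the segmented training corpus.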
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Mutual information / Pinyin-to-Character Conversion / Language model
Paper # SP2012-98
Date of Issue

Conference Information
Committee SP
Conference Date 2013/1/23 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Speech (SP)
Language ENG
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) A Preliminary Investigation on Improving Chinese Pinyin-to-Character Conversion Using MI Based Automatic Lexical Formation
Sub Title (in English)
Keyword(1) Mutual information
Keyword(2) Pinyin-to-Character Conversion
Keyword(3) Language model
1st Author's Name Jinsong ZHANG
1st Author's Affiliation Beijing Language and Culture University
2nd Author's Name Wei LI
2nd Author's Affiliation NICT
3rd Author's Name Xiaoyun WANG
3rd Author's Affiliation Faculty of Science and Engineering, Doshisha University
4th Author's Name Masafumi NISHIDA
4th Author's Affiliation Faculty of Science and Engineering, Doshisha University
5th Author's Name Seiichi YAMAMOTO
5th Author's Affiliation Faculty of Science and Engineering, Doshisha University
Date 2013-01-30
Paper # SP2012-98
Volume (vol) vol.112
Number (no) 422
Page pp.-
#Pages 5
Date of Issue