Presentation | 2001/3/16 Text alignment using the statistical technique and the language feature Yasumichi Zaima, Ryoko Yasukawa, Fuji Ren, Teruaki Aizawa |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Translation corpora are widely used in machine translation. However, most bilingual texts are aligned only as whole texts, not sentence by sentence. This paper describes a method for text alignment (sentence matching). The method uses statistical information (the occurrence probability and the character-count ratio of each correspondence group pattern) together with language features. Sentences are matched by computing an evaluation value for each correspondence group from the occurrence probability and the character-count ratio of its correspondence group pattern, and then finding the best overall correspondence by dynamic programming. Our research targets both Japanese-English and Japanese-Chinese. The language feature used for Japanese-English concerns the Japanese syllabary. For Japanese-Chinese, sentences are matched by also taking into account the number of coinciding Chinese characters. A prototype system based on the proposed method was built, and experiments were carried out on both Japanese-English and Japanese-Chinese. The results show an accuracy of 75% for Japanese-English and 60% for Japanese-Chinese. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | bilingual text / the number of characters / correspondence group pattern / occurrence probability / text alignment / the language / feature |
Paper # | TL2000-40,NLC2000-75 |
Date of Issue |
Conference Information | |
Committee | NLC |
---|---|
Conference Date | 2001/3/16 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Natural Language Understanding and Models of Communication (NLC) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Text alignment using the statistical technique and the language feature |
Sub Title (in English) | |
Keyword(1) | bilingual text |
Keyword(2) | the number of characters |
Keyword(3) | correspondence group pattern |
Keyword(4) | occurrence probability |
Keyword(5) | text alignment |
Keyword(6) | the language |
Keyword(7) | feature |
1st Author's Name | Yasumichi Zaima |
1st Author's Affiliation | Hiroshima City University |
2nd Author's Name | Ryoko Yasukawa |
2nd Author's Affiliation | Hiroshima City University |
3rd Author's Name | Fuji Ren |
3rd Author's Affiliation | Hiroshima City University |
4th Author's Name | Teruaki Aizawa |
4th Author's Affiliation | Hiroshima City University |
Date | 2001/3/16 |
Paper # | TL2000-40,NLC2000-75 |
Volume (vol) | vol.100 |
Number (no) | 700 |
Page | pp.- |
#Pages | 8 |
Date of Issue |