Presentation 2001/3/16
Text alignment using the statistical technique and the language feature
Yasumichi Zaima, Ryoko Yasukawa, Fuji Ren, Teruaki Aizawa
Abstract (in English) Translation corpora are widely used in machine translation. However, although most bilingual texts correspond as whole documents, their individual sentences are not aligned with one another. This paper describes a method for text alignment (sentence matching) that combines statistical information, namely the occurrence probability and the character-count ratio of each correspondence group pattern, with language features. Sentences are matched by computing an evaluation value for each correspondence group from the occurrence probability and the character-count ratio of its pattern, and then finding the best overall correspondence with dynamic programming. Our research targets both Japanese-English and Japanese-Chinese. For Japanese-English, the language feature concerns the Japanese syllabary (kana); for Japanese-Chinese, sentence matching additionally takes into account the number of Chinese characters the two sentences have in common. A prototype system based on the proposed method was built, and experiments were carried out on both language pairs. The results show an accuracy of 75% for Japanese-English and 60% for Japanese-Chinese.
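The abstract describes the alignment procedure only at a high level. The Python sketch below shows one way its pieces could fit together: each candidate correspondence group (1-1, 2-1, and so on) is scored from a pattern occurrence probability and a character-count ratio term, optionally rewarded for shared Chinese characters, and dynamic programming selects the lowest-cost sequence of groups. It is a hypothetical illustration, not the authors' implementation: the pattern set, the probabilities in `PATTERN_PROB`, the squared penalty on the log character ratio (similar in spirit to the Gale-Church length-based approach), and the parameters `expected_ratio`, `RATIO_SPREAD` and `kanji_weight` are all assumed placeholders. The Japanese-English syllabary feature is not modeled.

```python
import math
from typing import Dict, List, Tuple

# HYPOTHETICAL occurrence probabilities for correspondence group patterns,
# keyed by (number of source sentences, number of target sentences).
# Placeholder values for illustration only, not taken from the paper.
PATTERN_PROB: Dict[Tuple[int, int], float] = {
    (1, 1): 0.89,
    (1, 0): 0.005,
    (0, 1): 0.005,
    (2, 1): 0.045,
    (1, 2): 0.045,
    (2, 2): 0.01,
}
RATIO_SPREAD = 0.5  # HYPOTHETICAL tolerance on the log character-count ratio

Group = Tuple[List[int], List[int]]


def is_cjk(ch: str) -> bool:
    """True for characters in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return "\u4e00" <= ch <= "\u9fff"


def shared_hanzi(src: str, tgt: str) -> int:
    """Count distinct CJK ideographs that occur in both strings."""
    return len({c for c in src if is_cjk(c)} & {c for c in tgt if is_cjk(c)})


def group_cost(src: List[str], tgt: List[str],
               expected_ratio: float, kanji_weight: float) -> float:
    """Cost of one correspondence group; lower means a better match."""
    cost = -math.log(PATTERN_PROB[(len(src), len(tgt))])  # pattern probability term
    s_len = sum(len(s) for s in src)
    t_len = sum(len(t) for t in tgt)
    if s_len > 0 and t_len > 0:
        # Penalize deviation of the character-count ratio from its expected value.
        dev = math.log(t_len / (s_len * expected_ratio)) / RATIO_SPREAD
        cost += 0.5 * dev * dev
    # Japanese-Chinese option: reward Chinese characters shared by both sides.
    cost -= kanji_weight * shared_hanzi("".join(src), "".join(tgt))
    return cost


def align(src: List[str], tgt: List[str],
          expected_ratio: float = 1.0, kanji_weight: float = 0.0) -> List[Group]:
    """Return the lowest-cost sequence of correspondence groups (sentence indices)."""
    n, m = len(src), len(tgt)
    best = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    # Row-major order guarantees best[i][j] is final before it is extended.
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == math.inf:
                continue
            for di, dj in PATTERN_PROB:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = best[i][j] + group_cost(src[i:ni], tgt[j:nj],
                                            expected_ratio, kanji_weight)
                if c < best[ni][nj]:
                    best[ni][nj] = c
                    back[ni][nj] = (di, dj)
    # Trace back from (n, m) to recover the chosen groups.
    groups: List[Group] = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        groups.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    groups.reverse()
    return groups


if __name__ == "__main__":
    ja = ["今日は天気が良い。", "明日は雨が降るでしょう。"]
    zh = ["今天天气很好。", "明天会下雨。"]
    print(align(ja, zh, kanji_weight=0.5))  # [([0], [0]), ([1], [1])]
```

For a Japanese-English pair, a caller would presumably set `expected_ratio` above 1, since English sentences usually contain more characters than their Japanese counterparts; the right value would have to be estimated from data rather than taken from this sketch.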
Keyword (in English) bilingual text / the number of characters / correspondence group pattern / occurrence probability / text alignment / the language feature
Paper # TL2000-40,NLC2000-75

Conference Information
Committee NLC
Conference Date 2001/3/16 (1 day)

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in English) Text alignment using the statistical technique and the language feature
Keyword(1) bilingual text
Keyword(2) the number of characters
Keyword(3) correspondence group pattern
Keyword(4) occurrence probability
Keyword(5) text alignment
Keyword(6) the language feature
1st Author's Name Yasumichi Zaima
1st Author's Affiliation Hiroshima City University
2nd Author's Name Ryoko Yasukawa
2nd Author's Affiliation Hiroshima City University
3rd Author's Name Fuji Ren
3rd Author's Affiliation Hiroshima City University
4th Author's Name Teruaki Aizawa
4th Author's Affiliation Hiroshima City University
Date 2001/3/16
Volume (vol) vol.100
Number (no) 700
#Pages 8