Presentation 1996/7/18
Automatic Extraction of Translation Patterns in Parallel Corpora
Mihoko Kitamura, Yuji Matsumoto
Abstract(in Japanese) (See Japanese page)
Abstract(in English) This paper proposes a method for finding correspondences between word sequences of arbitrary length in aligned Japanese-English parallel corpora. Candidate translations of word sequences are evaluated with a similarity measure defined from the co-occurrence frequency and the independent frequencies of the sequences; the measure is an extension of the Dice coefficient. To obtain a high-quality translation dictionary, an iterative method that gradually lowers the threshold is proposed. The method was tested on parallel corpora from three distinct domains and achieved over 80% accuracy.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) statistical based NLP / parallel corpus / machine translation / translation patterns / word similarity
Paper # NLC96-19
Date of Issue
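
The abstract above describes scoring candidate pairs with a Dice-style similarity and accepting them over several passes with a gradually lowered threshold. The following is a minimal sketch of that general idea, not the authors' implementation: it uses the plain Dice coefficient (this record does not give the paper's extension), treats single tokens rather than arbitrary-length word sequences as units, and removes accepted pairs from the aligned sentences before the next pass, which is an assumption. The names `dice` and `extract_pairs` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import product


def dice(pair_count, src_count, tgt_count):
    """Plain Dice coefficient: 2 * f(s, t) / (f(s) + f(t))."""
    return 2.0 * pair_count / (src_count + tgt_count)


def extract_pairs(aligned, thresholds):
    """Accept translation pairs pass by pass.

    aligned    : list of (source_tokens, target_tokens) sentence pairs
    thresholds : similarity thresholds, expected in descending order
    """
    # Work on sets so each unit counts at most once per sentence.
    remaining = [(set(src), set(tgt)) for src, tgt in aligned]
    accepted = {}
    for threshold in thresholds:
        # Recount frequencies on whatever has not been explained yet.
        src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
        for src, tgt in remaining:
            src_freq.update(src)
            tgt_freq.update(tgt)
            pair_freq.update(product(src, tgt))
        scored = sorted(
            ((dice(n, src_freq[s], tgt_freq[t]), s, t)
             for (s, t), n in pair_freq.items()),
            reverse=True,
        )
        # Best-first acceptance; each unit is paired at most once per pass.
        used_src, used_tgt = set(), set()
        for score, s, t in scored:
            if score < threshold:
                break
            if s in used_src or t in used_tgt:
                continue
            accepted[(s, t)] = score
            used_src.add(s)
            used_tgt.add(t)
        # Assumption: accepted pairs are removed where both sides co-occur,
        # so later passes see less noise.
        for src, tgt in remaining:
            for s, t in accepted:
                if s in src and t in tgt:
                    src.discard(s)
                    tgt.discard(t)
    return accepted


if __name__ == "__main__":
    toy = [
        (["a", "b"], ["A", "B"]),
        (["a"], ["A"]),
        (["b", "c"], ["B", "C"]),
        (["c"], ["C", "B"]),
    ]
    for pair, score in extract_pairs(toy, [1.0, 0.5]).items():
        print(pair, round(score, 2))
```

In the toy data, the pairs ("a", "A") and ("c", "C") reach the first threshold of 1.0, and ("b", "B") is accepted only in the second pass, after the competing co-occurrence of "c" with "B" has been removed.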

Conference Information
Committee NLC
Conference Date 1996/7/18 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Automatic Extraction of Translation Patterns in Parallel Corpora
Sub Title (in English)
Keyword(1) statistical based NLP
Keyword(2) parallel corpus
Keyword(3) machine translation
Keyword(4) translation patterns
Keyword(5) word similarity
1st Author's Name Mihoko Kitamura
1st Author's Affiliation Oki Electric Industry Co., Ltd., Kansai Laboratory, Research & Development Group
2nd Author's Name Yuji Matsumoto
2nd Author's Affiliation Nara Institute of Science and Technology Graduate School of Information Science
Date 1996/7/18
Paper # NLC96-19
Volume (vol) vol.96
Number (no) 157
Page pp.-
#Pages 8
Date of Issue