Presentation 1997/7/25
Automatic Alignment of Bilingual Corpora using Multi-lingual Information Retrieval
Nigel Collier, Akira Kumano, Hideki Hirakawa
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In this paper we present an adaptation of multi-lingual information retrieval for the production of an aligned bilingual corpus from noisy parallel English-Japanese newswire articles. We implement the standard vector space model and show through simulation the effectiveness of six variations for the alignment task. The methods are computationally efficient, easy to evaluate and generalizable to other genres and language pairs, an important factor if we are to use the aligned articles for knowledge acquisition in unrestricted domains. Our results indicate that while stemming, inverse document frequency and lexical filtering all improve performance, the best overall improvement was due to simple normalization of article length.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) alignment / corpus / MLIR / knowledge acquisition
Paper # NLC97-24
Date of Issue

Conference Information
Committee NLC
Conference Date 1997/7/25 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language ENG
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Automatic Alignment of Bilingual Corpora using Multi-lingual Information Retrieval
Sub Title (in English)
Keyword(1) alignment
Keyword(2) corpus
Keyword(3) MLIR
Keyword(4) knowledge acquisition
1st Author's Name Nigel Collier
1st Author's Affiliation Research and Development Center Toshiba Corporation
2nd Author's Name Akira Kumano
2nd Author's Affiliation Research and Development Center Toshiba Corporation
3rd Author's Name Hideki Hirakawa
3rd Author's Affiliation Research and Development Center Toshiba Corporation
Date 1997/7/25
Paper # NLC97-24
Volume (vol) vol.97
Number (no) 200
Page pp.-
#Pages 8
Date of Issue