Presentation | 1997/7/25: Automatic Alignment of Bilingual Corpora using Multi-lingual Information Retrieval; Nigel Collier, Akira Kumano, Hideki Hirakawa |
---|---|
Abstract (in English) | In this paper we present an adaptation of multi-lingual information retrieval for producing an aligned bilingual corpus from noisy parallel English-Japanese newswire articles. We implement the standard vector space model and show through simulation the effectiveness of six variations for the alignment task. The methods are computationally efficient, easy to evaluate, and generalizable to other genres and language pairs, which is important if the aligned articles are to be used for knowledge acquisition in unrestricted domains. Our results indicate that while stemming, inverse document frequency and lexical filtering all improve performance, the largest overall improvement came from simple normalization of article length. |
Keywords (in English) | alignment / corpus / MLIR / knowledge acquisition |
Paper # | NLC97-24 |
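The abstract describes aligning articles with a vector space model, inverse document frequency weighting and article-length normalization. The sketch below illustrates that general technique; it is not the authors' code, and the function names (`idf_weights`, `vectorize`, `cosine`, `align`) are hypothetical. It assumes the Japanese articles have already been mapped into English index terms (for example by a bilingual dictionary lookup, not shown) and that all articles are already tokenized and, optionally, stemmed. Length normalization is shown here as cosine similarity, which divides by both vector lengths; the paper's exact normalization may differ.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Inverse document frequency, log(N / df), over a collection of token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def vectorize(doc, idf):
    """Sparse tf-idf vector as a dict; terms missing from idf get weight 0."""
    tf = Counter(doc)
    return {t: c * idf.get(t, 0.0) for t, c in tf.items()}


def cosine(u, v):
    """Dot product divided by both vector lengths, so that long articles
    are not favoured simply because they contain more terms."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def align(english_docs, translated_japanese_docs):
    """For each English article, pick the index of the most similar
    Japanese article (assumed already translated into English terms).
    Assumes translated_japanese_docs is non-empty."""
    idf = idf_weights(english_docs + translated_japanese_docs)
    jvecs = [vectorize(d, idf) for d in translated_japanese_docs]
    pairs = []
    for i, doc in enumerate(english_docs):
        evec = vectorize(doc, idf)
        scores = [cosine(evec, jv) for jv in jvecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        pairs.append((i, best))
    return pairs


english = [["bank", "rate", "cut"], ["earthquake", "kobe", "damage"]]
japanese = [["earthquake", "damage", "kobe", "rescue"], ["bank", "interest", "rate"]]
print(align(english, japanese))  # [(0, 1), (1, 0)]
```

Picking the single best-scoring candidate per English article is the simplest matching rule; a real system over noisy newswire would likely also need a similarity threshold so that articles with no true counterpart are left unaligned.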
Conference Information | |
---|---|
Committee | NLC |
Conference Date | 1997/7/25 (1 day) |
Paper Information | |
---|---|
Registration To | Natural Language Understanding and Models of Communication (NLC) |
Language | English |
Title (in English) | Automatic Alignment of Bilingual Corpora using Multi-lingual Information Retrieval |
Keyword(1) | alignment |
Keyword(2) | corpus |
Keyword(3) | MLIR |
Keyword(4) | knowledge acquisition |
1st Author's Name | Nigel Collier |
1st Author's Affiliation | Research and Development Center, Toshiba Corporation |
2nd Author's Name | Akira Kumano |
2nd Author's Affiliation | Research and Development Center, Toshiba Corporation |
3rd Author's Name | Hideki Hirakawa |
3rd Author's Affiliation | Research and Development Center, Toshiba Corporation |
Date | 1997/7/25 |
Volume | 97 |
Number | 200 |
Number of Pages | 8 |