Presentation 2000/7/11
Low-frequency Words in Bilingual Corpora : A Step towards Automatic Extraction of Bilingual Word Pairs
Keita Tsuji, Fuyuki Yoshikane, Kyo Kageura
Abstract(in Japanese) (See Japanese page)
Abstract(in English) The high-frequency bilingual word pairs in bilingual corpora are already listed in dictionaries; it is the low-frequency pairs that need to be extracted. Based on this idea, we examine methods for automatically extracting bilingual word pairs from corpora and show that the statistical method, which has been studied intensively so far, is not suitable for the task. If two words J1 and J2 of the same language always co-occur in the same alignments, the statistical method cannot determine which of them is the correct translation of a word E in the other language. We found that many low-frequency words are in this situation.
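The ambiguity described in the abstract can be illustrated with a small sketch. It assumes the Dice coefficient as the co-occurrence statistic and a hypothetical three-alignment toy corpus; the record does not say which statistic or data the authors used, so both are illustrative choices only.

```python
from collections import defaultdict

def dice_scores(alignments, target):
    """Dice coefficient between `target` and every source-side word.

    `alignments` is a list of (source_words, target_words) pairs.
    The Dice coefficient is an assumed stand-in for "the statistical method".
    """
    src_count = defaultdict(int)    # alignments containing each source word
    joint_count = defaultdict(int)  # alignments containing both word and target
    target_count = 0                # alignments containing the target word
    for src_words, tgt_words in alignments:
        has_target = target in tgt_words
        if has_target:
            target_count += 1
        for w in set(src_words):
            src_count[w] += 1
            if has_target:
                joint_count[w] += 1
    return {
        w: 2 * joint_count[w] / (src_count[w] + target_count)
        for w in src_count
    }

# Hypothetical toy corpus: E is low-frequency, and J1 and J2 appear
# only in the single alignment that contains E.
alignments = [
    (["J1", "J2", "J3"], ["E", "F"]),
    (["J3", "J4"], ["F", "G"]),
    (["J3", "J5"], ["G", "H"]),
]

scores = dice_scores(alignments, "E")
print(scores["J1"], scores["J2"], scores["J3"])
```

Because J1 and J2 occur in exactly the same alignments, every count that enters the score is identical for both, so any statistic built only from alignment-level co-occurrence counts ranks them equally as candidates for E.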
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Low-frequency word / Bilingual word pair / Automatic extraction / Bilingual Corpora
Paper # NLC2000-16
Date of Issue

Conference Information
Committee NLC
Conference Date 2000/7/11 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Low-frequency Words in Bilingual Corpora : A Step towards Automatic Extraction of Bilingual Word Pairs
Sub Title (in English)
Keyword(1) Low-frequency word
Keyword(2) Bilingual word pair
Keyword(3) Automatic extraction
Keyword(4) Bilingual Corpora
1st Author's Name Keita Tsuji
1st Author's Affiliation Graduate School of Education, University of Tokyo
2nd Author's Name Fuyuki Yoshikane
2nd Author's Affiliation Graduate School of Education, University of Tokyo
3rd Author's Name Kyo Kageura
3rd Author's Affiliation National Institute of Informatics
Date 2000/7/11
Paper # NLC2000-16
Volume (vol) vol.100
Number (no) 200
Page pp.-
#Pages 8
Date of Issue