Presentation 2004/11/27
Learning Transfer Rules from Annotated English-Vietnamese Bilingual Corpus (Text Mining I) (Joint Workshop of Vietnamese Society of AI, SIGKBS-JSAI, ICS-IPSJ, and IEICE-SIGAI on Active Mining)
Dinh Dien, Hoang Kiem
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Because English and Vietnamese differ typologically, many transfer rules are required in the lexical and structural transfer stages of English-to-Vietnamese machine translation. Recently, many NLP (Natural Language Processing) tasks have shifted from rule-based approaches to corpus-based approaches that rely on large annotated corpora. Corpus-based NLP tasks for widely used languages such as English and French have been well studied, with satisfactory results. In contrast, corpus-based NLP tasks for Vietnamese have stalled for lack of annotated training data. Furthermore, hand-annotation of even reasonably well-determined features such as part-of-speech (POS) tags has proved labor-intensive and costly. In this paper, we present issues in the collection and annotation (word alignment, Vietnamese word segmentation, and part-of-speech tagging) of an English-Vietnamese parallel corpus named EVC (English-Vietnamese Corpus). From this EVC, transfer rules have been automatically mined for use in training Vietnamese-related NLP tasks and in studying English-Vietnamese comparative linguistics.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Corpora / bilingual corpus / corpus annotation / text mining
Paper # AI2004-24
Date of Issue

Conference Information
Committee AI
Conference Date 2004/11/27 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Artificial Intelligence and Knowledge-Based Processing (AI)
Language ENG
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Learning Transfer Rules from Annotated English-Vietnamese Bilingual Corpus (Text Mining I) (Joint Workshop of Vietnamese Society of AI, SIGKBS-JSAI, ICS-IPSJ, and IEICE-SIGAI on Active Mining)
Sub Title (in English)
Keyword(1) Corpora
Keyword(2) bilingual corpus
Keyword(3) corpus annotation
Keyword(4) text mining
1st Author's Name Dinh Dien
1st Author's Affiliation Faculty of Information Technology, University of Natural Sciences, VNU-HCMC
2nd Author's Name Hoang Kiem
2nd Author's Affiliation Center of Information Technology Development, Vietnam National University of HCMC
Date 2004/11/27
Paper # AI2004-24
Volume (vol) vol.104
Number (no) 485
Page pp.-
#Pages 6
Date of Issue