Presentation | 2004/11/27 Learning Transfer Rules from Annotated English-Vietnamese Bilingual Corpus(Text Mining I)(Joint Workshop of Vietnamese Society of AI, SIGKBS-JSAI, ICS-IPSJ, and IEICE-SIGAI on Active Mining) Dinh Dien, Hoang Kiem |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Because of typological differences between English and Vietnamese, many transfer rules are required in the lexical and structural transfer stage of English-to-Vietnamese machine translation. Recently, many NLP (Natural Language Processing) tasks have shifted from rule-based approaches to corpus-based approaches that rely on large annotated corpora. Corpus-based NLP tasks for widely used languages such as English and French have been well studied, with satisfactory results. In contrast, corpus-based NLP tasks for Vietnamese are at a deadlock because annotated training data is absent. Furthermore, hand annotation of even reasonably well-determined features such as part-of-speech (POS) tags has proved to be labor-intensive and costly. In this paper, we present issues in the collection and annotation (word alignment, Vietnamese word segmentation, and part-of-speech tagging) of an English-Vietnamese parallel corpus named EVC (English-Vietnamese Corpus). From the EVC, transfer rules have been automatically mined, both for use in training Vietnamese-related NLP tasks and for studying English-Vietnamese comparative linguistics. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Corpora / bilingual corpus / corpus annotation / text mining |
Paper # | AI2004-24 |
Date of Issue |
Conference Information | |
Committee | AI |
---|---|
Conference Date | 2004/11/27 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Artificial Intelligence and Knowledge-Based Processing (AI) |
---|---|
Language | ENG |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Learning Transfer Rules from Annotated English-Vietnamese Bilingual Corpus(Text Mining I)(Joint Workshop of Vietnamese Society of AI, SIGKBS-JSAI, ICS-IPSJ, and IEICE-SIGAI on Active Mining) |
Sub Title (in English) | |
Keyword(1) | Corpora |
Keyword(2) | bilingual corpus |
Keyword(3) | corpus annotation |
Keyword(4) | text mining |
1st Author's Name | Dinh Dien |
1st Author's Affiliation | Faculty of Information Technology, University of Natural Sciences, VNU-HCMC |
2nd Author's Name | Hoang Kiem |
2nd Author's Affiliation | Center of Information Technology Development, Vietnam National University of HCMC |
Date | 2004/11/27 |
Paper # | AI2004-24 |
Volume (vol) | vol.104 |
Number (no) | 485 |
Page | pp.- |
#Pages | 6 |
Date of Issue |