Conference name |
---|
Forum on Information Technology (FIT) 2010 |
Conference code |
F |
Year held |
2010 |
Publication date |
2010/08/20 |
Session number |
1G |
Session name |
Language Analysis |
Presentation date |
2010/09/07 |
Venue (room, etc.) |
Hall G (Sogo Gakushu Plaza [総合学習プラザ], 1F, Lecture Room 11) |
Presentation number |
E-007 |
Title |
A Rule-based Approach for Khmer Word Extraction |
Authors |
Van Channa, Kameyama Wataru |
Keywords |
Khmer, Word Extraction, Rule-based Approach |
Abstract |
This paper presents a trainable rule-based approach for extracting Khmer words from text. A rule set is created by a rule-training process based on a Khmer text corpus. The longest word matching algorithm and the SEQUITUR algorithm are applied to detect and extract rules for the frequently co-occurring strings found in the corpus. The entropy of the rules and the mutual information of each string in the rules are calculated and used to determine how strongly each rule qualifies as a word. The resulting rule set is then used to extract words from text. The precision and recall of the proposed approach are 89.37% and 95.50%, respectively. |
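
The abstract names longest word matching and mutual information as building blocks but gives no formulas or pseudocode. The following is a minimal, generic sketch of those two ideas, not the authors' implementation: the function names, the toy Latin-alphabet lexicon, and the pointwise form of mutual information are all assumptions made for illustration.

```python
import math


def longest_match_segment(text, lexicon):
    """Greedy left-to-right longest matching against a set of known words.

    Hypothetical helper: the paper's actual matching procedure is not
    described in the abstract.
    """
    max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the lexicon matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Nothing matched: emit one character as an unknown token.
            tokens.append(text[i])
            i += 1
    return tokens


def pointwise_mutual_information(count_xy, count_x, count_y, total):
    """PMI in bits of two adjacent strings, estimated from raw counts.

    Assumed formula: log2( P(x, y) / (P(x) * P(y)) ).
    """
    return math.log2((count_xy * total) / (count_x * count_y))


toy_lexicon = {"ab", "abc", "d", "de"}  # toy data, not from the paper
segments = longest_match_segment("abcdex", toy_lexicon)
pmi = pointwise_mutual_information(count_xy=4, count_x=8, count_y=8, total=64)
```

A high mutual information value suggests that two strings co-occur more often than chance would predict, which is the kind of evidence the abstract describes using to judge whether a rule corresponds to a word.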