Conference name
2009 Forum on Information Technology (FIT)
Conference code
F
Year held
2009
Publication date
2009/08/20
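Session number
5E
Session name
Translation / Multilingual
Presentation date
2009/09/03
Presentation venue (room, etc.)
Venue E (Building 9, 1F, Room 915)
Presentation number
E-016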
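Title
Clause Unit Identification in Multilingual Spoken-Language Corpora: A Classification of Units That Are Difficult to Segment
Author
Zoya Efimova (エフィーモワ ゾーヤ)
Keywords
corpus linguistics, spoken language, clause unit identification, multilingual corpora
Abstract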
Transcribed and annotated multilingual corpora of spoken discourse are now widely used as a tool for language learning and for typological studies. One of the essential tasks of discourse annotation is parsing the discourse into elementary discourse units (EDUs), the minimal building blocks of a coherent text, and it is important that the parsing principles be the same across languages. An elementary unit is usually taken to be a clause, defined as a group of words containing a subject and a predicate. This paper attempts to identify typical classes of non-prototypical EDUs, which may contain no predicate or more than one predicate. Such "problematic" EDUs are classified from semantic and pragmatic points of view, and this classification can help to build universal parsing rules for different languages.