Presentation 2008/7/10
Construction of Japanese Idiom Corpus and its Application to Japanese Idiom Identification
Chikara HASHIMOTO, Daisuke KAWAHARA
Abstract(in English) Some phrases can be interpreted either idiomatically (figuratively) or literally, depending on context, and precise identification of idioms is indispensable for full-fledged NLP. To this end, we have been constructing a Japanese idiom corpus that we hope will help solve this problem. This paper reports on the current status of the corpus and the results of a Japanese idiom identification experiment using it. The corpus targets 146 ambiguous idioms and consists of 113,460 sentences, each annotated with a literal/idiom label. All of the sentences were collected from the Web. For Japanese idiom identification, we adopted a word sense disambiguation method and targeted the 93 idioms for which more than 50 sentences were available for both literal and idiomatic usages. Our system appeared to perform as well as or better than the systems previously reported for English idiom identification.
Keyword(in English) Japanese idiom / corpus / idiom identification / language resources
Paper # NLC2008-1

Conference Information
Committee NLC
Conference Date 2008/7/10 (1 day)

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in English) Construction of Japanese Idiom Corpus and its Application to Japanese Idiom Identification
Keyword(1) Japanese idiom
Keyword(2) corpus
Keyword(3) idiom identification
Keyword(4) language resources
1st Author's Name Chikara HASHIMOTO
1st Author's Affiliation Graduate School of Science and Engineering, Yamagata University
2nd Author's Name Daisuke KAWAHARA
2nd Author's Affiliation National Institute of Information and Communications Technology
Date 2008/7/10
Volume (vol) vol.108
Number (no) 141
#Pages 6