Presentation 2012-12-19
Recognizing Variations of Japanese "Good Morning" Phrases in Twitter
Yoshinari Fujinuma, Hikaru Yokono, Pascual Martinez-Gomez, Akiko Aizawa
Abstract(in Japanese) (See Japanese page)
Abstract(in English) The rapid growth of Consumer Generated Media (CGM) such as Twitter has introduced many expressive variations and informal representations into textual resources. Although word segmentation is the first step in most Japanese language applications, current word segmentation tools are not yet sufficiently adapted to such informal text. In this paper, we focus on one of the most frequent phrases in Japanese morning tweets, "おはようございます", and construct a CRF-based extractor of its variations. Using 500 manually annotated samples, we obtain an F1 score of over 0.91 for both the head span ("おはよう") and the entire span (including the attachment part such as "ございます"). We also show that the extracted variations contain normalization patterns that are not defined in JUMAN 7.0.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Informal text / Rule extraction / Twitter / CRF
Paper # NLC2012-39
Date of Issue

Conference Information
Committee NLC
Conference Date 2012/12/12 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Recognizing Variations of Japanese "Good Morning" Phrases in Twitter
Sub Title (in English)
Keyword(1) Informal text
Keyword(2) Rule extraction
Keyword(3) Twitter
Keyword(4) CRF
1st Author's Name Yoshinari Fujinuma
1st Author's Affiliation The University of Tokyo
2nd Author's Name Hikaru Yokono
2nd Author's Affiliation National Institute of Informatics
3rd Author's Name Pascual Martinez-Gomez
3rd Author's Affiliation The University of Tokyo/National Institute of Informatics
4th Author's Name Akiko Aizawa
4th Author's Affiliation The University of Tokyo/National Institute of Informatics
Date 2012-12-19
Paper # NLC2012-39
Volume (vol) vol.112
Number (no) 367
Page pp.-
#Pages 6
Date of Issue