Presentation 2016-06-04
Identification of Tweets that Mention Books
Shuntaro Yada, Kyo Kageura
Abstract(in Japanese) (See Japanese page)
Abstract(in English) We report the performance of a classifier that identifies Tweets that Mention Books (TMB) among tweets that contain the same strings as book titles in Japanese. The classifier we developed performed reasonably well in terms of F1-measure (about 0.7) using a combination of Maximum Entropy Modelling and a Bag-of-Words-based feature set. In this paper, in order to improve our classifier, we analyse the effects on classification performance of (1) training data augmentation using a simple search-based method with book/reading-related keywords, and (2) feature dimension reduction via Latent Semantic Analysis (LSA). In addition, on a trial basis, we compare our classifier with a Multilayer Perceptron with sigmoid activation in terms of feature dimension reduction.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Twitter / Named Entity Recognition / Classification / Logistic Regression / Maximum Entropy Modelling / Multilayer Perceptron
Paper # TL2016-7,NLC2016-7
Date of Issue 2016-05-28 (TL, NLC)
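
The approach named in the abstract (a Bag-of-Words feature set fed to a Maximum Entropy model and scored by F1-measure) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy English tweets, the whitespace tokenizer, the function names, and the hyperparameters are all assumptions made for illustration. For two classes a Maximum Entropy model is equivalent to logistic regression, which is what the sketch trains. The LSA and Multilayer Perceptron experiments are not covered.

```python
import math
from collections import Counter

# Hypothetical toy data (not from the paper): 1 = the tweet mentions the book,
# 0 = the same string appears but does not refer to the book.
TRAIN = [
    ("just finished reading kokoro what a novel", 1),
    ("bought kokoro at the bookstore today", 1),
    ("the author of snow country writes beautifully", 1),
    ("reading snow country on the train", 1),
    ("my kokoro is broken after that match", 0),
    ("snow country roads are closed again", 0),
    ("kokoro kokoro la la la", 0),
    ("driving through snow country roads this weekend", 0),
]

def tokenize(text):
    # Assumption: whitespace tokens. Japanese tweets would need a
    # morphological analyser instead.
    return text.lower().split()

def build_vocab(texts):
    # Map each distinct token to a feature index.
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def bow(text, vocab):
    # Sparse bag-of-words vector {feature index: count}; unknown tokens are dropped.
    counts = Counter(tok for tok in tokenize(text) if tok in vocab)
    return {vocab[tok]: c for tok, c in counts.items()}

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, w, b):
    return sigmoid(b + sum(w[i] * v for i, v in x.items()))

def train(samples, n_features, epochs=300, lr=0.1, l2=0.01):
    # Stochastic gradient descent on the log loss, with L2 weight decay
    # applied to the features active in each sample.
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = predict_proba(x, w, b) - y
            for i, v in x.items():
                w[i] -= lr * (err * v + l2 * w[i])
            b -= lr * err
    return w, b

def f1_score(gold, pred):
    # F1 for the positive class (label 1).
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    vocab = build_vocab(text for text, _ in TRAIN)
    samples = [(bow(text, vocab), label) for text, label in TRAIN]
    w, b = train(samples, len(vocab))
    pred = [int(predict_proba(x, w, b) >= 0.5) for x, _ in samples]
    print("training F1:", f1_score([y for _, y in samples], pred))
```

The demo scores the model on its own training data only to show the moving parts. A meaningful F1 estimate would require held-out tweets.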

Conference Information
Committee NLC / TL
Conference Date 2016/6/4(2days)
Place (in Japanese) (See Japanese page)
Place (in English) Otaru University of Commerce
Topics (in Japanese) (See Japanese page)
Topics (in English) Application of natural language processing and linguistic analysis, and general topics of NLP
Chair Hiroshi Kanayama(IBM) / Masami Suzuki(KDDI R&D Labs.)
Vice Chair Makoto Ichise(NTT DoCoMo) / Takeshi Sakaki(Univ. of Tokyo/Hottolink) / Chiaki Kubomura(Yamano College of Aesthetics)
Secretary Makoto Ichise(Ryukoku Univ.) / Takeshi Sakaki(Kyushu Inst. of Tech.) / Chiaki Kubomura(Ehime Univ.)
Assistant Ryuichiro Higashinaka(NTT) / Mitsuo Yoshida(Toyohashi Univ. of Tech.) / Yasushi Tsubota(Kyoto Univ.) / Nobuyuki Jincho(Waseda Univ.)

Paper Information
Registration To Technical Committee on Natural Language Understanding and Models of Communication / Technical Committee on Thought and Language
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Identification of Tweets that Mention Books
Sub Title (in English) Effects of Features, Data Size, and ML Algorithms
Keyword(1) Twitter
Keyword(2) Named Entity Recognition
Keyword(3) Classification
Keyword(4) Logistic Regression
Keyword(5) Maximum Entropy Modelling
Keyword(6) Multilayer Perceptron
1st Author's Name Shuntaro Yada
1st Author's Affiliation The University of Tokyo(UTokyo)
2nd Author's Name Kyo Kageura
2nd Author's Affiliation The University of Tokyo(UTokyo)
Date 2016-06-04
Paper # TL2016-7,NLC2016-7
Volume (vol) vol.116
Number (no) TL-77,NLC-78
Page pp.29-34(TL), pp.29-34(NLC)
#Pages 6
Date of Issue 2016-05-28 (TL, NLC)