Presentation 2014/12/9
Improved Native Language Identification with Upper Phrase Information and Training Data Selection
MASAHIRO TANAKA, LAN WANG, HAYATO YAMANA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Native Language Identification (NLI), the task of identifying the native language (L1) of a writer based solely on a sample of his/her writing in a non-native language (L2), is an authorship attribution problem. NLI is applicable to many other NLP tasks, such as Second Language Acquisition. Since 2005, many researchers have approached this task in different ways. In this paper, we propose i) "upper phrase information" as a new feature and ii) discarding essays that appear to be outliers from the training dataset. Combining all the proposed techniques with existing methods, our system achieves 85.6% accuracy on the NLI Shared Task 2014 data. To the best of our knowledge, this is state-of-the-art accuracy for the NLI task.
Keyword(in Japanese) (See Japanese page)
Keyword(in English)
Paper # Vol.2014-NL-219 No.21
Date of Issue

Conference Information
Committee NLC
Conference Date 2014/12/9 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Improved Native Language Identification with Upper Phrase Information and Training Data Selection
Sub Title (in English)
Keyword(1)
1st Author's Name MASAHIRO TANAKA
1st Author's Affiliation
2nd Author's Name LAN WANG
2nd Author's Affiliation
3rd Author's Name HAYATO YAMANA
3rd Author's Affiliation
Date 2014/12/9
Paper # Vol.2014-NL-219 No.21
Volume (vol) vol.114
Number (no) 366
Page pp.-
#Pages 6
Date of Issue