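Improving Native Language Identification Accuracy through Upper Phrase Information and Training Data Selection (New Attempts and Ideas, 6th Collective Intelligence Symposium)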

Presentation	2014/12/9 Improved Native Language Identification with Upper Phrase Information and Training Data Selection MASAHIRO TANAKA, LAN WANG, HAYATO YAMANA
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	Native Language Identification (NLI), the task of identifying the native language (L1) of a writer based solely on a sample of his or her writing in a non-native language (L2), is an authorship attribution problem. In this paper, we propose i) "upper phrase information" as a new feature and ii) discarding essay data that appear to be outliers from the training dataset. NLI is applicable to many other NLP tasks such as Second Language Acquisition. Since 2005, many researchers have approached this task in different ways. Combining all the proposed techniques with existing methods, our system achieves 85.6% accuracy on the NLI Shared Task 2014 data. To the best of our knowledge, this is state-of-the-art accuracy for the NLI task.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)
Paper #	Vol.2014-NL-219 No.21
Date of Issue

Paper Information
Registration To	Natural Language Understanding and Models of Communication (NLC)
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Improved Native Language Identification with Upper Phrase Information and Training Data Selection
Sub Title (in English)
Keyword(1)
1st Author's Name	MASAHIRO TANAKA
1st Author's Affiliation	()
2nd Author's Name	LAN WANG
2nd Author's Affiliation
3rd Author's Name	HAYATO YAMANA
3rd Author's Affiliation
Date	2014/12/9
Paper #	Vol.2014-NL-219 No.21
Volume (vol)	vol.114
Number (no)	366
Page	pp.-
#Pages	6
Date of Issue