統計的手法による単語の切出しについて (Automatic Acquisition of Words by Using Statistical Text Information)

Presentation	1995/12/15 Automatic Acquisition of Words by Using Statistical Text Information, Hidekazu NAKAWATASE
Abstract(in English)	This paper describes a new method for automatically acquiring words from Japanese text. Recognizing words in Japanese text normally requires morphological analysis. However, unknown-word recognition and ambiguity in compound-word recognition remain problems, and resolving them requires dictionaries and complex heuristics. The proposed method is based on N-gram statistics and does not require traditional morphological analysis. It consists of two steps: (1) calculating a Normalized Frequency for each substring of the Japanese text from the N-gram statistics, and (2) acquiring the boundaries between words. In experiments on the EDR Japanese corpus, we obtained a correct recognition score of 82.39% and a recall score of 69.84%.
Keyword(in English)	word extraction / N-gram / natural language analysis / morphological analysis / statistics
Paper #	NLC95-68
Date of Issue

Paper Information
Registration To	Natural Language Understanding and Models of Communication (NLC)
Language	JPN
Title (in English)	Automatic Acquisition of Words by Using Statistical Text Information
Sub Title (in English)
Keyword(1)	word extraction
Keyword(2)	N-gram
Keyword(3)	natural language analysis
Keyword(4)	morphological analysis
Keyword(5)	statistics
1st Author's Name	Hidekazu NAKAWATASE
1st Author's Affiliation	NTT Information and Communication Systems Laboratories
Date	1995/12/15
Paper #	NLC95-68
Volume (vol)	vol.95
Number (no)	429
Page	pp.-
#Pages	6
Date of Issue
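
The two steps named in the abstract (a Normalized Frequency computed from N-gram statistics, followed by word-boundary acquisition) can be illustrated with a minimal sketch. This record does not give the paper's formulas, so everything specific here is an assumption made for illustration: the normalization (a substring's count divided by the mean count of substrings of the same length), the boundary rule (cut at a gap when no substring spanning it reaches a threshold), and the names `ngram_counts`, `normalized_frequency`, `segment`, `max_n`, and `threshold`. It is not the author's method.

```python
from collections import Counter


def ngram_counts(text, max_n):
    """Count every character substring of length 1..max_n in text."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def normalized_frequency(counts):
    """Assumed normalization: count / mean count of same-length substrings."""
    totals, kinds = Counter(), Counter()
    for s, c in counts.items():
        totals[len(s)] += c
        kinds[len(s)] += 1
    return {s: c / (totals[len(s)] / kinds[len(s)]) for s, c in counts.items()}


def segment(text, nf, max_n, threshold):
    """Assumed boundary rule: cut at a gap when every substring of
    length 2..max_n that spans the gap scores below threshold."""
    words, start = [], 0
    for gap in range(1, len(text)):
        spanning = [
            nf.get(text[i:i + n], 0.0)
            for n in range(2, max_n + 1)
            for i in range(max(0, gap - n + 1), gap)
            if i + n <= len(text)
        ]
        if max(spanning, default=0.0) < threshold:
            words.append(text[start:gap])
            start = gap
    words.append(text[start:])
    return words


if __name__ == "__main__":
    toy = "abab"  # toy input, not EDR corpus data
    nf = normalized_frequency(ngram_counts(toy, 2))
    print(segment(toy, nf, 2, 1.0))  # prints ['ab', 'ab']
```

In the toy run, "ab" occurs twice and "ba" once, so only the gap spanned by "ba" falls below the threshold and becomes a boundary. No dictionary is consulted, which is the property the abstract emphasizes; a real experiment would need the paper's actual definitions and a corpus-scale N-gram table.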