Presentation 1995/12/15
Automatic Acquisition of Words by Using Statistical Text Information
Abstract(in Japanese) (See Japanese page)
Abstract(in English) This paper describes a new method for automatically acquiring words from Japanese text. Recognizing words in Japanese text normally requires morphological analysis. However, unknown words and ambiguous compound-word boundaries are hard to recognize, so dictionaries and complex heuristics are needed to resolve them. The proposed method is based on N-gram statistics and does not require traditional morphological analysis. It consists of two steps: (1) calculating a Normalized Frequency for each substring of the Japanese text using N-gram statistics, and (2) identifying the boundaries between words. Experiments were conducted on the EDR Japanese corpus. We obtained a correct recognition score of 82.39% and a recall score of 69.84%.
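The two steps named in the abstract can be illustrated with a minimal sketch. The abstract does not define Normalized Frequency or the boundary rule, so both are assumptions here: each substring count is divided by the mean count of substrings of the same length, and a boundary is placed wherever the two-character substring straddling a gap scores below a threshold. The function names (ngram_counts, normalized_frequency, segment) and the threshold value are hypothetical and are not taken from the paper.

```python
from collections import Counter


def ngram_counts(text, max_n):
    """Count every character substring of length 1..max_n in text."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def normalized_frequency(counts):
    """Assumed normalization (not from the paper): divide each count by
    the mean count of all distinct substrings of the same length."""
    totals = Counter()  # total occurrences per substring length
    kinds = Counter()   # number of distinct substrings per length
    for s, c in counts.items():
        totals[len(s)] += c
        kinds[len(s)] += 1
    return {s: c * kinds[len(s)] / totals[len(s)] for s, c in counts.items()}


def segment(text, nf, threshold=1.0):
    """Assumed boundary rule (not from the paper): split between two
    characters when the bigram spanning the gap scores below threshold."""
    words, start = [], 0
    for i in range(1, len(text)):
        if nf.get(text[i - 1:i + 1], 0.0) < threshold:
            words.append(text[start:i])
            start = i
    words.append(text[start:])
    return words


if __name__ == "__main__":
    sample = "abababcd"  # toy string standing in for Japanese text
    nf = normalized_frequency(ngram_counts(sample, max_n=2))
    print(segment(sample, nf))
```

On the toy string, the frequent pair "ab" stays joined while the rare pairs "bc" and "cd" are split. A real system would use longer substrings and a corpus-tuned threshold.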
Keyword(in Japanese) (See Japanese page)
Keyword(in English) word extraction / N-gram / natural language analysis / morphological analysis / statistics
Paper # NLC95-68
Date of Issue

Conference Information
Committee NLC
Conference Date 1995/12/15 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Vice Chair

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Automatic Acquisition of Words by Using Statistical Text Information
Sub Title (in English)
Keyword(1) word extraction
Keyword(2) N-gram
Keyword(3) natural language analysis
Keyword(4) morphological analysis
Keyword(5) statistics
1st Author's Name Hidekazu NAKAWATASE
1st Author's Affiliation NTT Information and Communication Systems Laboratories
Date 1995/12/15
Paper # NLC95-68
Volume (vol) vol.95
Number (no) 429
Page pp.-
#Pages 6