Presentation 1998/12/10
MULTI CLASS COMPOSITE N-GRAM LANGUAGE MODEL BASED ON CONNECTION DIRECTION
Hirofumi Yamamoto, Yoshinori Sagisaka
Abstract(in Japanese) (See Japanese page)
Abstract(in English) A new word-clustering technique is proposed to efficiently build statistically salient class 2-grams from language corpora. By splitting each word's neighboring characteristics into preceding and following directions, multiple (two-dimensional) word classes are assigned to each word. On each side, word classes are merged into larger clusters independently, according to the distributions of preceding or following words. This word clustering can provide more efficient and statistically reliable word clusters. Further, we extend it to the Multi-Class Composite N-gram, whose units are Multi-Class 2-grams and joined words. The Multi-Class Composite N-gram showed better performance in both perplexity and recognition rate than conventional word 2-grams, with a logical parameter size one thousandth as large.
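The idea of giving each word one class per connection direction can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract says classes are merged according to preceding or following word distributions, whereas this toy version simply groups words whose sets of immediate neighbors are identical. The function name `directional_classes` and the example corpus are hypothetical.

```python
from collections import defaultdict


def directional_classes(sentences):
    """Give every word two class ids, one per connection direction.

    Toy merge rule (an assumption, not the paper's criterion): words
    with identical sets of immediate neighbors share a class.
    """
    preceding = defaultdict(set)  # word -> words seen immediately before it
    following = defaultdict(set)  # word -> words seen immediately after it
    for sent in sentences:
        for prev, cur in zip(sent, sent[1:]):
            preceding[cur].add(prev)
            following[prev].add(cur)

    vocab = sorted({w for sent in sentences for w in sent})

    def cluster(neighbors):
        ids = {}  # neighbor set -> class id
        return {w: ids.setdefault(frozenset(neighbors[w]), len(ids)) for w in vocab}

    # The two directions are clustered independently of each other.
    return cluster(preceding), cluster(following)


# Hypothetical four-sentence corpus.
sentences = [
    ["a", "cat", "runs"],
    ["the", "cat", "runs"],
    ["a", "dog", "runs"],
    ["the", "dog", "sleeps"],
]
by_preceding, by_following = directional_classes(sentences)

# "cat" and "dog" are preceded by the same words, so they share a class
# on the preceding side, but they are followed by different words.
print(by_preceding["cat"] == by_preceding["dog"])
print(by_following["cat"] == by_following["dog"])
```

In this corpus a single class per word would have to either split "cat" and "dog" or ignore the difference in what follows them; keeping the two directions separate avoids that choice, which is the motivation the abstract gives for two-dimensional classes.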
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Class N-gram / Variable Order N-gram / Automatic Clustering / Joined Word
Paper # NLC98-38,SP98-102
Date of Issue

Conference Information
Committee NLC
Conference Date 1998/12/10 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) MULTI CLASS COMPOSITE N-GRAM LANGUAGE MODEL BASED ON CONNECTION DIRECTION
Sub Title (in English)
Keyword(1) Class N-gram
Keyword(2) Variable Order N-gram
Keyword(3) Automatic Clustering
Keyword(4) Joined Word
1st Author's Name Hirofumi Yamamoto
1st Author's Affiliation ATR Interpreting Telecommunications Res.Labs.
2nd Author's Name Yoshinori Sagisaka
2nd Author's Affiliation ATR Interpreting Telecommunications Res.Labs.
Date 1998/12/10
Paper # NLC98-38,SP98-102
Volume (vol) vol.98
Number (no) 460
Page pp.-
#Pages 6