Presentation 2005/3/7
Topic Segmentation Using Kernel Principal Component Analysis for Sub-Phonetic Segments
Ken SADOHARA, Shi-wook LEE, Hiroaki KOJIMA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) This paper describes an open-vocabulary method for segmenting spoken documents into topically homogeneous blocks. Rather than transcribing the spoken documents into text, the method builds topical clusters directly from recognized sub-phonetic segments, so it is not constrained in terms of vocabulary or grammar. Each analysis interval that makes up a cluster is represented as a vector in a high-dimensional space spanned by all sub-phonetic segment sequences of a given length. Kernel principal component analysis then reduces the dimensionality by grouping sub-phonetic segments that co-occur within each topic. As a result, the cosine similarity between vectors reflects topical similarity, and hierarchical clustering based on this similarity measure is expected to form topically homogeneous clusters. The effectiveness of the method is demonstrated in an experiment on topic segmentation of broadcast news.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) principal component analysis / kernel methods / topic segmentation / clustering / speech recognition
Paper # AI2004-77
Date of Issue
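
The vector representation and similarity measure mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the symbol names, the segment length n=2, and the helper names ngram_vector and cosine are all hypothetical, and the kernel principal component analysis and hierarchical clustering steps are left out. It only shows how analysis intervals might be turned into count vectors over sub-phonetic segment sequences and compared by cosine similarity.

```python
from collections import Counter
from math import sqrt


def ngram_vector(symbols, n=2):
    """Count every length-n run of symbols in one analysis interval.

    The result is a sparse vector: keys are n-grams, values are counts.
    """
    return Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical recognizer output: made-up sub-phonetic symbols per interval.
interval_a = ["a1", "k2", "a1", "k2", "s3"]
interval_b = ["a1", "k2", "s3", "a1", "k2"]
interval_c = ["m4", "o5", "m4", "o5", "n6"]

vec_a = ngram_vector(interval_a)
vec_b = ngram_vector(interval_b)
vec_c = ngram_vector(interval_c)

print(cosine(vec_a, vec_b))  # intervals sharing most bigrams score high
print(cosine(vec_a, vec_c))  # intervals sharing no bigrams score 0.0
```

In this raw space, two intervals are similar only if they share exact segment sequences. According to the abstract, the kernel principal component analysis step is what groups co-occurring segments so that the similarity tracks topic rather than surface overlap.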

Conference Information
Committee AI
Conference Date 2005/3/7 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Artificial Intelligence and Knowledge-Based Processing (AI)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Topic Segmentation Using Kernel Principal Component Analysis for Sub-Phonetic Segments
Sub Title (in English)
Keyword(1) principal component analysis
Keyword(2) kernel methods
Keyword(3) topic segmentation
Keyword(4) clustering
Keyword(5) speech recognition
1st Author's Name Ken SADOHARA
1st Author's Affiliation National Institute of Advanced Industrial Science and Technology (AIST) AIST Tsukuba Central 2
2nd Author's Name Shi-wook LEE
2nd Author's Affiliation National Institute of Advanced Industrial Science and Technology (AIST) AIST Tsukuba Central 2
3rd Author's Name Hiroaki KOJIMA
3rd Author's Affiliation National Institute of Advanced Industrial Science and Technology (AIST) AIST Tsukuba Central 2
Date 2005/3/7
Paper # AI2004-77
Volume (vol) vol.104
Number (no) 726
Page pp.-
#Pages 5
Date of Issue