Presentation 2022-01-27
Auditory Representation of Speech Signals Using a Matching Pursuit Algorithm and Sparse Coding
Dung Kim Tran, Masashi Unoki
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Speech signals naturally carry linguistic content, speaker individuality, and emotion. Utilizing perceptual features of speech signals would therefore benefit speech analysis applications. Current solutions combine the Bark scale and a gammatone basis with a matching pursuit algorithm to obtain perceptual features. This paper proposes using more physiologically accurate techniques: the equivalent rectangular bandwidth (ERB) scale, a gammachirp basis, and the auditory masking effects of gammachirp kernels. Experimental results show that the perceptual features produced by the proposed method achieve a PEMO-Q score of 0.89 and a PESQ score of 3.27 using only 1066 coefficients per second. Furthermore, the proposed method provides the highest matching accuracy in a pattern matching experiment.
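For readers unfamiliar with the terms in the abstract, the following is a minimal illustrative sketch, not the authors' implementation. It shows a greedy, shift-invariant matching pursuit over gammatone kernels whose center frequencies are spaced on the ERB-rate scale (Glasberg and Moore formulas). It deliberately uses the simpler gammatone kernel rather than the gammachirp kernel the paper proposes, and it omits the masking step. The function names, kernel duration, filter order, and bandwidth factor are assumptions chosen for illustration.

```python
import numpy as np

def erb_space(f_lo, f_hi, n):
    """Center frequencies (Hz) equally spaced on the ERB-rate scale."""
    to_erb = lambda f: 21.4 * np.log10(1.0 + 0.00437 * f)
    from_erb = lambda e: (10.0 ** (e / 21.4) - 1.0) / 0.00437
    return from_erb(np.linspace(to_erb(f_lo), to_erb(f_hi), n))

def gammatone(fc, fs, dur=0.02, order=4, b=1.019):
    """Unit-norm gammatone kernel at center frequency fc (assumed parameters)."""
    t = np.arange(int(dur * fs)) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)  # ERB in Hz at fc
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * erb * t) * np.cos(2 * np.pi * fc * t)
    return g / np.linalg.norm(g)

def matching_pursuit(x, kernels, n_atoms):
    """Greedy shift-invariant matching pursuit.

    Returns a list of (kernel_index, time_offset, amplitude) tuples,
    which is the kind of sparse code a spikegram plots, and the residual.
    """
    residual = np.asarray(x, dtype=float).copy()
    spikes = []
    for _ in range(n_atoms):
        best = (0, 0, 0.0)
        for k, g in enumerate(kernels):
            # Inner product of the residual with g at every valid time shift.
            corr = np.correlate(residual, g, mode="valid")
            i = int(np.argmax(np.abs(corr)))
            if abs(corr[i]) > abs(best[2]):
                best = (k, i, float(corr[i]))
        k, i, a = best
        # Remove the selected atom's contribution from the residual.
        residual[i:i + len(kernels[k])] -= a * kernels[k]
        spikes.append(best)
    return spikes, residual
```

In this sketch, n_atoms sets the number of coefficients kept for the whole signal, so the rate of 1066 coefficients per second reported in the abstract would correspond to roughly n_atoms = 1066 for a one-second input.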
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Auditory filterbank, equivalent rectangular bandwidth, gammatone/gammachirp, masking effect, matching pursuit, perceptual features, sparse coding, spikegram
Paper # EMM2021-87
Date of Issue 2022-01-20 (EMM)

Conference Information
Committee EMM
Conference Date 2022/1/27(1 day)
Place (in Japanese) (See Japanese page)
Place (in English) Online
Topics (in Japanese) (See Japanese page)
Topics (in English) Sense of Presence, Universal Media, Digital Entertainment, etc.
Chair Ryoichi Nishimura(NICT)
Vice Chair Masaaki Fujiyoshi(Tokyo Metropolitan Univ.) / Masatsugu Ichino(Univ. of Electro-Comm.)
Secretary Masaaki Fujiyoshi(Utsunomiya Univ.) / Masatsugu Ichino(NICT)
Assistant Shoko Imaizumi(Chiba Univ.) / Youichi Takashima(Kaishi Professional Univ.)

Paper Information
Registration To Technical Committee on Enriched MultiMedia
Language ENG
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Auditory Representation of Speech Signals Using a Matching Pursuit Algorithm and Sparse Coding
Sub Title (in English)
Keyword(1) Auditory filterbank, equivalent rectangular bandwidth, gammatone/gammachirp, masking effect, matching pursuit, perceptual features, sparse coding, spikegram
1st Author's Name Dung Kim Tran
1st Author's Affiliation Japan Advanced Institute of Science and Technology(JAIST)
2nd Author's Name Masashi Unoki
2nd Author's Affiliation Japan Advanced Institute of Science and Technology(JAIST)
Date 2022-01-27
Paper # EMM2021-87
Volume (vol) vol.121
Number (no) EMM-362
Page pp.19-24(EMM)
#Pages 6
Date of Issue 2022-01-20 (EMM)