Presentation | 2022-01-27, Auditory Representation of Speech Signals Using a Matching Pursuit Algorithm and Sparse Coding, Dung Kim Tran, Masashi Unoki |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Speech signals naturally carry information such as linguistic content, speaker individuality, and emotion. Exploiting perceptual features of speech signals should therefore benefit speech analysis applications. Current solutions combine the Bark scale and a gammatone basis with a matching pursuit algorithm to obtain perceptual features. This paper proposes using more physiologically accurate techniques: the equivalent rectangular bandwidth, a gammachirp basis, and the auditory masking effects of gammachirp kernels. Experimental results show that the perceptual features produced by the proposed method achieve a PEMO-Q score of 0.89 and a PESQ score of 3.27 using only 1066 coefficients per second. The proposed method also provides the highest matching accuracy in a pattern matching experiment. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Auditory filterbank, equivalent rectangular bandwidth, gammatone/gammachirp, masking effect, matching pursuit, perceptual features, sparse coding, spikegram |
Paper # | EMM2021-87 |
Date of Issue | 2022-01-20 (EMM) |
Conference Information | |
Committee | EMM |
---|---|
Conference Date | 2022/1/27 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Online |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | Sense of Presence, Universal Media, Digital Entertainment, etc. |
Chair | Ryoichi Nishimura(NICT) |
Vice Chair | Masaaki Fujiyoshi(Tokyo Metropolitan Univ.) / Masatsugu Ichino(Univ. of Electro-Comm.) |
Secretary | Masaaki Fujiyoshi(Utsunomiya Univ.) / Masatsugu Ichino(NICT) |
Assistant | Shoko Imaizumi(Chiba Univ.) / Youichi Takashima(Kaishi Professional Univ.) |
Paper Information | |
Registration To | Technical Committee on Enriched MultiMedia |
---|---|
Language | ENG |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Auditory Representation of Speech Signals Using a Matching Pursuit Algorithm and Sparse Coding |
Sub Title (in English) | |
Keyword(1) | Auditory filterbank, equivalent rectangular bandwidth, gammatone/gammachirp, masking effect, matching pursuit, perceptual features, sparse coding, spikegram |
1st Author's Name | Dung Kim Tran |
1st Author's Affiliation | Japan Advanced Institute of Science and Technology(JAIST) |
2nd Author's Name | Masashi Unoki |
2nd Author's Affiliation | Japan Advanced Institute of Science and Technology(JAIST) |
Date | 2022-01-27 |
Paper # | EMM2021-87 |
Volume (vol) | vol.121 |
Number (no) | EMM-362 |
Page | pp.19-24 (EMM) |
#Pages | 6 |
Date of Issue | 2022-01-20 (EMM) |
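
The English abstract describes obtaining perceptual features by running a matching pursuit algorithm over an auditory kernel basis. As a rough illustration of that general idea only, the sketch below runs plain greedy matching pursuit over a few gammatone kernels whose bandwidths follow the equivalent rectangular bandwidth (ERB). It is not the authors' implementation: it uses gammatone rather than gammachirp kernels and omits the masking effects entirely. The function names, the kernel length, the filter order of 4, the bandwidth factor 1.019, and the Glasberg and Moore ERB constants (24.7 and 4.37) are assumptions taken from common auditory-modeling practice, not from this record.

```python
import math


def erb_bandwidth(fc_hz):
    """ERB in Hz at center frequency fc_hz (Glasberg and Moore formula, assumed here)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def gammatone_kernel(fc_hz, fs_hz, length, order=4):
    """Unit-norm gammatone kernel t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)."""
    b = 1.019 * erb_bandwidth(fc_hz)  # bandwidth factor is an assumed convention
    kernel = []
    for n in range(length):
        t = n / fs_hz
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        kernel.append(envelope * math.cos(2.0 * math.pi * fc_hz * t))
    norm = math.sqrt(sum(v * v for v in kernel))
    return [v / norm for v in kernel]


def matching_pursuit(signal, kernels, n_atoms):
    """Greedy matching pursuit.

    Returns (spikes, residual), where each spike is a tuple
    (kernel_index, sample_offset, coefficient).
    """
    residual = list(signal)
    spikes = []
    for _ in range(n_atoms):
        best = (0, 0, 0.0)
        # Exhaustively correlate every kernel with every shift of the residual.
        for ki, kernel in enumerate(kernels):
            for off in range(len(residual) - len(kernel) + 1):
                c = sum(residual[off + j] * kernel[j] for j in range(len(kernel)))
                if abs(c) > abs(best[2]):
                    best = (ki, off, c)
        ki, off, c = best
        if c == 0.0:
            break  # nothing left to explain
        # Subtract the selected atom's contribution from the residual.
        for j, v in enumerate(kernels[ki]):
            residual[off + j] -= c * v
        spikes.append(best)
    return spikes, residual


# Toy usage: a signal made of one scaled, shifted 1 kHz kernel.
fs = 8000
kernels = [gammatone_kernel(fc, fs, 64) for fc in (500.0, 1000.0, 2000.0)]
signal = [0.0] * 200
for j, v in enumerate(kernels[1]):
    signal[50 + j] += 2.0 * v
spikes, residual = matching_pursuit(signal, kernels, n_atoms=1)
print(spikes)
```

Each returned triple plays the role of one "spike" (kernel, time, amplitude), which is the kind of sparse code a spikegram displays. The exhaustive search is used here for clarity; practical implementations typically compute the correlations with FFT-based convolution.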