Presentation | 2018-09-28 Study on speech representation for speech fingerprint using perceptual matching-pursuit algorithm Dung Kim Tran, Huy Quoc Nguyen, Masashi Unoki |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Recent studies have revealed the weakness of audio fingerprinting methods when applied to speech signals. The problem is that spectrograms, which are used by conventional audio fingerprinting techniques, are not suitable for representing speech signals when creating speech fingerprints. Spikegrams are a preferable model because of their adaptability to speech. This paper evaluates different techniques for creating spikegrams. The resulting spikegrams are compared in terms of sparsity and signal resynthesis quality. Furthermore, the ability of the spikegrams to convey speaker individuality and linguistic features is evaluated in terms of the recognition accuracy of a convolutional neural network. Experimental results show that spikegrams created with the perceptual matching-pursuit algorithm and Gammachirp base spectra are the most suitable model for representing speech signals when creating speech fingerprints. Within the scope of this paper, this kind of spikegram has the lowest spike rate, the highest identification accuracy, and a comparable PESQ score. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Speech fingerprint, spikegram, perceptual matching-pursuit, Gammatone kernel, Gammachirp kernel, convolutional neural network |
Paper # | LOIS2018-20,IE2018-40,EMM2018-59 |
Date of Issue | 2018-09-20 (LOIS, IE, EMM) |
Conference Information | |
Committee | IEE-CMN / EMM / LOIS / IE / ITE-ME |
---|---|
Conference Date | 2018/9/27(2days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Beppu Int'l Convention Ctr. aka B-CON Plaza |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | Multimedia Communication/System, Lifelog Applications, IP Broadcasting/Video Transmission, Media Security, Media Processing (AI, Deep Learning), etc. |
Chair | Shun Morimura(CRIEPI) / Keiichi Iwamura(TUC) / Tomohiro Yamada(NTT) / Takayuki Hamamoto(Tokyo Univ. of Science) / Miki Haseyama(Hokkaido Univ.) |
Vice Chair | / Minoru Kuribayashi(Okayama Univ.) / Tetsuya Kojima(NIT,Tokyo College) / Toru Kobayashi(Nagasaki Univ.) / Hideaki Kimata(NTT) / Kazuya Kodama(NII) / Norio Tagawa(Tokyo Metropolitan Univ.) |
Secretary | (Tokai Univ.) / Minoru Kuribayashi(Kansai Univ.) / Tetsuya Kojima(NIT, Tokyo) / Toru Kobayashi(Chukyo Univ.) / Hideaki Kimata(NTT) / Kazuya Kodama(Research Organization of Information and Systems) / Norio Tagawa(KDDI Research) |
Assistant | Tomotaka Kimura(Doshisha Univ.) / 田中 彰浩(CRIEPI) / Hiroko Akiyama(NIT, Nagano College) / Kitahiro Kaneda(CANON) / Shinichiro Eitoku(NTT) / Kazuya Hayase(NTT) / Yasutaka Matsuo(NHK) |
Paper Information | |
Registration To | Technical Meeting on Communications / Technical Committee on Enriched MultiMedia / Technical Committee on Life Intelligence and Office Information Systems / Technical Committee on Image Engineering / Technical Group on Media Engineering |
---|---|
Language | ENG |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Study on speech representation for speech fingerprint using perceptual matching-pursuit algorithm |
Sub Title (in English) | |
Keyword(1) | Speech fingerprint, spikegram, perceptual matching-pursuit, Gammatone kernel, Gammachirp kernel, convolutional neural network |
1st Author's Name | Dung Kim Tran |
1st Author's Affiliation | Japan Advanced Institute of Science and Technology(JAIST) |
2nd Author's Name | Huy Quoc Nguyen |
2nd Author's Affiliation | Japan Advanced Institute of Science and Technology(JAIST) |
3rd Author's Name | Masashi Unoki |
3rd Author's Affiliation | Japan Advanced Institute of Science and Technology(JAIST) |
Date | 2018-09-28 |
Paper # | LOIS2018-20,IE2018-40,EMM2018-59 |
Volume (vol) | vol.118 |
Number (no) | LOIS-222,IE-223,EMM-224 |
Page | pp.71-76(LOIS), pp.71-76(IE), pp.71-76(EMM) |
#Pages | 6 |
Date of Issue | 2018-09-20 (LOIS, IE, EMM) |