Presentation 2018-09-28
Study on speech representation for speech fingerprint using perceptual matching-pursuit algorithm
Dung Kim Tran, Huy Quoc Nguyen, Masashi Unoki
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Recent studies have revealed the weakness of audio fingerprinting methods when applied to speech signals. The problem is that spectrograms, which are used by conventional audio fingerprinting techniques, are not suitable for representing speech signals in the process of creating speech fingerprints. Instead, spikegrams are a preferable model because of their adaptability to speech. This paper evaluates different kinds of techniques that can be used to create spikegrams. The resulting spikegrams are compared in terms of sparsity and signal resynthesis quality. Furthermore, the ability of the spikegrams to convey speaker individuality and linguistic features is evaluated in terms of recognition accuracy by using a convolutional neural network. Experimental results show that spikegrams created by using the perceptual matching-pursuit algorithm and Gammachirp base spectra are the most suitable model for representing speech signals in the process of creating speech fingerprints. Within the scope of this paper, this kind of spikegram has the lowest spike rate, the highest identification accuracy, and a comparable PESQ score.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Speech fingerprint / spikegram / perceptual matching-pursuit / Gammatone kernel / Gammachirp kernel / convolutional neural network
Paper # LOIS2018-20,IE2018-40,EMM2018-59
Date of Issue 2018-09-20 (LOIS, IE, EMM)

Conference Information
Committee IEE-CMN / EMM / LOIS / IE / ITE-ME
Conference Date 2018/9/27(2days)
Place (in Japanese) (See Japanese page)
Place (in English) Beppu Int'l Convention Ctr. aka B-CON Plaza
Topics (in Japanese) (See Japanese page)
Topics (in English) Multimedia Communication/System, Lifelog Applications, IP Broadcasting/Video Transmission, Media Security, Media Processing (AI, Deep Learning), etc.
Chair Shun Morimura(CRIEPI) / Keiichi Iwamura(TUC) / Tomohiro Yamada(NTT) / Takayuki Hamamoto(Tokyo Univ. of Science) / Miki Haseyama(Hokkaido Univ.)
Vice Chair / Minoru Kuribayashi(Okayama Univ.) / Tetsuya Kojima(NIT,Tokyo College) / Toru Kobayashi(Nagasaki Univ.) / Hideaki Kimata(NTT) / Kazuya Kodama(NII) / Norio Tagawa(Tokyo Metropolitan Univ.)
Secretary (Tokai Univ.) / Minoru Kuribayashi(Kansai Univ.) / Tetsuya Kojima(NIT, Tokyo) / Toru Kobayashi(Chukyo Univ.) / Hideaki Kimata(NTT) / Kazuya Kodama(Research Organization of Information and Systems) / Norio Tagawa(KDDI Research)
Assistant Tomotaka Kimura(Doshisha Univ.) / 田中 彰浩(CRIEPI) / Hiroko Akiyama(NIT, Nagano College) / Kitahiro Kaneda(CANON) / Shinichiro Eitoku(NTT) / Kazuya Hayase(NTT) / Yasutaka Matsuo(NHK)

Paper Information
Registration To Technical Meeting on Communications / Technical Committee on Enriched MultiMedia / Technical Committee on Life Intelligence and Office Information Systems / Technical Committee on Image Engineering / Technical Group on Media Engineering
Language ENG
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Study on speech representation for speech fingerprint using perceptual matching-pursuit algorithm
Sub Title (in English)
Keyword(1) Speech fingerprint / spikegram / perceptual matching-pursuit / Gammatone kernel / Gammachirp kernel / convolutional neural network
1st Author's Name Dung Kim Tran
1st Author's Affiliation Japan Advanced Institute of Science and Technology(JAIST)
2nd Author's Name Huy Quoc Nguyen
2nd Author's Affiliation Japan Advanced Institute of Science and Technology(JAIST)
3rd Author's Name Masashi Unoki
3rd Author's Affiliation Japan Advanced Institute of Science and Technology(JAIST)
Date 2018-09-28
Paper # LOIS2018-20,IE2018-40,EMM2018-59
Volume (vol) vol.118
Number (no) LOIS-222,IE-223,EMM-224
Page pp.71-76(LOIS), pp.71-76(IE), pp.71-76(EMM)
#Pages 6
Date of Issue 2018-09-20 (LOIS, IE, EMM)