Study on speech representation for speech fingerprint using perceptual matching-pursuit algorithm

Dung Kim Tran; Huy Quoc Nguyen; Masashi Unoki

Presentation	2018-09-28 Study on speech representation for speech fingerprint using perceptual matching-pursuit algorithm Dung Kim Tran, Huy Quoc Nguyen, Masashi Unoki
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	Recent studies have revealed the weakness of audio fingerprinting methods when applied to speech signals. The problem is that spectrograms, which are used by conventional audio fingerprinting techniques, are not suitable for representing speech signals when creating speech fingerprints. Instead, spikegrams are a preferable model because of their adaptability to speech. This paper evaluates different techniques that can be used to create spikegrams. The resulting spikegrams are compared in terms of sparsity and signal resynthesis quality. Furthermore, the ability of the spikegrams to convey speaker individuality and linguistic features is evaluated in terms of recognition accuracy by using a convolutional neural network. Experimental results show that spikegrams created by using a perceptual matching-pursuit algorithm and Gammachirp base spectra are the most suitable model for representing speech signals when creating speech fingerprints. Within the scope of this paper, this kind of spikegram has the lowest spike rate, the highest identification accuracy, and a comparable PESQ score.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	Speech fingerprint, spikegram, perceptual matching-pursuit, Gammatone kernel, Gammachirp kernel, convolutional neural network
Paper #	LOIS2018-20,IE2018-40,EMM2018-59
Date of Issue	2018-09-20 (LOIS, IE, EMM)

Conference Information
Committee	IEE-CMN / EMM / LOIS / IE / ITE-ME
Conference Date	2018/9/27(2days)
Place (in Japanese)	(See Japanese page)
Place (in English)	Beppu Int'l Convention Ctr. aka B-CON Plaza
Topics (in Japanese)	(See Japanese page)
Topics (in English)	Multimedia Communication/System, Lifelog Applications, IP Broadcasting/Video Transmission, Media Security, Media Processing (AI, Deep Learning), etc.
Chair	Shun Morimura(CRIEPI) / Keiichi Iwamura(TUC) / Tomohiro Yamada(NTT) / Takayuki Hamamoto(Tokyo Univ. of Science) / Miki Haseyama(Hokkaido Univ.)
Vice Chair	/ Minoru Kuribayashi(Okayama Univ.) / Tetsuya Kojima(NIT,Tokyo College) / Toru Kobayashi(Nagasaki Univ.) / Hideaki Kimata(NTT) / Kazuya Kodama(NII) / Norio Tagawa(Tokyo Metropolitan Univ.)
Secretary	(Tokai Univ.) / Minoru Kuribayashi(Kansai Univ.) / Tetsuya Kojima(NIT, Tokyo) / Toru Kobayashi(Chukyo Univ.) / Hideaki Kimata(NTT) / Kazuya Kodama(Research Organization of Information and Systems) / Norio Tagawa(KDDI Research)
Assistant	Tomotaka Kimura(Doshisha Univ.) / 田中彰浩(CRIEPI) / Hiroko Akiyama(NIT, Nagano College) / Kitahiro Kaneda(CANON) / Shinichiro Eitoku(NTT) / Kazuya Hayase(NTT) / Yasutaka Matsuo(NHK)

Paper Information
Registration To	Technical Meeting on Communications / Technical Committee on Enriched MultiMedia / Technical Committee on Life Intelligence and Office Information Systems / Technical Committee on Image Engineering / Technical Group on Media Engineering
Language	ENG
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Study on speech representation for speech fingerprint using perceptual matching-pursuit algorithm
Sub Title (in English)
Keyword(1)	Speech fingerprint, spikegram, perceptual matching-pursuit, Gammatone kernel, Gammachirp kernel, convolutional neural network
1st Author's Name	Dung Kim Tran
1st Author's Affiliation	Japan Advanced Institute of Science and Technology(JAIST)
2nd Author's Name	Huy Quoc Nguyen
2nd Author's Affiliation	Japan Advanced Institute of Science and Technology(JAIST)
3rd Author's Name	Masashi Unoki
3rd Author's Affiliation	Japan Advanced Institute of Science and Technology(JAIST)
Date	2018-09-28
Paper #	LOIS2018-20,IE2018-40,EMM2018-59
Volume (vol)	vol.118
Number (no)	LOIS-222,IE-223,EMM-224
Page	pp.71-76(LOIS), pp.71-76(IE), pp.71-76(EMM)
#Pages	6
Date of Issue	2018-09-20 (LOIS, IE, EMM)