Speech Representation Based on Tensor Decomposition and Its Application to Language Identification and Speaker Identification

Presentation	2016-03-29 Tensor-based Speech Representation and its Application to Identification of Languages and Speakers So Suzuki, Daisuke Saito, Nobuaki Minematsu
Abstract(in English)	This paper proposes a novel speech representation for automatic identification of languages and speakers, in which the entire feature space is characterized by a tensor. In conventional studies of both tasks, the i-vector is widely used as the state-of-the-art representation. An i-vector is derived by modeling each utterance as a GMM and projecting its supervector (GMM-SV), the concatenation of the GMM mean vectors, onto a low-dimensional space. To model the correlation among the mean vectors of a GMM explicitly, this paper represents each utterance not as a GMM supervector but as a matrix of its mean vectors, and models the entire set of utterances as a tensor. Projecting this tensor onto a lower-dimensional space yields a new representation for each input utterance. We apply the method to automatic identification of languages and of speakers and evaluate its effectiveness.
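The abstract does not spell out the exact projection procedure. The following is a minimal sketch, assuming that each utterance's K GMM mean vectors (each D-dimensional) are stacked into a K x D matrix, that the N utterance matrices form an N x K x D tensor, and that the low-dimensional projection is a truncated higher-order SVD (one standard way to compute a Tucker decomposition) applied to the mixture and feature modes only. The function names `fit_factors` and `represent`, the sizes, the centering step and the random stand-in data are all hypothetical; GMM training itself is omitted.

```python
import numpy as np


def unfold(tensor: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def fit_factors(tensor: np.ndarray, rank_mix: int, rank_feat: int):
    """Truncated-HOSVD factor matrices for the mixture and feature modes.

    `tensor` has shape (N utterances, K mixtures, D feature dims).
    The utterance mode is left uncompressed, so an unseen utterance can be
    projected with the two learned factor matrices alone.
    """
    u_mix, _, _ = np.linalg.svd(unfold(tensor, 1), full_matrices=False)
    u_feat, _, _ = np.linalg.svd(unfold(tensor, 2), full_matrices=False)
    return u_mix[:, :rank_mix], u_feat[:, :rank_feat]


def represent(mean_matrix: np.ndarray, u_mix: np.ndarray, u_feat: np.ndarray) -> np.ndarray:
    """Project one utterance's (K, D) mean matrix to a rank_mix * rank_feat vector.

    U_mix^T @ M @ U_feat is the mode-1 and mode-2 product of the matrix M
    with the transposed factor matrices, flattened into a feature vector.
    """
    return (u_mix.T @ mean_matrix @ u_feat).ravel()


rng = np.random.default_rng(0)
N, K, D = 50, 16, 12                        # hypothetical sizes
means = rng.standard_normal((N, K, D))      # stand-in for per-utterance GMM means
centered = means - means.mean(axis=0)       # assumption: subtract the across-utterance mean

supervector = centered[0].ravel()           # GMM-SV style: concatenated means, K*D dims
u_mix, u_feat = fit_factors(centered, rank_mix=4, rank_feat=3)
compact = represent(centered[0], u_mix, u_feat)

print(supervector.shape, compact.shape)     # (192,) (12,)
```

Unlike the flattened supervector, the matrix form keeps the mixture axis and the feature axis separate, so the mixture-mode factor `u_mix` is estimated from how mean vectors co-vary across mixtures. That is the kind of correlation among mean vectors the abstract aims to model explicitly, though the paper's actual ranks, normalization and classifier are not given here.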
Keyword(in English)	language identification / speaker identification / Gaussian mixture model / GMM supervector / i-vector / tensor analysis / Tucker decomposition
Paper #	EA2015-127,SIP2015-176,SP2015-155
Date of Issue	2016-03-21 (EA, SIP, SP)

Conference Information
Committee	EA / SP / SIP
Conference Date	2016-03-28 (2 days)
Place (in English)	Beppu International Convention Center B-ConPlaza
Topics (in English)	Engineering/Electro Acoustics, Speech, Signal Processing, and Related Topics
Chair	Yoichi Haneda(Univ. of Electro-Comm.) / Kazunori Mano(Shibaura Inst. of Tech.) / Osamu Houshuyama(NEC)
Vice Chair	Yukio Iwaya(Tohoku Gakuin Univ.) / Mitsunori Mizumachi(Kyushu Inst. of Tech.) / Norihide Kitaoka(Tokushima Univ.) / Makoto Nakashizuka(Chiba Inst. of Tech.) / Masahiro Okuda(Univ. of Kitakyushu)
Secretary	Yukio Iwaya(NTT) / Mitsunori Mizumachi(KDDI R&D Labs.) / Norihide Kitaoka(Tokyo City Univ.) / Makoto Nakashizuka(Kobe Univ.) / Masahiro Okuda(NEC)
Assistant	Shoichi Koyama(Univ. of Tokyo) / Takashi Nose(Tohoku Univ.) / Taichi Asami(NTT) / Takamichi Miyata(Chiba Inst. of Tech.)

Paper Information
Registration To	Technical Committee on Engineering Acoustics / Technical Committee on Speech / Technical Committee on Signal Processing
Language	Japanese
Title (in English)	Tensor-based Speech Representation and its Application to Identification of Languages and Speakers
Keyword(1)	language identification
Keyword(2)	speaker identification
Keyword(3)	Gaussian mixture model
Keyword(4)	GMM supervector
Keyword(5)	i-vector
Keyword(6)	tensor analysis
Keyword(7)	Tucker decomposition
1st Author's Name	So Suzuki
1st Author's Affiliation	The University of Tokyo(UTokyo)
2nd Author's Name	Daisuke Saito
2nd Author's Affiliation	The University of Tokyo(UTokyo)
3rd Author's Name	Nobuaki Minematsu
3rd Author's Affiliation	The University of Tokyo(UTokyo)
Date	2016-03-29
Volume (vol)	vol.115
Number (no)	EA-521,SIP-522,SP-523
Page	pp.341-346 (EA), pp.341-346 (SIP), pp.341-346 (SP)
#Pages	6