Presentation 2016-03-29
Tensor-based Speech Representation and its Application to Identification of Languages and Speakers
So Suzuki, Daisuke Saito, Nobuaki Minematsu
Abstract(in English) This paper proposes a novel speech representation for automatic identification of languages and speakers, in which the entire feature space is characterized by a tensor. In conventional studies of both tasks, the i-vector is commonly used as the state-of-the-art representation. An i-vector is derived by modeling each utterance with a GMM and projecting its supervector (GMM-SV) onto a low-dimensional space. In this paper, to explicitly model the correlation among the mean vectors of a GMM, each utterance is represented not as a GMM supervector but as a matrix of its mean vectors, and the entire set of utterances is represented as a tensor. By projecting this tensor onto a lower-dimensional space, we obtain a new representation of each input utterance. We apply this method to automatic language identification and automatic speaker identification and evaluate its effectiveness.
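To make the contrast between a GMM supervector and a matrix/tensor representation concrete, here is a minimal sketch in Python with NumPy. It assumes the per-utterance GMM mean vectors are already available (for example, MAP-adapted from a universal background model) and stacked into an N x C x D array. As one simple way to obtain a Tucker-style projection, it uses a truncated higher-order SVD on the Gaussian-component mode and the feature-dimension mode. The function names, the ranks, the centering step, and the random toy data are all hypothetical illustrations, not the authors' actual method or settings.

```python
import numpy as np


def unfold(tensor: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n unfolding: rows index `mode`, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def leading_left_singular_vectors(mat: np.ndarray, rank: int) -> np.ndarray:
    """Return the `rank` leading left singular vectors of `mat` (orthonormal columns)."""
    u, _, _ = np.linalg.svd(mat, full_matrices=False)
    return u[:, :rank]


def gmm_supervector(means: np.ndarray) -> np.ndarray:
    """Conventional GMM-SV: concatenate each utterance's C mean vectors into one C*D vector."""
    return means.reshape(len(means), -1)


def tucker2_features(means: np.ndarray, rank_gauss: int, rank_feat: int):
    """Hypothetical tensor-based features via truncated HOSVD on modes 1 and 2.

    means: shape (N, C, D); utterance n is the C x D matrix whose rows are
           its GMM mean vectors, so the whole set forms an N x C x D tensor.
    Returns (features, u_c, u_d); features has shape (N, rank_gauss * rank_feat).
    """
    # Assumption: center across utterances before estimating the subspaces.
    centered = means - means.mean(axis=0, keepdims=True)
    u_c = leading_left_singular_vectors(unfold(centered, 1), rank_gauss)  # C x r_c
    u_d = leading_left_singular_vectors(unfold(centered, 2), rank_feat)   # D x r_d
    # Per-utterance core matrix U_c^T X_n U_d, shape (r_c, r_d).
    cores = np.einsum("ci,ncd,dj->nij", u_c, centered, u_d)
    return cores.reshape(len(means), -1), u_c, u_d


# Toy usage: 20 utterances, 8 Gaussians, 13-dimensional features (random data).
rng = np.random.default_rng(0)
means = rng.normal(size=(20, 8, 13))
sv = gmm_supervector(means)
feats, u_c, u_d = tucker2_features(means, rank_gauss=3, rank_feat=4)
print(sv.shape, feats.shape)  # (20, 104) (20, 12)
```

The design choice this illustrates is that the supervector flattens away the grouping of dimensions by Gaussian component, whereas keeping each utterance as a C x D matrix lets separate projections act on the component axis and the feature axis. A full Tucker decomposition could also compress the utterance mode or refine the factors iteratively (e.g. HOOI); the sketch does neither.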
Keyword(in English) language identification / speaker identification / Gaussian mixture model / GMM supervector / i-vector / tensor analysis / Tucker decomposition
Paper # EA2015-127,SIP2015-176,SP2015-155
Date of Issue 2016-03-21 (EA, SIP, SP)

Conference Information
Committee EA / SP / SIP
Conference Date 2016-03-28 (2 days)
Place (in English) Beppu International Convention Center B-ConPlaza
Topics (in English) Engineering/Electro Acoustics, Speech, Signal Processing, and Related Topics
Chair Yoichi Haneda(Univ. of Electro-Comm.) / Kazunori Mano(Shibaura Inst. of Tech.) / Osamu Houshuyama(NEC)
Vice Chair Yukio Iwaya(Tohoku Gakuin Univ.) / Mitsunori Mizumachi(Kyushu Inst. of Tech.) / Norihide Kitaoka(Tokushima Univ.) / Makoto Nakashizuka(Chiba Inst. of Tech.) / Masahiro Okuda(Univ. of Kitakyushu)
Secretary Yukio Iwaya(NTT) / Mitsunori Mizumachi(KDDI R&D Labs.) / Norihide Kitaoka(Tokyo City Univ.) / Makoto Nakashizuka(Kobe Univ.) / Masahiro Okuda(NEC)
Assistant Shoichi Koyama(Univ. of Tokyo) / Takashi Nose(Tohoku Univ.) / Taichi Asami(NTT) / Takamichi Miyata(Chiba Inst. of Tech.)

Paper Information
Registration To Technical Committee on Engineering Acoustics / Technical Committee on Speech / Technical Committee on Signal Processing
Language JPN
Title (in English) Tensor-based Speech Representation and its Application to Identification of Languages and Speakers
Sub Title (in English)
Keyword(1) language identification
Keyword(2) speaker identification
Keyword(3) Gaussian mixture model
Keyword(4) GMM supervector
Keyword(5) i-vector
Keyword(6) tensor analysis
Keyword(7) Tucker decomposition
1st Author's Name So Suzuki
1st Author's Affiliation The University of Tokyo(UTokyo)
2nd Author's Name Daisuke Saito
2nd Author's Affiliation The University of Tokyo(UTokyo)
3rd Author's Name Nobuaki Minematsu
3rd Author's Affiliation The University of Tokyo(UTokyo)
Date 2016-03-29
Volume (vol) vol.115
Number (no) EA-521,SIP-522,SP-523
Page pp.341-346 (EA), pp.341-346 (SIP), pp.341-346 (SP)
#Pages 6