Presentation | 2016-03-29, Tensor-based Speech Representation and its Application to Identification of Languages and Speakers (So Suzuki, Daisuke Saito, Nobuaki Minematsu) |
---|---|
Abstract (in English) | This paper proposes a novel speech representation for automatic language and speaker identification that characterizes the entire feature space with a tensor. In conventional studies of both tasks, the i-vector is commonly used as the state-of-the-art representation. The i-vector is derived by modeling each utterance as a GMM and projecting its supervector (GMM-SV) onto a low-dimensional space. In this paper, to model explicitly the correlation among the mean vectors of a GMM, each utterance is represented not as a GMM supervector but as a matrix built from its mean vectors, and the entire set of utterances is represented as a tensor. By projecting this tensor onto a lower-dimensional space, we obtain a new representation for an input utterance. We apply this method to automatic language identification and automatic speaker identification and evaluate its effectiveness. |
Keyword(in English) | language identification / speaker identification / Gaussian mixture model / GMM supervector / i-vector / tensor analysis / Tucker decomposition |
Paper # | EA2015-127, SIP2015-176, SP2015-155 |
Date of Issue | 2016-03-21 (EA, SIP, SP) |
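The abstract's contrast between a flattened GMM supervector and a matrix/tensor representation can be illustrated with a short NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' method: each utterance is assumed to be already summarized by an M x D matrix of GMM mean vectors (how that GMM is obtained is not specified here), the N utterances are stacked into an N x M x D tensor, and the projection uses a truncated HOSVD (one common way to compute a Tucker decomposition) on the mixture and feature modes only. The function names `supervector`, `mode_unfold`, `fit_factors`, and `project`, the ranks, and the random data are all hypothetical.

```python
import numpy as np


def supervector(means: np.ndarray) -> np.ndarray:
    """GMM supervector: concatenate the M mean vectors (rows of an M x D matrix)."""
    return means.reshape(-1)


def mode_unfold(tensor: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n unfolding: bring axis `mode` to the front and flatten the remaining axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def fit_factors(tensor: np.ndarray, rank_mix: int, rank_dim: int):
    """Truncated HOSVD on the mixture (axis 1) and feature (axis 2) modes of an
    N x M x D tensor of per-utterance mean matrices. Returns orthonormal factor
    matrices U_mix (M x rank_mix) and U_dim (D x rank_dim)."""
    u_mix, _, _ = np.linalg.svd(mode_unfold(tensor, 1), full_matrices=False)
    u_dim, _, _ = np.linalg.svd(mode_unfold(tensor, 2), full_matrices=False)
    return u_mix[:, :rank_mix], u_dim[:, :rank_dim]


def project(means: np.ndarray, u_mix: np.ndarray, u_dim: np.ndarray) -> np.ndarray:
    """Project one M x D mean matrix onto the learned subspaces (U_mix^T X U_dim)
    and flatten it into a rank_mix * rank_dim vector."""
    return (u_mix.T @ means @ u_dim).reshape(-1)


# Hypothetical sizes and random stand-in data (not real GMM means).
rng = np.random.default_rng(0)
n_utt, n_mix, n_dim = 20, 8, 5
tensor = rng.standard_normal((n_utt, n_mix, n_dim))

u_mix, u_dim = fit_factors(tensor, rank_mix=3, rank_dim=2)
rep = project(tensor[0], u_mix, u_dim)
print(supervector(tensor[0]).shape, rep.shape)  # (40,) (6,)
```

Unlike the flattened supervector, this projection acts separately on the mixture axis and the feature axis, so correlations across mean vectors are summarized by `U_mix`. In a real system the per-utterance means would come from a GMM fitted to that utterance's acoustic features (for example by MAP adaptation from a universal background model, which is an assumption here), and the ranks would be tuned on development data.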
Conference Information | |
Committee | EA / SP / SIP |
---|---|
Conference Date | 2016-03-28 (2 days) |
Place (in English) | Beppu International Convention Center B-ConPlaza |
Topics (in English) | Engineering/Electro Acoustics, Speech, Signal Processing, and Related Topics |
Chair | Yoichi Haneda (Univ. of Electro-Comm.) / Kazunori Mano (Shibaura Inst. of Tech.) / Osamu Houshuyama (NEC) |
Vice Chair | Yukio Iwaya (Tohoku Gakuin Univ.) / Mitsunori Mizumachi (Kyushu Inst. of Tech.) / Norihide Kitaoka (Tokushima Univ.) / Makoto Nakashizuka (Chiba Inst. of Tech.) / Masahiro Okuda (Univ. of Kitakyushu) |
Secretary | Yukio Iwaya (NTT) / Mitsunori Mizumachi (KDDI R&D Labs.) / Norihide Kitaoka (Tokyo City Univ.) / Makoto Nakashizuka (Kobe Univ.) / Masahiro Okuda (NEC) |
Assistant | Shoichi Koyama (Univ. of Tokyo) / Takashi Nose (Tohoku Univ.) / Taichi Asami (NTT) / Takamichi Miyata (Chiba Inst. of Tech.) |
Paper Information | |
Registration To | Technical Committee on Engineering Acoustics / Technical Committee on Speech / Technical Committee on Signal Processing |
---|---|
Language | Japanese |
Title (in English) | Tensor-based Speech Representation and its Application to Identification of Languages and Speakers |
Keyword(1) | language identification |
Keyword(2) | speaker identification |
Keyword(3) | Gaussian mixture model |
Keyword(4) | GMM supervector |
Keyword(5) | i-vector |
Keyword(6) | tensor analysis |
Keyword(7) | Tucker decomposition |
1st Author's Name | So Suzuki |
1st Author's Affiliation | The University of Tokyo (UTokyo) |
2nd Author's Name | Daisuke Saito |
2nd Author's Affiliation | The University of Tokyo (UTokyo) |
3rd Author's Name | Nobuaki Minematsu |
3rd Author's Affiliation | The University of Tokyo (UTokyo) |
Date | 2016-03-29 |
Volume (vol) | vol.115 |
Number (no) | EA-521, SIP-522, SP-523 |
Page | pp. 341-346 (EA), pp. 341-346 (SIP), pp. 341-346 (SP) |
#Pages | 6 |