Presentation | 2015-12-02 / Simultaneous Modelling of Acoustic, Phonetic, Speaker Features Using Improved Three-Way Restricted Boltzmann Machine / Toru Nakashika, Tetsuya Takiguchi |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | In this paper, we discuss a way of modelling speech signals using an improved three-way restricted Boltzmann machine (3WRBM) that considers acoustic features, latent phonological features, and speaker-identity features. The 3WRBM is an energy-based probabilistic model that includes three kinds of potentials: unary potentials of each variable, pairwise potentials of every pair of variables, and three-way potentials of the three variables. In our approach, we design the three-way potentials appropriately in a speaker-adaptive training (SAT) manner. The optimized model captures the relationships between the variables, makes it possible to compute the conditional probabilities of each variable, and is applicable to many tasks in speech signal processing. For example, estimating speaker-identity features given acoustic features can be used for speaker recognition. Another example is voice conversion: acoustic features are estimated from the desired speaker-identity features and the phonological features, which are themselves estimated from the source speaker's acoustic features. In our experiments, we evaluate the effectiveness of the speech modelling through a voice conversion task and a speaker recognition task. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | speech modelling / restricted Boltzmann machine / speaker-adaptive training / speaker recognition / voice conversion |
Paper # | SP2015-71 |
Date of Issue | 2015-11-25 (SP) |
Conference Information | |
Committee | NLC / IPSJ-NL / SP / IPSJ-SLP |
---|---|
Conference Date | 2015/12/2 (3 days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Nagoya Inst of Tech. |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | The Second Natural Language Processing Symposium & The 17th Spoken Language Symposium |
Chair | Koichi Takeuchi(Okayama Univ.) / Kentaro Inui(Tohoku Univ.) / Kazunori Mano(Shibaura Inst. of Tech.) / Koichi Shinoda(Tokyo Inst. of Tech.) |
Vice Chair | Hiroshi Kanayama(IBM) / Makoto Ichise(NTT DoCoMo) / / Norihide Kitaoka(Tokushima Univ.) |
Secretary | Hiroshi Kanayama(Univ. of Tokyo/Hottolink) / Makoto Ichise(Ryukoku Univ.) / (Osaka Univ.) / Norihide Kitaoka(Tohoku Univ.) / (Mixi Co. Ltd.) |
Assistant | Kazutaka Shimada(Kyushu Inst. of Tech.) / Ryuichiro Higashinaka(NTT) / / Takashi Nose(Tohoku Univ.) / Taichi Asami(NTT) |
Paper Information | |
Registration To | Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Simultaneous Modelling of Acoustic, Phonetic, Speaker Features Using Improved Three-Way Restricted Boltzmann Machine |
Sub Title (in English) | |
Keyword(1) | speech modelling |
Keyword(2) | restricted Boltzmann machine |
Keyword(3) | speaker-adaptive training |
Keyword(4) | speaker recognition |
Keyword(5) | voice conversion |
1st Author's Name | Toru Nakashika |
1st Author's Affiliation | The University of Electro-Communications(UEC) |
2nd Author's Name | Tetsuya Takiguchi |
2nd Author's Affiliation | Kobe University(Kobe Univ.) |
Date | 2015-12-02 |
Paper # | SP2015-71 |
Volume (vol) | vol.115 |
Number (no) | SP-346 |
Page | pp.7-12(SP) |
#Pages | 6 |
Date of Issue | 2015-11-25 (SP) |
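
The energy-based structure described in the English abstract (unary, pairwise, and three-way potentials over acoustic, phonological, and speaker-identity variables) can be illustrated with a minimal sketch. This is a generic three-way energy function, not the authors' improved 3WRBM: the parameter names (`b`, `c`, `d`, `W`, `U`, `V`, `T`), the dimensions, and the random values are all assumptions made for illustration, and the SAT-style design of the three-way potentials is not reproduced. The sketch also assumes a one-hot speaker vector and shows how a conditional such as p(speaker | acoustic, phonological) follows from the energy by enumeration, which is the kind of conditional the abstract associates with speaker recognition.

```python
import numpy as np

def energy(v, h, s, params):
    """Energy of a generic three-way model (illustrative, not the paper's exact form).

    v: acoustic features, h: latent phonological features, s: speaker-identity features.
    """
    b, c, d, W, U, V, T = params  # hypothetical parameter names
    unary = b @ v + c @ h + d @ s                          # one term per variable
    pairwise = v @ W @ h + v @ U @ s + h @ V @ s           # one term per pair of variables
    three_way = np.einsum("i,j,k,ijk->", v, h, s, T)       # term coupling all three variables
    return -(unary + pairwise + three_way)

def speaker_posterior(v, h, params, n_speakers):
    """p(s | v, h) for a one-hot s, computed by enumerating the speakers."""
    one_hot = np.eye(n_speakers)
    neg_e = np.array([-energy(v, h, one_hot[k], params) for k in range(n_speakers)])
    neg_e -= neg_e.max()          # subtract the maximum for numerical stability
    p = np.exp(neg_e)
    return p / p.sum()

# Toy dimensions and random parameters (assumptions, not values from the paper).
rng = np.random.default_rng(0)
nv, nh, ns = 4, 3, 2
params = (
    rng.normal(size=nv), rng.normal(size=nh), rng.normal(size=ns),
    rng.normal(size=(nv, nh)), rng.normal(size=(nv, ns)), rng.normal(size=(nh, ns)),
    rng.normal(size=(nv, nh, ns)),
)
v = rng.normal(size=nv)
h = rng.integers(0, 2, size=nh).astype(float)
post = speaker_posterior(v, h, params, ns)
print(post)
```

Terms that do not depend on the speaker vector cancel in the normalisation, so only the speaker-dependent potentials affect the posterior; a real-valued acoustic model would typically add a quadratic term on `v`, which is omitted here for the same reason.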