Presentation 2015-12-02
Simultaneous Modelling of Acoustic, Phonetic, Speaker Features Using Improved Three-Way Restricted Boltzmann Machine
Toru Nakashika, Tetsuya Takiguchi
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In this paper, we discuss a way of modelling speech signals using an improved three-way restricted Boltzmann machine (3WRBM) that considers acoustic features, latent phonological features, and speaker-identity features. The 3WRBM is an energy-based probabilistic model that includes three kinds of potentials: unary potentials of each variable, pairwise potentials of every pair of variables, and three-way potentials of the three variables. In our approach, we design the three-way potentials appropriately in a speaker-adaptive training (SAT) manner. The optimized model captures the relationships between the variables, makes it possible to compute the conditional probabilities of each variable, and is applicable to many tasks in speech signal processing. For example, estimating speaker-identity features given acoustic features can be used for speaker recognition. Another example is voice conversion: acoustic features are estimated from the desired speaker-identity features and the phonological features, which are themselves estimated from the source speaker's acoustic features. In our experiments, we evaluate the effectiveness of the speech modelling through a voice conversion task and a speaker recognition task.
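Illustrative sketch (not taken from the paper): the abstract describes an energy built from unary, pairwise, and three-way potentials over acoustic, phonological, and speaker variables. The Python code below shows one generic way such an energy and a conditional probability could be written. The parameter names (a, b, c, W, U, V, T), the dense three-way tensor, the binary latent units, and the random initialisation are all assumptions made for illustration; the improved 3WRBM and its SAT-style design of the three-way potentials are not specified on this page and are not reproduced here.

```python
import numpy as np

def init_params(n_v, n_h, n_s, seed=0):
    """Random parameters for a generic three-way model (hypothetical names)."""
    rng = np.random.default_rng(seed)
    return {
        "a": rng.normal(size=n_v),                    # unary: acoustic features v
        "b": rng.normal(size=n_h),                    # unary: latent phonological units h
        "c": rng.normal(size=n_s),                    # unary: speaker-identity features s
        "W": 0.1 * rng.normal(size=(n_v, n_h)),       # pairwise: v-h
        "U": 0.1 * rng.normal(size=(n_v, n_s)),       # pairwise: v-s
        "V": 0.1 * rng.normal(size=(n_h, n_s)),       # pairwise: h-s
        "T": 0.1 * rng.normal(size=(n_v, n_h, n_s)),  # three-way: v-h-s
    }

def energy(v, h, s, p):
    """Energy = -(unary + pairwise + three-way potentials)."""
    unary = p["a"] @ v + p["b"] @ h + p["c"] @ s
    pairwise = v @ p["W"] @ h + v @ p["U"] @ s + h @ p["V"] @ s
    three_way = np.einsum("ijk,i,j,k->", p["T"], v, h, s)
    return -(unary + pairwise + three_way)

def p_h_given_v_s(v, s, p):
    """p(h_j = 1 | v, s) for binary h; the energy is linear in h,
    so each unit's conditional is a sigmoid of its total input."""
    act = (p["b"] + p["W"].T @ v + p["V"] @ s
           + np.einsum("ijk,i,k->j", p["T"], v, s))
    return 1.0 / (1.0 + np.exp(-act))

if __name__ == "__main__":
    params = init_params(n_v=4, n_h=3, n_s=2)
    v = np.random.default_rng(1).normal(size=4)  # toy acoustic vector
    s = np.array([1.0, 0.0])                     # one-hot speaker code
    print(p_h_given_v_s(v, s, params))
```

In this toy form the speaker vector s shifts the inputs to the latent units through both V and T, which is one simple way a speaker-dependent term can enter such a model. Conditionals over v or s would be derived the same way, but they depend on how those variables are modelled (for example Gaussian or softmax units), which this sketch leaves open.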
Keyword(in Japanese) (See Japanese page)
Keyword(in English) speech modelling / restricted Boltzmann machine / speaker-adaptive training / speaker recognition / voice conversion
Paper # SP2015-71
Date of Issue 2015-11-25 (SP)

Conference Information
Committee NLC / IPSJ-NL / SP / IPSJ-SLP
Conference Date 2015/12/2(3days)
Place (in Japanese) (See Japanese page)
Place (in English) Nagoya Inst of Tech.
Topics (in Japanese) (See Japanese page)
Topics (in English) The Second Natural Language Processing Symposium & The 17th Spoken Language Symposium
Chair Koichi Takeuchi(Okayama Univ.) / Kentaro Inui(Tohoku Univ.) / Kazunori Mano(Shibaura Inst. of Tech.) / Koichi Shinoda(Tokyo Inst. of Tech.)
Vice Chair Hiroshi Kanayama(IBM) / Makoto Ichise(NTT DoCoMo) / / Norihide Kitaoka(Tokushima Univ.)
Secretary Hiroshi Kanayama(Univ. of Tokyo/Hottolink) / Makoto Ichise(Ryukoku Univ.) / (Osaka Univ.) / Norihide Kitaoka(Tohoku Univ.) / (Mixi Co. Ltd.)
Assistant Kazutaka Shimada(Kyushu Inst. of Tech.) / Ryuichiro Higashinaka(NTT) / / Takashi Nose(Tohoku Univ.) / Taichi Asami(NTT)

Paper Information
Registration To Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Simultaneous Modelling of Acoustic, Phonetic, Speaker Features Using Improved Three-Way Restricted Boltzmann Machine
Sub Title (in English)
Keyword(1) speech modelling
Keyword(2) restricted Boltzmann machine
Keyword(3) speaker-adaptive training
Keyword(4) speaker recognition
Keyword(5) voice conversion
1st Author's Name Toru Nakashika
1st Author's Affiliation The University of Electro-Communications(UEC)
2nd Author's Name Tetsuya Takiguchi
2nd Author's Affiliation Kobe University(Kobe Univ.)
Date 2015-12-02
Paper # SP2015-71
Volume (vol) vol.115
Number (no) SP-346
Page pp.7-12(SP)
#Pages 6
Date of Issue 2015-11-25 (SP)