Simultaneous Modelling of Acoustic, Phonological, and Speaker Information Using a Constrained Three-Way Restricted Boltzmann Machine

Toru Nakashika; Tetsuya Takiguchi

Presentation	2015-12-02 Simultaneous Modelling of Acoustic, Phonetic, Speaker Features Using Improved Three-Way Restricted Boltzmann Machine / Toru Nakashika, Tetsuya Takiguchi
Abstract(in English)	In this paper, we discuss a way of modelling speech signals using an improved three-way restricted Boltzmann machine (3WRBM) that jointly considers acoustic features, latent phonological features, and speaker-identity features. The 3WRBM is an energy-based probabilistic model that includes three kinds of potentials: unary potentials of each variable, pairwise potentials between every two of the variables, and three-way potentials among all three variables. In our approach, we design the three-way potentials in a speaker-adaptive training (SAT) manner. The trained model captures the relationships among the variables, makes it possible to compute the conditional probability of each variable given the others, and is applicable to many tasks in speech signal processing. For example, estimating speaker-identity features given acoustic features amounts to speaker recognition. As another example, estimating phonological features from a source speaker's acoustic features, and then estimating acoustic features from those phonological features together with the desired speaker-identity features, amounts to voice conversion. In our experiments, we evaluate the effectiveness of this speech modelling on a voice conversion task and a speaker recognition task.
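The abstract describes the model only at a high level. The following NumPy sketch illustrates how the conditional probabilities it mentions follow from a three-way energy function. It assumes a generic 3WRBM with unit-variance Gaussian acoustic units v, binary phonological units h, and a one-hot speaker vector s; it does not reproduce the paper's SAT-based design of the three-way potentials, and all names, sizes, and (random, untrained) parameters are hypothetical. Summing out the binary h analytically gives p(s|v) for speaker recognition, and voice conversion is sketched as inferring h with the source speaker, then decoding the mean acoustic frame with the target speaker.

```python
import numpy as np

# Hypothetical sizes: acoustic dim I, phonological units J, speakers K.
I, J, K = 24, 16, 5
rng = np.random.default_rng(0)

# Randomly initialised parameters (untrained; for illustration only).
b = np.zeros(I)                            # acoustic bias (Gaussian mean offset)
c = np.zeros(J)                            # phonological bias
d = np.zeros(K)                            # speaker bias
W = 0.01 * rng.standard_normal((I, J))     # acoustic-phonological pairwise weights
U = 0.01 * rng.standard_normal((I, K))     # acoustic-speaker pairwise weights
V = 0.01 * rng.standard_normal((J, K))     # phonological-speaker pairwise weights
T = 0.01 * rng.standard_normal((I, J, K))  # three-way weights


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def energy(v, h, s):
    """Assumed E(v, h, s): unit-variance Gaussian v, binary h, one-hot s."""
    unary = 0.5 * np.sum((v - b) ** 2) - c @ h - d @ s
    pairwise = -(v @ W @ h) - (v @ U @ s) - (h @ V @ s)
    three_way = -np.einsum("i,j,k,ijk->", v, h, s, T)
    return unary + pairwise + three_way


def hidden_probs(v, s):
    """p(h_j = 1 | v, s) for every phonological unit j."""
    return sigmoid(c + v @ W + V @ s + np.einsum("i,k,ijk->j", v, s, T))


def acoustic_mean(h, s):
    """E[v | h, s]; under this energy p(v | h, s) is Gaussian with identity covariance."""
    return b + W @ h + U @ s + np.einsum("j,k,ijk->i", h, s, T)


def speaker_posterior(v):
    """p(s = k | v) with binary h summed out analytically (speaker recognition)."""
    # a[j, k]: total input to hidden unit j when speaker k is active
    a = c[:, None] + (v @ W)[:, None] + V + np.einsum("i,ijk->jk", v, T)
    # log-softplus sum over j; the Gaussian term in v cancels across speakers
    logits = d + v @ U + np.logaddexp(0.0, a).sum(axis=0)
    logits -= logits.max()  # numerical stability before exponentiating
    p = np.exp(logits)
    return p / p.sum()


def convert(v_src, src, tgt):
    """Voice-conversion sketch: infer h with the source speaker, decode with the target."""
    s_src, s_tgt = np.eye(K)[src], np.eye(K)[tgt]
    h = hidden_probs(v_src, s_src)  # mean-field estimate of phonological features
    return acoustic_mean(h, s_tgt)


if __name__ == "__main__":
    v = rng.standard_normal(I)  # stand-in for one acoustic feature frame
    print("speaker posterior:", speaker_posterior(v))
    print("converted frame shape:", convert(v, src=0, tgt=3).shape)
```

Because the parameters are random, the printed posterior is close to uniform; in practice they would be learned from multi-speaker data, which this sketch does not attempt.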
Keyword(in English)	speech modelling / restricted Boltzmann machine / speaker-adaptive training / speaker recognition / voice conversion
Paper #	SP2015-71
Date of Issue	2015-11-25 (SP)

Conference Information
Committee	NLC / IPSJ-NL / SP / IPSJ-SLP
Conference Date	2015-12-02 (3 days)
Place (in English)	Nagoya Inst. of Tech.
Topics (in English)	The Second Natural Language Processing Symposium & The 17th Spoken Language Symposium
Chair	Koichi Takeuchi(Okayama Univ.) / Kentaro Inui(Tohoku Univ.) / Kazunori Mano(Shibaura Inst. of Tech.) / Koichi Shinoda(Tokyo Inst. of Tech.)
Vice Chair	Hiroshi Kanayama(IBM) / Makoto Ichise(NTT DoCoMo) / / Norihide Kitaoka(Tokushima Univ.)
Secretary	Hiroshi Kanayama(Univ. of Tokyo/Hottolink) / Makoto Ichise(Ryukoku Univ.) / (Osaka Univ.) / Norihide Kitaoka(Tohoku Univ.) / (Mixi Co. Ltd.)
Assistant	Kazutaka Shimada(Kyushu Inst. of Tech.) / Ryuichiro Higashinaka(NTT) / / Takashi Nose(Tohoku Univ.) / Taichi Asami(NTT)

Paper Information
Registration To	Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language	JPN
Title (in English)	Simultaneous Modelling of Acoustic, Phonetic, Speaker Features Using Improved Three-Way Restricted Boltzmann Machine
Sub Title (in English)
Keyword(1)	speech modelling
Keyword(2)	restricted Boltzmann machine
Keyword(3)	speaker-adaptive training
Keyword(4)	speaker recognition
Keyword(5)	voice conversion
1st Author's Name	Toru Nakashika
1st Author's Affiliation	The University of Electro-Communications(UEC)
2nd Author's Name	Tetsuya Takiguchi
2nd Author's Affiliation	Kobe University(Kobe Univ.)
Date	2015-12-02
Volume (vol)	vol.115
Number (no)	SP-346
Page	pp.7-12 (SP)
#Pages	6