Ladder Network Driven from Auditory Computational Model for Multi-talker Speech Separation

Hiroshi Sekiguchi; Yoshiaki Narusue; Hiroyuki Morikawa

Presentation	2018-07-26 Ladder Network Driven from Auditory Computational Model for Multi-talker Speech Separation Hiroshi Sekiguchi, Yoshiaki Narusue, Hiroyuki Morikawa
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	This paper introduces a ladder network implementation derived from an auditory computational model for multi-talker speech separation. The conventional approach of learning a spectral mask ratio has been intensively investigated. However, compared with the human auditory system, its performance remains limited: the reconstructed speech shows a signal-to-distortion ratio (SDR) of around 10 dB at best. To improve SDR, we draw on auditory neuroscience, which holds that speech separation consists of two functions: auditory speech feature extraction, and temporal synchronization detection and clustering. The former analyzes speech features. The latter extracts features that vary in synchrony with the low-frequency (below 5 Hz) movement of the mouth and groups them as one speaker, whereas features that vary out of synchrony are grouped as a different speaker. We consider the affinity constraint between these two functions to be important, and we derive two computational models, one from each function, under this constraint. A ladder network then implements these two computational models, with network structures suited to the proper reconstruction path.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	speech separation / temporal coherence / auditory neuroscience / ladder network
Paper #	SP2018-18
Date of Issue	2018-07-19 (SP)

Conference Information
Committee	SP / IPSJ-SLP
Conference Date	2018/7/26 (2 days)
Place (in Japanese)	(See Japanese page)
Place (in English)	Sago-Royal-Hotel (Hamamatsu)
Topics (in Japanese)	(See Japanese page)
Topics (in English)	Speech recognition and understanding, dialog system, etc.
Chair	Yoichi Yamashita(Ritsumeikan Univ.) / Masafumi Nishimura(Shizuoka Univ.)
Vice Chair	Akinobu Ri(Nagoya Inst. of Tech.)
Secretary	Akinobu Ri(Kyoto Univ.) / (Meijo Univ.)
Assistant	Tomoki Koriyama(Tokyo Inst. of Tech.) / Satoshi Kobashikawa(NTT)

Paper Information
Registration To	Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Ladder Network Driven from Auditory Computational Model for Multi-talker Speech Separation
Sub Title (in English)
Keyword(1)	speech separation
Keyword(2)	temporal coherence
Keyword(3)	auditory neuroscience
Keyword(4)	ladder network
1st Author's Name	Hiroshi Sekiguchi
1st Author's Affiliation	The University of Tokyo(Univ. of Tokyo)
2nd Author's Name	Yoshiaki Narusue
2nd Author's Affiliation	The University of Tokyo(Univ. of Tokyo)
3rd Author's Name	Hiroyuki Morikawa
3rd Author's Affiliation	The University of Tokyo(Univ. of Tokyo)
Date	2018-07-26
Paper #	SP2018-18
Volume (vol)	vol.118
Number (no)	SP-160
Page	pp.9-13(SP)
#Pages	5
Date of Issue	2018-07-19 (SP)