Presentation 2018-07-26
Ladder Network Driven from Auditory Computational Model for Multi-talker Speech Separation
Hiroshi Sekiguchi, Yoshiaki Narusue, Hiroyuki Morikawa
Abstract(in Japanese) (See Japanese page)
Abstract(in English) This paper introduces a ladder network implementation derived from an auditory computational model for multi-talker speech separation. The conventional approach of learning a spectral mask ratio has been investigated intensively. However, its performance remains limited compared with the human auditory system: reconstructed speech reaches a signal-to-distortion ratio (SDR) of around 10 dB at best. To improve SDR, we draw on auditory neuroscience, which holds that speech separation consists of two functions: auditory speech feature extraction, and temporal synchronization detection and clustering. The former analyzes speech features; the latter extracts features whose variation is synchronized with the low-frequency (below 5 Hz) movement of the mouth. Synchronized features are grouped as one speaker, whereas unsynchronized features are grouped as a different speaker. We consider an affinity constraint between these two functions to be important, and derive two computational models from the two functions under this constraint. A ladder network then implements these two computational models, using network structures suited to a proper reconstruction path.
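The abstract describes grouping features whose slow (below 5 Hz) modulations are synchronized with mouth movement into one speaker, and it reports results in terms of SDR. The following is a minimal, hypothetical sketch of that temporal-coherence grouping idea, not the authors' ladder network or their exact model. It low-pass filters per-channel spectral envelopes at 5 Hz, correlates them with an assumed mouth-movement envelope, and thresholds the correlation into a binary mask. The function names, the 100 frames/s envelope rate, the 0.5 correlation threshold and the synthetic signals are all assumptions made for illustration. `sdr_db` is a simplified SDR that omits the distortion-filter projection used by BSS Eval, so its values are not directly comparable to published SDR figures.

```python
import numpy as np

# Hypothetical illustration of temporal-coherence grouping; NOT the authors'
# ladder network. All names, rates and thresholds here are assumptions.

def lowpass_envelopes(env, frame_rate, cutoff_hz=5.0):
    """Zero out envelope modulation components above cutoff_hz.

    env: (channels, frames) array of non-negative spectral magnitudes.
    frame_rate: envelope frames per second (e.g. 100 for a 10 ms hop).
    """
    spec = np.fft.rfft(env, axis=1)
    freqs = np.fft.rfftfreq(env.shape[1], d=1.0 / frame_rate)
    spec[:, freqs > cutoff_hz] = 0.0  # keep only slow (syllabic-rate) modulations
    return np.fft.irfft(spec, n=env.shape[1], axis=1)


def coherence_mask(env, anchor, frame_rate, cutoff_hz=5.0, threshold=0.5):
    """Return (binary mask, per-channel Pearson correlation) with a slow anchor."""
    slow = lowpass_envelopes(env, frame_rate, cutoff_hz)
    slow_anchor = lowpass_envelopes(anchor[None, :], frame_rate, cutoff_hz)[0]
    slow = slow - slow.mean(axis=1, keepdims=True)
    slow_anchor = slow_anchor - slow_anchor.mean()
    corr = (slow @ slow_anchor) / (
        np.linalg.norm(slow, axis=1) * np.linalg.norm(slow_anchor) + 1e-12
    )
    # Channels co-varying with the anchor are grouped as the anchor's speaker.
    return (corr > threshold).astype(float), corr


def sdr_db(reference, estimate):
    """Simplified SDR in dB: 10*log10(||s||^2 / ||s - s_hat||^2)."""
    err = reference - estimate
    return 10.0 * np.log10(np.sum(reference**2) / (np.sum(err**2) + 1e-12))


# Synthetic example: 2 s of envelopes at 100 frames/s, 8 frequency channels.
frame_rate = 100.0
t = np.arange(200) / frame_rate
talker_a = 1.0 + 0.8 * np.sin(2 * np.pi * 3.0 * t)   # 3 Hz syllabic rhythm
talker_b = 1.0 + 0.8 * np.sin(2 * np.pi * 4.5 * t)   # 4.5 Hz syllabic rhythm
ripple = 0.1 * np.sin(2 * np.pi * 20.0 * t)          # fast modulation, removed by low-pass
env = np.vstack([talker_a + ripple] * 4 + [talker_b + ripple] * 4)
mouth_a = 1.0 + np.sin(2 * np.pi * 3.0 * t)          # hypothetical mouth-movement cue for talker A

mask, corr = coherence_mask(env, mouth_a, frame_rate)
print(mask)  # channels 0-3 marked 1 (talker A), channels 4-7 marked 0
```

Thresholding the correlation gives a hard, binary grouping; a ratio mask, as in the conventional approach the abstract mentions, would instead assign each time-frequency bin a soft weight between 0 and 1.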
Keyword(in Japanese) (See Japanese page)
Keyword(in English) speech separation / temporal coherence / auditory neuroscience / ladder network
Paper # SP2018-18
Date of Issue 2018-07-19 (SP)

Conference Information
Committee SP / IPSJ-SLP
Conference Date 2018-07-26 (2 days)
Place (in Japanese) (See Japanese page)
Place (in English) Sago-Royal-Hotel (Hamamatsu)
Topics (in Japanese) (See Japanese page)
Topics (in English) Speech recognition and understanding, dialog system, etc.
Chair Yoichi Yamashita(Ritsumeikan Univ.) / Masafumi Nishimura(Shizuoka Univ.)
Vice Chair Akinobu Ri(Nagoya Inst. of Tech.)
Secretary Akinobu Ri(Kyoto Univ.) / (Meijo Univ.)
Assistant Tomoki Koriyama(Tokyo Inst. of Tech.) / Satoshi Kobashikawa(NTT)

Paper Information
Registration To Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Ladder Network Driven from Auditory Computational Model for Multi-talker Speech Separation
Sub Title (in English)
Keyword(1) speech separation
Keyword(2) temporal coherence
Keyword(3) auditory neuroscience
Keyword(4) ladder network
1st Author's Name Hiroshi Sekiguchi
1st Author's Affiliation The University of Tokyo(Univ. of Tokyo)
2nd Author's Name Yoshiaki Narusue
2nd Author's Affiliation The University of Tokyo(Univ. of Tokyo)
3rd Author's Name Hiroyuki Morikawa
3rd Author's Affiliation The University of Tokyo(Univ. of Tokyo)
Date 2018-07-26
Paper # SP2018-18
Volume (vol) vol.118
Number (no) SP-160
Page pp.9-13 (SP)
#Pages 5
Date of Issue 2018-07-19 (SP)