Presentation | 2018-07-26 Ladder Network Driven from Auditory Computational Model for Multi-talker Speech Separation Hiroshi Sekiguchi, Yoshiaki Narusue, Hiroyuki Morikawa |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | This paper introduces a ladder network implementation derived from an auditory computational model for multi-talker speech separation. The conventional approach of learning spectral mask ratios has been intensively investigated. However, compared with the auditory system, its performance remains limited: reconstructed speech shows a signal-to-distortion ratio (SDR) of around 10 dB at best. To improve SDR performance, we draw on auditory neuroscience, which holds that speech separation consists of two functions: auditory speech feature extraction, and temporal synchronization detection and clustering. The former analyzes speech features; the latter extracts features that vary in synchrony with the low-frequency (below 5 Hz) movement of the mouth. Synchronized features are grouped as one speaker, whereas features with unsynchronized movement are grouped as a different speaker. We consider the importance of an affinity constraint between these two functions, and derive two different computational models from the two functions under this constraint. A ladder network then implements these two computational models, with suitable network structures for a proper reconstruction path. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | speech separation / temporal coherence / auditory neuroscience / ladder network |
Paper # | SP2018-18 |
Date of Issue | 2018-07-19 (SP) |
Conference Information | |
Committee | SP / IPSJ-SLP |
---|---|
Conference Date | 2018/7/26(2days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Sago-Royal-Hotel (Hamamatsu) |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | Speech recognition and understanding, dialog system, etc. |
Chair | Yoichi Yamashita(Ritsumeikan Univ.) / Masafumi Nishimura(Shizuoka Univ.) |
Vice Chair | Akinobu Ri(Nagoya Inst. of Tech.) |
Secretary | Akinobu Ri(Kyoto Univ.) / (Meijo Univ.) |
Assistant | Tomoki Koriyama(Tokyo Inst. of Tech.) / Satoshi Kobashikawa(NTT) |
Paper Information | |
Registration To | Technical Committee on Speech / Special Interest Group on Spoken Language Processing |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Ladder Network Driven from Auditory Computational Model for Multi-talker Speech Separation |
Sub Title (in English) | |
Keyword(1) | speech separation |
Keyword(2) | temporal coherence |
Keyword(3) | auditory neuroscience |
Keyword(4) | ladder network |
1st Author's Name | Hiroshi Sekiguchi |
1st Author's Affiliation | The University of Tokyo(Univ. of Tokyo) |
2nd Author's Name | Yoshiaki Narusue |
2nd Author's Affiliation | The University of Tokyo(Univ. of Tokyo) |
3rd Author's Name | Hiroyuki Morikawa |
3rd Author's Affiliation | The University of Tokyo(Univ. of Tokyo) |
Date | 2018-07-26 |
Paper # | SP2018-18 |
Volume (vol) | vol.118 |
Number (no) | SP-160 |
Page | pp.9-13(SP) |
#Pages | 5 |
Date of Issue | 2018-07-19 (SP) |