Presentation 2011-06-23
Speech recognition in mixed sound of speech and music by vector quantization and non-negative matrix factorization
Shoichi NAKANO, Kazumasa YAMAMOTO, Seiichi NAKAGAWA,
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Speech recognition in the presence of noise requires reducing the effect of that noise. Spectral subtraction and Wiener-filter-based methods are common noise-removal techniques. Although these methods work for stationary noise, they are not effective for non-stationary noise. This paper describes a speech recognition method for mixed sound consisting of speech and music that removes only the music, based on vector quantization and non-negative matrix factorization. For isolated word recognition using the clean speech model, an improvement of about 15% was obtained compared with the case where the music was not removed. Furthermore, a high recognition rate of about 90% was achieved even under the 0 dB condition by using a model trained on the mixed sound after music removal. We also applied the proposed method to piano trio music and confirmed its effectiveness. Finally, we compared human performance in a listening test with machine recognition performance.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) speech recognition / mixed sound / music removal / piano trio / vector quantization / non-negative matrix factorization
Paper # SP2011-34
Date of Issue

Conference Information
Committee SP
Conference Date 2011/6/16 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Speech (SP)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Speech recognition in mixed sound of speech and music by vector quantization and non-negative matrix factorization
Sub Title (in English)
Keyword(1) speech recognition
Keyword(2) mixed sound
Keyword(3) music removal
Keyword(4) piano trio
Keyword(5) vector quantization
Keyword(6) non-negative matrix factorization
1st Author's Name Shoichi NAKANO
1st Author's Affiliation Toyohashi University of Technology
2nd Author's Name Kazumasa YAMAMOTO
2nd Author's Affiliation Toyohashi University of Technology
3rd Author's Name Seiichi NAKAGAWA
3rd Author's Affiliation Toyohashi University of Technology
Date 2011-06-23
Paper # SP2011-34
Volume (vol) vol.111
Number (no) 97
Page pp.-
#Pages 6