Robust Speech Recognition Based on Running Speech Spectrum on Critical Band Intensity

Nongnuch SUKTANGMAN; Kraisin SONGWATANA; Yoshikazu MIYANAGA

2007 International Symposium on Nonlinear Theory and its Applications

2007

Session Number:19AM2-C

Number:19AM2-C-2

pp.437-440

Publication Date:2007/9/16

Online ISSN:2188-5079

DOI:10.34385/proc.41.19AM2-C-2

Summary:

In this report, we present new results on robust automatic speech recognition (ASR) based on speech spectrum features on the Bark scale. Robustness is improved by applying running spectrum filtering (RSF) and dynamic range adjustment (DRA) to the features. The Bark scale is a psychoacoustic scale based on properties of human hearing, and the speech feature extraction process consists of four steps: (1) an auto-regressive model (AR model), (2) critical band intensity (CBI), (3) the logarithm of the CBI, followed by the discrete cosine transform, DCT(log(CBI)). The detailed feature is obtained by applying RSF and DRA to the DCT spectra of log CBI and comprises a sixteen-dimensional parameter vector, sixteen delta parameters, one voice energy parameter, and one delta energy parameter. In ASR, speech features are first extracted from an utterance at a given signal-to-noise ratio (SNR) and then decoded with acoustic hidden Markov models (HMMs) trained on clean data. We examine the noise robustness of the whole system under several noise conditions, from 0 dB to 20 dB SNR. In our experiments, recognition rates improve by more than 27% at 0 dB SNR, 30% at 10 dB SNR, and 7% at 20 dB SNR.
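
The feature chain named in the summary (CBI, log, DCT, then DRA) can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption: the Zwicker-Terhardt approximation of the Bark scale, equal-width Bark bands, peak normalization as the form of DRA, and all function and parameter names. The AR model step and RSF (a filter along the time axis of each feature) are omitted; the input is taken to be a power spectrum that is already available.

```python
import math

def hz_to_bark(freq_hz):
    """Zwicker-Terhardt approximation of the Bark scale (assumed formula)."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

def critical_band_intensity(power_spectrum, sample_rate, n_bands):
    """Sum spectral power into n_bands bands of equal width on the Bark scale.

    power_spectrum holds power values for bins from 0 Hz up to the
    Nyquist frequency (at least two bins). n_bands is a free parameter here.
    """
    nyquist = sample_rate / 2.0
    max_bark = hz_to_bark(nyquist)
    n_bins = len(power_spectrum)
    cbi = [0.0] * n_bands
    for k, power in enumerate(power_spectrum):
        freq = k * nyquist / (n_bins - 1)
        band = min(int(hz_to_bark(freq) / max_bark * n_bands), n_bands - 1)
        cbi[band] += power
    return cbi

def dct_of_log(cbi, n_coeffs, floor=1e-10):
    """Unnormalized DCT-II of log(CBI); the floor avoids log(0) for empty bands."""
    n = len(cbi)
    log_cbi = [math.log(max(v, floor)) for v in cbi]
    return [
        sum(log_cbi[i] * math.cos(math.pi * q * (i + 0.5) / n) for i in range(n))
        for q in range(n_coeffs)
    ]

def dynamic_range_adjustment(frames):
    """Scale each coefficient trajectory by its peak magnitude over the utterance.

    frames is a list of per-frame coefficient lists. This peak normalization
    is one assumed reading of DRA, not a definition taken from the paper.
    """
    n_coeffs = len(frames[0])
    # An all-zero trajectory keeps a divisor of 1.0 to avoid division by zero.
    peaks = [max(abs(f[q]) for f in frames) or 1.0 for q in range(n_coeffs)]
    return [[f[q] / peaks[q] for q in range(n_coeffs)] for f in frames]
```

In this sketch, calling `dct_of_log(critical_band_intensity(spectrum, 8000, 18), 16)` for each frame and passing the resulting list of frames to `dynamic_range_adjustment` would yield sixteen coefficients per frame, each bounded to the range -1 to 1 over the utterance. The sampling rate and band count in that call are placeholders, not values taken from the paper.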