A Feasibility Study on CNN-LSTM Based Phrase Speech Recognition Compared with LSTM and HMM

Shingo Kato; Hiroshi Tsutsui

International Workshop on Smart Info-Media Systems in Asia

2021

Session Number:RS3

Number:RS3-7

pp.154-157

Publication Date:2021/9/20

Online ISSN:2188-5079

DOI:10.34385/proc.66.RS3-7

Summary:

This paper presents an implementation of a CNN-LSTM model, a combination of a convolutional neural network (CNN) and long short-term memory (LSTM), for phrase speech recognition. Conventional phrase speech recognition systems widely use hidden Markov models (HMMs). In this approach, however, the number of models grows with the number of registered phrases, so the amount of computation increases linearly with the number of phrases. Motivated by this, we are developing a phrase speech recognition system based on neural networks. Here we focus on the CNN-LSTM model, which takes advantage of the characteristics of both CNN and LSTM. The CNN-LSTM model can account for both the local features in the speech data and the long-term dependencies across the entire utterance, which leads to improved recognition accuracy. Experimental comparisons with LSTM and HMM show that LSTM and CNN-LSTM suppress the increase in computational cost when the number of phrases is large, and that the CNN-LSTM model is robust to noise.
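
The scaling argument in the summary can be illustrated with a rough operation count. The sketch below is not the authors' implementation or their measurement. It assumes one HMM per registered phrase scored with the forward algorithm, and a single shared LSTM followed by a dense output layer with one unit per phrase. The function names and all sizes (frames, states, hidden units, feature dimension) are hypothetical values chosen only for illustration.

```python
# Rough multiply-accumulate (MAC) counts per utterance.
# All sizes are illustrative assumptions, not values from the paper.

def hmm_macs(num_phrases, frames=100, states=10, feat_dim=13):
    """Assumed setup: one HMM per phrase, each scored with the forward algorithm."""
    # Per frame: states*states transition terms, plus states*feat_dim for an
    # assumed diagonal-Gaussian emission score.
    per_model = frames * (states * states + states * feat_dim)
    # Every registered phrase adds one more model to evaluate.
    return num_phrases * per_model


def lstm_macs(num_phrases, frames=100, hidden=64, feat_dim=13):
    """Assumed setup: one shared LSTM plus a dense layer with one output per phrase."""
    # Four gates, each a hidden x (hidden + feat_dim) matrix-vector product per frame.
    recurrent = frames * 4 * hidden * (hidden + feat_dim)
    # Only the output layer depends on the number of phrases.
    output = hidden * num_phrases
    return recurrent + output


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, hmm_macs(n), lstm_macs(n))
```

Under these assumed sizes the per-phrase HMM cost is larger than the shared LSTM cost only once the phrase count is large, and the LSTM cost is dominated by a term that does not depend on the number of phrases. This is consistent with the trend the summary describes, but the actual crossover point depends on the real model sizes.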