Achievement Award 2014

Pioneering research on speech recognition based on stochastic models

Seiichi Nakagawa

  Speech recognition technology has become very familiar in daily life, for example as an input method for smartphones. Almost all current speech recognition is based on stochastic models called Hidden Markov Models (HMMs). The award winner has been engaged in fundamental research and development of speech recognition technology based on HMMs.
  In the 1970s, he proposed various continuous speech recognition algorithms based on Dynamic Time Warping (DTW), such as parallel tree search (which includes the conventional beam search proposed at nearly the same time), word spotting, and context-free grammar-driven continuous speech recognition, and he developed the first Japanese spoken language understanding system, LITHAN, based on these algorithms. In the 1980s, he proposed more advanced continuous speech recognition algorithms, including O(n)DP (which is equivalent to the one-pass algorithm), augmented continuous DP matching, frame-synchronous context-free grammar-driven recognition, stochastic DTW, and the backward kakari-uke parsing algorithm. He then developed SPOJUS, a continuous speech recognition system based on HMMs. This series of proposals and developments has led the research community worldwide. He has also developed many kinds of spoken dialogue systems using these speech recognition techniques.
  Since the 1990s, he has achieved remarkable results in research on large-vocabulary continuous speech recognition based on probabilistic language models and HMMs, and has refined SPOJUS. These fundamental technologies have been used in commercial systems such as dictation systems running on PCs and real-time broadcast news captioning systems. In this way, he has made particularly significant contributions to society.
  State-of-the-art speech recognizers combine HMMs with deep neural networks. He is a pioneer of this kind of research and has adopted a kind of neural network called Hidden Conditional Neural Fields (HCNFs). He has also pursued applied research energetically. For example, he has carried out frontier research on language education using spoken language processing techniques, constructing and providing speech databases of language learners and developing computer-assisted language learning (CALL) systems. He has also done pioneering research on spoken document processing, in which multimedia data including speech are processed for retrieval and summarization. He received best paper awards from IEICE for an early study on spoken dialogue understanding (1977) and a recent study on HCNF-based speech recognition (2012). Furthermore, he received another best paper award from IEICE for a survey on trends in speech recognition research (2010). The survey was highly evaluated not only for its broad coverage and structured explanation but also for its unique investigation of, and future prospects for, speech recognition research from the viewpoint of information theory.

Fig. 1　　Relationship between phoneme recognition rate and word/sentence recognition rates (modified version of a figure from reference (10); each word is assumed to consist of six phonemes on average). From this figure, the word recognition rates of state-of-the-art large-vocabulary continuous speech recognizers are estimated to be approximately 88-95 %, because the word perplexities (2 raised to the power of the entropy) of current language models range from 50 to 200 and the phoneme recognition rates are approximately 80 % without any language constraints.
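
The quantities in the caption can be illustrated with a short sketch. It converts the quoted perplexity range into entropy in bits, using the definition that perplexity is 2 raised to the power of the entropy, and computes a naive baseline in which the six phonemes of a word are recognized independently at 80 % each. The function names are hypothetical, and the independence baseline is an assumption made here for illustration only; it is not the model behind Fig. 1 or reference (10), and it does not reproduce the 88-95 % estimate.

```python
import math


def perplexity_from_entropy(entropy_bits: float) -> float:
    """Perplexity is 2 raised to the power of the entropy (in bits)."""
    return 2.0 ** entropy_bits


def entropy_from_perplexity(perplexity: float) -> float:
    """Inverse of the definition above: entropy in bits per word."""
    return math.log2(perplexity)


def all_phonemes_correct(phoneme_rate: float, phonemes_per_word: int = 6) -> float:
    """Illustrative assumption only: phonemes are recognized independently,
    and a word counts as correct only if every phoneme is correct."""
    return phoneme_rate ** phonemes_per_word


# Entropy corresponding to the perplexity range quoted in the caption.
for pp in (50, 200):
    print(f"perplexity {pp}: {entropy_from_perplexity(pp):.2f} bits per word")

# Naive baseline for a six-phoneme word at an 80 % phoneme recognition rate.
print(f"naive word rate: {all_phonemes_correct(0.80):.3f}")
```

Under this naive assumption a six-phoneme word would be fully correct only about a quarter of the time, which gives a sense of how much the language model constraints must contribute to reach the word recognition rates quoted in the caption.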

The academic and social contributions of his work, which has opened up both fundamental and applied research fields in stochastic speech recognition, are worthy of the IEICE Achievement Award.

References

（1）T. Sakai, S. Nakagawa, “A Speech Understanding System of Simple Japanese Sentences in a Task Domain,” IEICE Trans., Vol.60-E, No.1, pp.13-20, Feb., 1977.
（2）S. Nakagawa, “A connected spoken word or syllable recognition algorithm by pattern matching,” IEICE Trans., Vol.66-D, No.6, pp.637-644, Jun., 1983 (in Japanese).
（3）S. Nakagawa, “A connected spoken word recognition algorithm by augmented continuous DP matching,” IEICE Trans., Vol.67-D, No.10, pp.1242-1249, Oct., 1984 (in Japanese).
（4）S. Nakagawa, “Continuous speech recognition by time-synchronous parsing algorithm of context-free grammar,” IEICE Trans., Vol.70-D, No.5, pp.907-816, May, 1987 (in Japanese).
（5）T. Ito, S. Nakagawa, “Recognition of spoken Japanese sentences using mono-syllable units and backward kakari-uke parsing algorithm,” IEICE Trans., Vol.70-D, No.12, pp.2469-2478, Dec., 1987 (in Japanese).
（6）S. Nakagawa, H. Nakanishi, “Speaker-Independent English Consonant and Japanese Word Recognition by a Stochastic Dynamic Time Warping Method,” Journal of IETE, Vol.34, No.1, pp.87-95, Jan., 1989.
（7）S. Nakagawa, Y. Ohgurao, Y. Hashimoto, “Syntax oriented spoken Japanese recognition/understanding system -SPOJUS-SYNO-,” IEICE Trans., Vol.72-DⅡ, No.8, pp.1276-1283, Aug., 1989 (in Japanese).
（8）M. Zhou, S. Nakagawa, “Succeeding Word Prediction for Speech Recognition Based on Stochastic Language Model,” IEICE Trans., Vol.E79-D, No.4, pp.333-342, Apr., 1996.
（9）Y. Fujii, K. Yamamoto, S. Nakagawa, “Hidden conditional neural fields for continuous phoneme recognition,” IEICE Trans. Inf. & Syst., Vol.E95-D, No.8, pp.2094-2104, Aug., 2012.
（10）S. Nakagawa, “A survey on automatic speech recognition,” IEICE Trans., Vol.J83-D II, No.2, pp.433-457, Feb., 2000 (in Japanese).