Achievement Award
Speech recognition technology has become very familiar in daily life, for example as an input method for smartphones. Almost all current speech recognition is based on stochastic models called Hidden Markov Models (HMMs). The award winner has been engaged in fundamental research and development of speech recognition technology based on HMMs. In the 1970s, he proposed various continuous speech recognition algorithms based on Dynamic Time Warping (DTW), such as parallel tree search (which includes the conventional beam search proposed at nearly the same time), word spotting, and context-free grammar-driven continuous speech recognition, and developed the first Japanese spoken language understanding system, LITHAN, based on these algorithms. In the 1980s, he proposed more advanced continuous speech recognition algorithms, including O(n)DP (which is equivalent to the one-pass algorithm), augmented continuous DP matching, frame-synchronous context-free grammar-driven recognition, stochastic DTW, and a backward kakari-uke parsing algorithm. He then developed SPOJUS, a continuous speech recognition system based on HMMs. This series of proposals and developments has led the research community worldwide. He has also developed many kinds of spoken dialogue systems using these speech recognition techniques. Since the 1990s, he has achieved remarkable results in research on large-vocabulary continuous speech recognition based on probabilistic language models and HMMs, and has refined SPOJUS. These fundamental technologies have been used in commercial systems such as dictation systems running on PCs and real-time broadcast news captioning systems. In this way, he has made particularly significant contributions to society. State-of-the-art speech recognizers combine HMMs with deep neural networks. He is a pioneer of this kind of research and has adopted a kind of neural network called Hidden Conditional Neural Fields (HCNFs).
He has also pursued applied research energetically. For example, he has carried out frontier research on language education using spoken language processing techniques: he constructed and made available speech databases of language learners, and he developed computer-assisted language learning (CALL) systems. He also did pioneering research on spoken document processing, in which multimedia data including speech are processed for retrieval and summarization. He received best paper awards from IEICE for an early study on spoken dialogue understanding (1977) and for a recent study on HCNF-based speech recognition (2012). Furthermore, he received another best paper award from IEICE for a survey of trends in speech recognition research (2010). Not only its broad coverage and structured explanation but also his original analysis of, and outlook on, speech recognition research from the viewpoint of information theory were highly regarded.
Fig. 1  Relationship between phoneme recognition rate and word/sentence recognition rates (modified version of a figure from literature (10); each word is assumed to consist of six phonemes on average). From this figure, the word recognition rates of state-of-the-art large-vocabulary continuous speech recognizers are estimated to be approximately 88-95 %, because the word perplexities (2 raised to the power of the entropy) of current language models range from 50 to 200 and the phoneme recognition rates are approximately 80 % without any language constraints.
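The quantities named in the caption can be illustrated with a small sketch. The perplexity definition (2 raised to the power of the entropy) and the six-phoneme word length come from the caption. The independence assumption in `unconstrained_word_accuracy` and all function names are illustrative assumptions, not the model behind Fig. 1, and the sketch does not reproduce the 88-95 % estimate read from the figure.

```python
import math


def perplexity_from_entropy(entropy_bits: float) -> float:
    """Word perplexity is 2 raised to the per-word entropy in bits."""
    return 2.0 ** entropy_bits


def entropy_from_perplexity(perplexity: float) -> float:
    """Inverse of the definition above: entropy in bits per word."""
    return math.log2(perplexity)


def unconstrained_word_accuracy(phoneme_accuracy: float,
                                phonemes_per_word: int = 6) -> float:
    """Naive word accuracy with no language constraints.

    Assumption (not from the source): phoneme errors are independent and
    a word counts as correct only if every one of its phonemes is correct.
    """
    return phoneme_accuracy ** phonemes_per_word


if __name__ == "__main__":
    # Perplexity range quoted in the caption.
    for ppl in (50, 200):
        print(f"perplexity {ppl}: {entropy_from_perplexity(ppl):.2f} bits/word")
    # Phoneme recognition rate quoted in the caption.
    print(f"naive word accuracy: {unconstrained_word_accuracy(0.80):.3f}")
```

Under these assumptions, perplexities of 50 to 200 correspond to roughly 5.6 to 7.6 bits per word, and an 80 % phoneme recognition rate alone would give a six-phoneme word accuracy of only about 26 %. The gap between that naive figure and the 88-95 % estimate suggests how much the language model constraints contribute.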
The academic and social contributions of his work, which opened up both the fundamental and the applied research fields of stochastic speech recognition, make him worthy of the IEICE Achievement Award.
References