Achievement Award
Pioneering work on speech synthesis technologies based on statistical models
Keiichi Tokuda

  Text-to-speech (TTS) synthesis, which generates a speech waveform from input text to convey information from a system to a user, has attracted considerable interest alongside the popularization of speech interfaces. For many people to use these interfaces in their daily lives, a new TTS technology is required that can flexibly generate speech conveying emotion and emphasis under various conditions, much as humans do.
  The recipient started his pioneering work on statistical parametric speech synthesis in the mid-1990s and proposed a new speech synthesis framework based on hidden Markov models (HMMs) [1]. As the leading researcher in this field, he established this framework, has played a central role over many years, and has made outstanding contributions to the progress of speech synthesis technologies.
  In the HMM-based speech synthesis framework shown in Fig. 1, speech waveforms are parameterized into time sequences of speech component parameters, such as spectral and excitation parameters, which are then modeled with HMMs. In synthesis, the speech parameters are generated directly from the HMMs, and a speech waveform is synthesized from them. The recipient has proposed the following core technologies: 1) an algorithm for generating speech parameters from HMMs using dynamic features [2], 2) multi-space probability distribution HMMs, a probabilistic model that handles fundamental frequency patterns represented as a time sequence of mixed discrete symbols and continuous values [3], and 3) a unified framework that simultaneously models various speech components, such as spectrum, excitation, and duration [4]. These technologies have made it possible to describe the speech synthesis process in a mathematically well-founded manner.
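The parameter generation algorithm of [2] chooses the static-feature trajectory that maximizes the HMM output probability under the dynamic-feature constraints, which reduces to solving a linear system. The following is a minimal sketch, assuming a one-dimensional feature stream, diagonal covariances, and a simple first-order delta window; the function names are illustrative, not from any published implementation:

```python
import numpy as np

def build_window_matrix(T):
    """Window matrix W mapping static features c to observations
    o_t = [c_t, delta c_t], with delta c_t = c_t - c_{t-1} (c_{-1} = 0)."""
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0           # static window
        W[2 * t + 1, t] = 1.0       # delta window
        if t > 0:
            W[2 * t + 1, t - 1] = -1.0
    return W

def mlpg(mu, var):
    """Maximum-likelihood parameter generation:
    c* = argmax_c N(W c; mu, Sigma)
       => solve (W' Sigma^-1 W) c = W' Sigma^-1 mu.
    mu, var: interleaved static/delta means and variances (length 2T)."""
    T = len(mu) // 2
    W = build_window_matrix(T)
    P = np.diag(1.0 / var)          # Sigma^-1 (diagonal covariance assumed)
    A = W.T @ P @ W
    b = W.T @ P @ mu
    return np.linalg.solve(A, b)
```

Because the delta constraints couple neighboring frames, the generated trajectory follows the state means while remaining smooth across state boundaries, rather than jumping stepwise from state to state; practical implementations exploit the banded structure of the system to solve it in time linear in the utterance length.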
  This framework has addressed several long-standing problems in speech synthesis, such as automatic voice building, portability to other languages, and implementation with limited computational resources. Moreover, it makes it possible to synthesize various types of speech, such as emotional and emphasized speech. Within this framework, the recipient has further proposed several new synthesis technologies, such as voice mimicking [5], voice mixing [6], and voice creation [7].
  The recipient has managed the Blizzard Challenge, a worldwide campaign to evaluate TTS technologies, since 2005 [8]. The HMM-based speech synthesis system developed by his group won the Challenge in 2005, demonstrating that statistical parametric speech synthesis could generate high-quality speech in practice [9]. His continuing contributions to the Blizzard Challenge have driven the recent great progress in speech synthesis technologies.
  Moreover, the recipient has developed free software, such as the "HMM-based speech synthesis system (HTS)" [10], "hts_engine API," "Open JTalk," and "SPTK." HTS has been downloaded more than 30,000 times and is now recognized as the de facto standard toolkit for speech synthesis. It has been widely used in products such as car navigation systems and cellular/smart phones all over the world.
  As described above, the recipient proposed the idea of directly using statistical models for speech synthesis, promoted it to speech research communities throughout the world, and demonstrated the great success of the technologies he developed. He has received several prestigious honors, including IEEE Fellow in 2014, ISCA Fellow in 2013, the IPSJ Kiyasu Special Industrial Achievement Award in 2013, and the Prize for Science and Technology from the Minister of Education, Culture, Sports, Science and Technology. These achievements are truly outstanding and fully deserving of the IEICE Achievement Award.

Fig. 1  Speech synthesis system based on hidden Markov models (HMMs).

  [1] Keiichi Tokuda, Yoshihiko Nankaku, Tomoki Toda, Heiga Zen, Junichi Yamagishi, Keiichiro Oura, "Speech synthesis based on hidden Markov models," Proceedings of the IEEE, Vol. 101, No. 5, pp. 1234-1252, May 2013.
  [2] Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, Satoshi Imai, "An algorithm for speech parameter generation from HMM using dynamic features," The Journal of the Acoustical Society of Japan (in Japanese), Vol. 53, No. 3, pp. 192-200, Mar. 1997.
  [3] Keiichi Tokuda, Takashi Masuko, Noboru Miyazaki, Takao Kobayashi, "Multi-space probability distribution HMM," IEICE Transactions on Information and Systems, Vol. E85-D, No. 3, pp. 455-464, Mar. 2002.
  [4] Takayoshi Yoshimura, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, Tadashi Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," IEICE Transactions on Information and Systems (D-II) (in Japanese), Vol. J83-D-II, No. 11, pp. 2099-2107, Nov. 2000.
  [5] Junichi Yamagishi, Masatsune Tamura, Takashi Masuko, Keiichi Tokuda, Takao Kobayashi, "A training method of average voice model for HMM-based speech synthesis," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol. E86-A, No. 8, pp. 1956-1963, Aug. 2003.
  [6] Takayoshi Yoshimura, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, Tadashi Kitamura, "Speaker interpolation for HMM-based speech synthesis system," The Journal of the Acoustical Society of Japan (E), Vol. 21, No. 4, pp. 199-206, Apr. 2000.
  [7] Kengo Shichiri, Atsushi Sawabe, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, Tadashi Kitamura, "Eigenvoices for HMM-based speech synthesis," Proceedings of INTERSPEECH, pp. 1269-1272, Denver, USA, Sept. 2002.
  [8] Keiichi Tokuda, Alan W. Black, "Speech synthesis research in a new age of cooperation and competition: The Blizzard Challenge," The Journal of the Acoustical Society of Japan (in Japanese), Vol. 62, No. 6, pp. 466-472, June 2006.
  [9] Heiga Zen, Tomoki Toda, Masaru Nakamura, Keiichi Tokuda, "Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005," IEICE Transactions on Information and Systems, Vol. E90-D, No. 1, pp. 325-333, Jan. 2007.
  [10]