Achievement Award
Text-to-speech (TTS) synthesis, which generates a speech waveform from input text so that a system can effectively convey information to a user, has attracted considerable interest as speech interfaces have become widespread. For many people to use these interfaces in daily life, a new TTS technology is needed that can flexibly generate speech conveying emotion and emphasis under various conditions, much as humans do.

The recipient started his pioneering work on statistical parametric speech synthesis in the mid-1990s and proposed a new speech synthesis framework based on hidden Markov models (HMMs) [1]. As the leading researcher in this field, he established this framework, has played a central role over many years, and has been responsible for outstanding achievements in the progress of speech synthesis technologies.

In the HMM-based speech synthesis framework shown in Fig. 1, speech waveforms are parameterized into time sequences of speech component parameters, such as spectral and excitation parameters, which are then modeled with HMMs. In synthesis, the speech parameters are generated directly from the HMMs, and a speech waveform is synthesized from them. The recipient has proposed the following core technologies: 1) a speech parameter generation algorithm from HMMs using dynamic features [2], 2) multi-space probability distribution HMMs, a probabilistic model that handles fundamental frequency patterns represented as a time sequence mixing discrete symbols and continuous values [3], and 3) a unified framework that simultaneously models various speech components, such as spectrum, excitation, and duration [4]. These technologies have made it possible to describe the speech synthesis process in a mathematically well-founded manner.
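The first core technology, parameter generation using dynamic features, can be illustrated with a small sketch. The citation does not give the algorithm's details, so everything below is an assumption made for illustration: a single one-dimensional feature, a fixed state sequence, diagonal covariances, a delta window of (-0.5, 0, 0.5), and the hypothetical function names `delta_window_matrix`, `solve`, and `generate_trajectory`. Under those assumptions, the most likely static trajectory c solves the linear system (WᵀΣ⁻¹W) c = WᵀΣ⁻¹μ, where W maps static features to stacked static and delta features.

```python
def delta_window_matrix(T):
    """Build W (2T x T): row 2t selects c[t], row 2t+1 computes its delta.

    The delta is 0.5 * (c[t+1] - c[t-1]), with indices clamped at the edges
    (an assumed window, chosen only for this sketch).
    """
    W = [[0.0] * T for _ in range(2 * T)]
    for t in range(T):
        W[2 * t][t] = 1.0
        prev_t, next_t = max(t - 1, 0), min(t + 1, T - 1)
        W[2 * t + 1][next_t] += 0.5
        W[2 * t + 1][prev_t] -= 0.5
    return W


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def generate_trajectory(means, variances):
    """Return the static trajectory that maximizes the Gaussian likelihood.

    means, variances: one [static, delta] pair per frame, taken from the
    HMM state assigned to that frame.
    """
    T = len(means)
    W = delta_window_matrix(T)
    mu = [m for frame in means for m in frame]              # length 2T
    prec = [1.0 / v for frame in variances for v in frame]  # diag of Sigma^-1
    rows = range(2 * T)
    # A = W^T Sigma^-1 W and b = W^T Sigma^-1 mu
    A = [[sum(W[r][i] * prec[r] * W[r][j] for r in rows) for j in range(T)]
         for i in range(T)]
    b = [sum(W[r][i] * prec[r] * mu[r] for r in rows) for i in range(T)]
    return solve(A, b)


# Two hypothetical states: static mean 0 for three frames, then 1 for three.
means = [[0.0, 0.0]] * 3 + [[1.0, 0.0]] * 3
variances = [[1.0, 1.0]] * 6
trajectory = generate_trajectory(means, variances)

# With an almost uninformative delta model, the step-like means come back.
loose = generate_trajectory(means, [[1.0, 1e12]] * 6)
```

In this toy setting the delta constraint pulls neighboring frames toward each other, so `trajectory` moves between the two state means gradually rather than jumping, while `loose` stays close to the piecewise-constant static means. A practical system would use multidimensional features and a banded solver instead of dense elimination.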
This framework has addressed several problems in speech synthesis, such as automatic voice building, portability to other languages, and implementation with limited computing resources. Moreover, it makes it possible to synthesize various types of speech, such as emotional speech and emphasized speech. Within this framework, the recipient has further proposed several new synthesis technologies, such as voice mimicking [5], voice mixing [6], and voice creation [7].

The recipient has managed the Blizzard Challenge, a worldwide campaign to evaluate TTS technologies, since 2005 [8]. The HMM-based speech synthesis system developed by his group won the Challenge in 2005, indicating that statistical parametric speech synthesis can generate high-quality speech in practice [9]. His continuous contributions to the Blizzard Challenge have led to great recent progress in speech synthesis technologies.

Moreover, the recipient has developed free software, such as the "HMM-based speech synthesis system (HTS)" [10], "hts_engine API", "Open JTalk", and "SPTK". HTS has been downloaded more than 30,000 times and is now recognized as the de facto standard toolkit for speech synthesis. It has been widely used in various products all over the world, such as car navigation systems, cellular phones, and smartphones.

As described above, the recipient has proposed the idea of directly using statistical models for speech synthesis, has promoted it to speech research communities throughout the world, and has demonstrated the great success of the technologies he developed. He has received several prestigious awards, including IEEE Fellow in 2014, ISCA Fellow in 2013, the IPSJ Kiyasu Special Industrial Achievement Award in 2013, and the Prize for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology. These achievements are highly outstanding and truly deserving of the IEICE Achievement Award.
Fig. 1 Speech synthesis system based on hidden Markov model (HMM).
References