Best Paper Award

Real-Time Full-Band Voice Conversion with Sub-Band Modeling and Data-Driven Phase Estimation of Spectral Differentials [IEICE TRANS. INF. & SYST., VOL.E104–D, NO.7 JULY 2021]

Takaaki SAEKI
Yuki SAITO
Shinnosuke TAKAMICHI
Hiroshi SARUWATARI

Voice conversion is a technique that converts source speech into target speech that sounds like another person's voice. With the recent introduction of deep neural networks (DNNs), the sound quality of converted speech has improved significantly, which has led to a growing number of practical applications in the content industry. One example is the "AI voice changers" used by VTubers. The technique also has the potential to be used in the metaverse to realize anonymous speech communication in a voice of the user's choosing.

This paper proposes a novel voice conversion method that achieves real-time full-band voice conversion by combining signal-processing methods with DNN-based methods. The basic structure of the proposed method is to apply conversion filters to the input speech. These filters are constructed from real-valued differential cepstra estimated by the DNNs. The main features of the proposed technique are a fast conversion method that truncates the differential filter taps, a phase reconstruction method based on lifter training, and a low-frequency-band conversion method that uses sub-band modeling. The paper also investigates several methods for enhancing these core methods, namely F0 equalization, vocoder-guided training, and statistical compensation based on global variance. The evaluations are well organized and examine the effectiveness of each of these methods. The reliability that this detailed evaluation lends to the results is one of the paper's notable strengths. Given the improvements and practical applications expected in the future, this paper can be regarded as representative of the voice conversion and speech synthesis papers submitted to the IEICE in FY2021.
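The idea of building a conversion filter from a differential cepstrum and truncating its taps can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the classical fixed minimum-phase lifter in place of the trained lifter described in the paper, it takes the differential log-magnitude spectrum as given rather than estimating it with a DNN, and the names `truncated_min_phase_filter`, `convert`, and `n_taps` are hypothetical.

```python
import numpy as np

def truncated_min_phase_filter(diff_log_mag, n_taps):
    """Build a short FIR filter from a differential log-magnitude spectrum.

    diff_log_mag: real array of even length n_fft on the full FFT grid,
                  assumed symmetric (as for a real-valued signal).
    n_taps:       number of taps kept after truncation (hypothetical name).
    """
    n_fft = len(diff_log_mag)
    # Real cepstrum of the spectral differential.
    cep = np.fft.ifft(diff_log_mag).real
    # Classical minimum-phase lifter (an assumption here; the paper trains
    # its lifter from data instead of fixing it like this).
    lifter = np.zeros(n_fft)
    lifter[0] = 1.0
    lifter[1:n_fft // 2] = 2.0
    lifter[n_fft // 2] = 1.0
    # Back to a complex spectrum, then to a causal impulse response.
    spectrum = np.exp(np.fft.fft(cep * lifter))
    impulse = np.fft.ifft(spectrum).real
    # Truncate the taps to reduce the cost of filtering.
    return impulse[:n_taps]

def convert(source, diff_log_mag, n_taps=32):
    """Apply the truncated conversion filter to a source waveform."""
    h = truncated_min_phase_filter(diff_log_mag, n_taps)
    return np.convolve(source, h)[:len(source)]
```

With an all-zero differential the filter reduces to a unit impulse, so `convert` returns the source unchanged. Keeping fewer taps lowers the filtering cost but approximates the target spectral differential less closely.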

As mentioned above, voice conversion is a timely research field that is expected to grow in the near future and that attracts a relatively large number of paper submissions. Among these submissions, this 15-page paper reflects the authors' considerable efforts to provide a novel method with high originality and practicality. This paper is worthy of the IEICE Best Paper Award.