Presentation 2017-06-22
Postfiltering of STFT Spectrograms Based on Generative Adversarial Networks
Takuhiro Kaneko, Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi
Abstract(in Japanese) (See Japanese page)
Abstract(in English) This paper presents postfiltering of short-time Fourier transform (STFT) spectrograms based on generative adversarial networks (GANs). STFT spectrograms are widely used as key acoustic representations in speech processing tasks such as speech synthesis, voice conversion, speech enhancement, and speech separation. In these tasks, the usual goal is to precisely predict or generate the representations from inputs; however, the quality of the generated spectra is typically degraded by over-smoothing. To solve this problem, we propose postfiltering based on GANs, which make it possible to generate random samples that follow the underlying data distribution without requiring an explicit form of its density. Because it is not easy to train a GAN on very high-dimensional data such as STFT spectra, we use a simple divide-and-concatenate approach: we divide a spectrogram into multiple bands, reconstruct each band using a GAN-based postfilter trained for that band, and concatenate the results. We tested our postfilter on a deep neural network-based text-to-speech task and confirmed that it was effective to some extent in reducing the gap between synthesized and target spectra, even in the high-dimensional STFT domain.
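The divide-and-concatenate step described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the bands are assumed to be contiguous and non-overlapping, the spectrogram is a plain list of frames, and the names `split_bands`, `concat_bands`, and `postfilter` are hypothetical. Each per-band filter is a placeholder callable standing in for a trained GAN generator.

```python
from typing import Callable, List

Band = List[List[float]]          # frames x frequency bins of one band
Spectrogram = List[List[float]]   # frames x all frequency bins


def split_bands(spec: Spectrogram, n_bands: int) -> List[Band]:
    """Divide the frequency axis into n_bands contiguous bands.

    Assumption: bands do not overlap; the abstract does not specify this.
    """
    n_bins = len(spec[0])
    if not 1 <= n_bands <= n_bins:
        raise ValueError("n_bands must be between 1 and the number of bins")
    # Integer band edges covering [0, n_bins) without gaps.
    edges = [i * n_bins // n_bands for i in range(n_bands + 1)]
    return [[frame[lo:hi] for frame in spec]
            for lo, hi in zip(edges, edges[1:])]


def concat_bands(bands: List[Band]) -> Spectrogram:
    """Concatenate the bands back along the frequency axis, frame by frame."""
    n_frames = len(bands[0])
    return [[value for band in bands for value in band[t]]
            for t in range(n_frames)]


def postfilter(spec: Spectrogram,
               filters: List[Callable[[Band], Band]]) -> Spectrogram:
    """Apply one (placeholder) postfilter per band, then concatenate."""
    bands = split_bands(spec, len(filters))
    restored = [f(band) for f, band in zip(filters, bands)]
    return concat_bands(restored)


def identity(band: Band) -> Band:
    # Stand-in for a trained per-band generator; returns its input unchanged.
    return band


def double(band: Band) -> Band:
    # Toy filter used only to show that each band is processed separately.
    return [[2 * v for v in frame] for frame in band]


spec = [[1.0, 2.0, 3.0, 4.0, 5.0],
        [6.0, 7.0, 8.0, 9.0, 10.0]]
result = postfilter(spec, [identity, double])
```

In this toy run the first band (two bins) passes through unchanged and the second band (three bins) is doubled, which shows that each filter sees only its own slice of the frequency axis before the slices are joined again.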
Keyword(in Japanese) (See Japanese page)
Keyword(in English) postfilter / deep neural network / generative adversarial network / statistical parametric speech synthesis
Paper # PRMU2017-28,SP2017-4
Date of Issue 2017-06-15 (PRMU, SP)

Conference Information
Committee PRMU / SP
Conference Date 2017/6/22(2days)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair Shinichi Sato(NII) / Yoichi Yamashita(Ritsumeikan Univ.)
Vice Chair Hironobu Fujiyoshi(Chubu Univ.) / Yoshihisa Ijiri(Omron) / Hiroki Mori(Utsunomiya Univ.)
Secretary Hironobu Fujiyoshi(AIST) / Yoshihisa Ijiri(NAIST) / Hiroki Mori(Shizuoka Univ.)
Assistant Masato Ishii(NEC) / Yusuke Sugano(Osaka Univ.) / Kei Hashimoto(Nagoya Inst. of Tech.) / Satoshi Kobashikawa(NTT)

Paper Information
Registration To Technical Committee on Pattern Recognition and Media Understanding / Technical Committee on Speech
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Postfiltering of STFT Spectrograms Based on Generative Adversarial Networks
Sub Title (in English)
Keyword(1) postfilter
Keyword(2) deep neural network
Keyword(3) generative adversarial network
Keyword(4) statistical parametric speech synthesis
1st Author's Name Takuhiro Kaneko
1st Author's Affiliation NTT Corporation(NTT)
2nd Author's Name Shinji Takaki
2nd Author's Affiliation National Institute of Informatics(NII)
3rd Author's Name Hirokazu Kameoka
3rd Author's Affiliation NTT Corporation(NTT)
4th Author's Name Junichi Yamagishi
4th Author's Affiliation National Institute of Informatics(NII)
Date 2017-06-22
Paper # PRMU2017-28,SP2017-4
Volume (vol) vol.117
Number (no) PRMU-105,SP-106
Page pp.17-22(PRMU), pp.17-22(SP)
#Pages 6
Date of Issue 2017-06-15 (PRMU, SP)