Presentation | 2023-02-28 Multi-stream FC-HiFi-GAN: Fast Neural Vocoder Model Using Learnable Lightweight Upsampling (Haruki Yamashita, Takuma Okamoto, Ryoichi Takashima, Tetsuya Takiguchi, Tomoki Toda, Hisashi Kawai) |
---|---|
Abstract(in English) | In recent years, text-to-speech (TTS) synthesis has been required to achieve faster inference while maintaining quality. Multi-stream (MS) iSTFT-HiFiGAN was proposed as a fast variant of HiFi-GAN, a vocoder capable of generating waveforms on a single CPU. In a TTS task using VITS, it increased inference speed by about 4 times, although sound quality deteriorated somewhat. In this paper, we propose MS-FC-HiFi-GAN, in which the inverse short-time Fourier transform (iSTFT) part of MS-iSTFT-HiFiGAN is replaced with a trainable fully connected (FC) layer to improve synthesis quality. Its inference speed matched that of MS-iSTFT-HiFiGAN, with a real-time factor (RTF) of 0.15 on a single CPU. Its synthesis quality was inferior to that of MS-iSTFT-HiFiGAN in the TTS task but superior in the analysis/synthesis task. |
Keyword(in English) | speech synthesis / Neural Vocoder / HiFi-GAN / Text-to-Speech / Analysis Synthesis |
Paper # | EA2022-76,SIP2022-120,SP2022-40 |
Date of Issue | 2023-02-21 (EA, SIP, SP) |
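
The abstract's central change, replacing the iSTFT stage with a trainable fully connected layer, can be illustrated with a minimal NumPy sketch. It rests on one observation: for each frame, the inverse real DFT is itself a fixed linear map. A fully connected layer whose weights start at that inverse-DFT basis therefore reproduces windowed, overlap-added iSTFT-style frame synthesis, and training can then move those weights away from the fixed transform. The names `idft_basis` and `FCFrameSynth`, the input layout (concatenated real and imaginary parts), the Hann window, the hop size, and the inverse-DFT initialization are all hypothetical choices for illustration, not details taken from the paper. Only the forward pass is shown; there is no training loop and no multi-stream sub-band merging.

```python
from typing import Optional

import numpy as np


def idft_basis(n_fft: int) -> np.ndarray:
    """Return a real matrix W of shape (n_fft, 2 * (n_fft // 2 + 1)) such that
    W @ np.concatenate([X.real, X.imag]) equals np.fft.irfft(X, n=n_fft)
    for a one-sided spectrum X of length n_fft // 2 + 1."""
    n_bins = n_fft // 2 + 1
    eye = np.eye(n_bins)
    # irfft is linear in the real and imaginary parts of its input, so each
    # column is the time-domain response to a single unit real / imaginary bin.
    real_cols = np.fft.irfft(eye, n=n_fft, axis=0)       # (n_fft, n_bins)
    imag_cols = np.fft.irfft(1j * eye, n=n_fft, axis=0)  # (n_fft, n_bins)
    return np.concatenate([real_cols, imag_cols], axis=1)


class FCFrameSynth:
    """Hypothetical stand-in for a learnable iSTFT replacement: a fully
    connected layer maps each frame's feature vector to n_fft samples, which
    are then Hann-windowed and overlap-added with hop size `hop`."""

    def __init__(self, n_fft: int, hop: int, init_from_idft: bool = True,
                 rng: Optional[np.random.Generator] = None):
        in_dim = 2 * (n_fft // 2 + 1)
        if init_from_idft:
            # Start exactly at the fixed inverse-DFT map (an illustrative
            # assumption; the abstract does not say how the layer is initialized).
            self.weight = idft_basis(n_fft)
        else:
            rng = np.random.default_rng(0) if rng is None else rng
            self.weight = rng.normal(scale=in_dim ** -0.5, size=(n_fft, in_dim))
        # weight and bias are the parameters a training framework would update.
        self.bias = np.zeros(n_fft)
        self.window = np.hanning(n_fft)
        self.n_fft = n_fft
        self.hop = hop

    def __call__(self, feats: np.ndarray) -> np.ndarray:
        """feats: (n_frames, 2 * (n_fft // 2 + 1)) -> waveform of length
        (n_frames - 1) * hop + n_fft. No window-sum normalization is applied."""
        frames = (feats @ self.weight.T + self.bias) * self.window  # (n_frames, n_fft)
        out = np.zeros((feats.shape[0] - 1) * self.hop + self.n_fft)
        for t, frame in enumerate(frames):
            start = t * self.hop
            out[start:start + self.n_fft] += frame  # overlap-add
        return out
```

With `init_from_idft=True`, feeding the concatenated real and imaginary parts of a one-sided spectrum gives the same output as applying `np.fft.irfft` per frame, windowing, and overlap-adding. Training would then let `weight` and `bias` depart from that fixed transform. The abstract states neither the inputs of the paper's FC layer nor its initialization, so this equivalence is only a way to see why an FC layer can take the iSTFT's place.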
Conference Information | |
Committee | SP / IPSJ-SLP / EA / SIP |
---|---|
Conference Date | 2023/2/28 (2 days) |
Place (in English) | |
Topics (in English) | |
Chair | Tomoki Toda(Nagoya Univ.) / Tomoki Toda(Nagoya Univ.) / Kenichi Furuya(Oita Univ.) / Toshihisa Tanaka(Tokyo Univ. Agri.&Tech.) |
Vice Chair | / / Tatsuya Kako(NTT) / Junki Ono(Tokyo Metropolitan Univ.) / Koichi Ichige(Yokohama National Univ.) / Takayuki Nakachi(Ryukyu Univ.) |
Secretary | (NTT) / (Univ. of Electro-Comm.) / Tatsuya Kako(NTT) / Junki Ono(Univ. of Electro-Comm.) / Koichi Ichige(NTT) / Takayuki Nakachi(Ritsumeikan Univ.) |
Assistant | Ryo Aihara(Mitsubishi Electric) / Daisuke Saito(Univ. of Tokyo) / Ryo Aihara(Mitsubishi Electric) / Daisuke Saito(Univ. of Tokyo) / Masato Nakayama(Osaka Sangyo Univ.) / Kouhei Yatabe(TUAT) / Taichi Yoshida(UEC) / Shoko Imaizumi(Chiba Univ.) |
Paper Information | |
Registration To | Technical Committee on Speech / Special Interest Group on Spoken Language Processing / Technical Committee on Engineering Acoustics / Technical Committee on Signal Processing |
---|---|
Language | JPN |
Title (in English) | Multi-stream FC-HiFi-GAN: Fast Neural Vocoder Model Using Learnable Lightweight Upsampling |
Sub Title (in English) | |
Keyword(1) | speech synthesis |
Keyword(2) | Neural Vocoder |
Keyword(3) | HiFi-GAN |
Keyword(4) | Text-to-Speech |
Keyword(5) | Analysis Synthesis |
1st Author's Name | Haruki Yamashita |
1st Author's Affiliation | Kobe University/National Institute of Information and Communications Technology(Kobe Univ/NICT) |
2nd Author's Name | Takuma Okamoto |
2nd Author's Affiliation | National Institute of Information and Communications Technology(NICT) |
3rd Author's Name | Ryoichi Takashima |
3rd Author's Affiliation | Kobe University(Kobe Univ) |
4th Author's Name | Tetsuya Takiguchi |
4th Author's Affiliation | Kobe University(Kobe Univ) |
5th Author's Name | Tomoki Toda |
5th Author's Affiliation | Nagoya University/National Institute of Information and Communications Technology(Nagoya Univ/NICT) |
6th Author's Name | Hisashi Kawai |
6th Author's Affiliation | National Institute of Information and Communications Technology(NICT) |
Date | 2023-02-28 |
Volume (vol) | vol.122 |
Number (no) | EA-387,SIP-388,SP-389 |
Page | pp.7-12 (EA), pp.7-12 (SIP), pp.7-12 (SP) |
#Pages | 6 |