Presentation | 2018-06-28 Multimodal voice conversion using deep bottleneck features and deep canonical correlation analysis Satoshi Tamura, Kento Horio, Hajime Endo, Satoru Hayamizu, Tomoki Toda |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | In this paper, we aim to improve speech quality in voice conversion and propose a novel multi-modal voice conversion approach that uses speech waveforms and lip images. We employ deep bottleneck features to improve the visual features in audio-visual voice conversion. In addition, we apply deep canonical correlation analysis to obtain better audio and visual representations and to build a new cross-modal framework. We conducted subjective and objective evaluations in noisy environments to assess the usefulness of the proposed method, comparing it with audio-only, visual-only, and conventional audio-visual voice conversion schemes. We found that our method can significantly improve quality even in heavily noisy conditions. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Voice conversion / multi-modal / audio-visual / cross-modal / deep learning / bottleneck feature / canonical correlation analysis |
Paper # | PRMU2018-24,SP2018-4 |
Date of Issue | 2018-06-21 (PRMU, SP) |
Conference Information | |
Committee | PRMU / SP |
---|---|
Conference Date | 2018/6/28 (2 days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | Shinichi Sato(NII) / Yoichi Yamashita(Ritsumeikan Univ.) |
Vice Chair | Yoshihisa Ijiri(Omron) / Toru Tamaki(Hiroshima Univ.) / Akinobu Ri(Nagoya Inst. of Tech.) |
Secretary | Yoshihisa Ijiri(NEC) / Toru Tamaki(Osaka Univ.) / Akinobu Ri(Kyoto Univ.) |
Assistant | Go Irie(NTT) / Yoshitaka Ushiku(Univ. of Tokyo) / Tomoki Koriyama(Tokyo Inst. of Tech.) / Satoshi Kobashikawa(NTT) |
Paper Information | |
Registration To | Technical Committee on Pattern Recognition and Media Understanding / Technical Committee on Speech |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Multimodal voice conversion using deep bottleneck features and deep canonical correlation analysis |
Sub Title (in English) | |
Keyword(1) | Voice conversion |
Keyword(2) | multi-modal |
Keyword(3) | audio-visual |
Keyword(4) | cross-modal |
Keyword(5) | deep learning |
Keyword(6) | bottleneck feature |
Keyword(7) | canonical correlation analysis |
1st Author's Name | Satoshi Tamura |
1st Author's Affiliation | Gifu University(Gifu Univ.) |
2nd Author's Name | Kento Horio |
2nd Author's Affiliation | Gifu University(Gifu Univ.) |
3rd Author's Name | Hajime Endo |
3rd Author's Affiliation | Gifu University(Gifu Univ.) |
4th Author's Name | Satoru Hayamizu |
4th Author's Affiliation | Gifu University(Gifu Univ.) |
5th Author's Name | Tomoki Toda |
5th Author's Affiliation | Nagoya University(Nagoya Univ.) |
Date | 2018-06-28 |
Volume (vol) | vol.118 |
Number (no) | PRMU-111,SP-112 |
Page | pp.13-18(PRMU), pp.13-18(SP) |
#Pages | 6 |