Presentation 2018-06-28
Multimodal voice conversion using deep bottleneck features and deep canonical correlation analysis
Satoshi Tamura, Kento Horio, Hajime Endo, Satoru Hayamizu, Tomoki Toda
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In this paper, we aim to improve speech quality in voice conversion and propose a novel multi-modal voice conversion approach using speech waveforms and lip images. We employ deep bottleneck features to improve the visual features in audio-visual voice conversion. In addition, we apply deep canonical correlation analysis to obtain better audio and visual representations, as well as to build a new cross-modal framework. We conducted subjective and objective evaluations in noisy environments to clarify the usefulness of our proposed method, comparing it with audio-only, visual-only and conventional audio-visual voice conversion schemes. We found that our method can significantly improve the quality even in heavily noisy conditions.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Voice conversion / multi-modal / audio-visual / cross-modal / deep learning / bottleneck feature / canonical correlation analysis
Paper # PRMU2018-24,SP2018-4
Date of Issue 2018-06-21 (PRMU, SP)

Conference Information
Committee PRMU / SP
Conference Date 2018-06-28 (2 days)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair Shinichi Sato(NII) / Yoichi Yamashita(Ritsumeikan Univ.)
Vice Chair Yoshihisa Ijiri(Omron) / Toru Tamaki(Hiroshima Univ.) / Akinobu Ri(Nagoya Inst. of Tech.)
Secretary Yoshihisa Ijiri(NEC) / Toru Tamaki(Osaka Univ.) / Akinobu Ri(Kyoto Univ.)
Assistant Go Irie(NTT) / Yoshitaka Ushiku(Univ. of Tokyo) / Tomoki Koriyama(Tokyo Inst. of Tech.) / Satoshi Kobashikawa(NTT)

Paper Information
Registration To Technical Committee on Pattern Recognition and Media Understanding / Technical Committee on Speech
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Multimodal voice conversion using deep bottleneck features and deep canonical correlation analysis
Sub Title (in English)
Keyword(1) Voice conversion
Keyword(2) multi-modal
Keyword(3) audio-visual
Keyword(4) cross-modal
Keyword(5) deep learning
Keyword(6) bottleneck feature
Keyword(7) canonical correlation analysis
1st Author's Name Satoshi Tamura
1st Author's Affiliation Gifu University(Gifu Univ.)
2nd Author's Name Kento Horio
2nd Author's Affiliation Gifu University(Gifu Univ.)
3rd Author's Name Hajime Endo
3rd Author's Affiliation Gifu University(Gifu Univ.)
4th Author's Name Satoru Hayamizu
4th Author's Affiliation Gifu University(Gifu Univ.)
5th Author's Name Tomoki Toda
5th Author's Affiliation Nagoya University(Nagoya Univ.)
Date 2018-06-28
Paper # PRMU2018-24,SP2018-4
Volume (vol) vol.118
Number (no) PRMU-111,SP-112
Page pp.13-18(PRMU), pp.13-18(SP)
#Pages 6
Date of Issue 2018-06-21 (PRMU, SP)