Deep Auto-encoder based Low-dimensional Feature Extraction using FFT Spectral Envelopes in Statistical Parametric Speech Synthesis

Presentation	2015-12-03 Deep Auto-encoder based Low-dimensional Feature Extraction using FFT Spectral Envelopes in Statistical Parametric Speech Synthesis Shinji Takaki, Junichi Yamagishi
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	In state-of-the-art statistical parametric speech synthesis systems, a speech analysis module such as STRAIGHT spectral analysis is generally used to obtain accurate and stable spectral envelopes, and low-dimensional acoustic features extracted from those envelopes are then used to train acoustic models. However, the spectral envelope estimation algorithm in such a module involves various processing steps derived from human knowledge. In this paper, we investigate non-linear, data-driven, unsupervised low-dimensional feature extraction from FFT spectral envelopes, based on a deep auto-encoder, for statistical parametric speech synthesis. Experimental results show that a text-to-speech synthesis system using deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes is a promising approach.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	Statistical parametric speech synthesis / DNN / Deep Auto-encoder / Spectral envelope / Vocoder
Paper #	SP2015-81
Date of Issue	2015-11-25 (SP)

Conference Information
Committee	NLC / IPSJ-NL / SP / IPSJ-SLP
Conference Date	2015/12/2(3days)
Place (in Japanese)	(See Japanese page)
Place (in English)	Nagoya Inst. of Tech.
Topics (in Japanese)	(See Japanese page)
Topics (in English)	The Second Natural Language Processing Symposium & The 17th Spoken Language Symposium
Chair	Koichi Takeuchi(Okayama Univ.) / Kentaro Inui(Tohoku Univ.) / Kazunori Mano(Shibaura Inst. of Tech.) / Koichi Shinoda(Tokyo Inst. of Tech.)
Vice Chair	Hiroshi Kanayama(IBM) / Makoto Ichise(NTT DoCoMo) / / Norihide Kitaoka(Tokushima Univ.)
Secretary	Hiroshi Kanayama(Univ. of Tokyo/Hottolink) / Makoto Ichise(Ryukoku Univ.) / (Osaka Univ.) / Norihide Kitaoka(Tohoku Univ.) / (Mixi Co. Ltd.)
Assistant	Kazutaka Shimada(Kyushu Inst. of Tech.) / Ryuichiro Higashinaka(NTT) / / Takashi Nose(Tohoku Univ.) / Taichi Asami(NTT)

Paper Information
Registration To	Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Deep Auto-encoder based Low-dimensional Feature Extraction using FFT Spectral Envelopes in Statistical Parametric Speech Synthesis
Sub Title (in English)
Keyword(1)	Statistical parametric speech synthesis
Keyword(2)	DNN
Keyword(3)	Deep Auto-encoder
Keyword(4)	Spectral envelope
Keyword(5)	Vocoder
1st Author's Name	Shinji Takaki
1st Author's Affiliation	National Institute of Informatics(NII)
2nd Author's Name	Junichi Yamagishi
2nd Author's Affiliation	National Institute of Informatics(NII)
Date	2015-12-03
Paper #	SP2015-81
Volume (vol)	vol.115
Number (no)	SP-346
Page	pp.99-104(SP)
#Pages	6
Date of Issue	2015-11-25 (SP)