Inverse estimation of shapes of vocal-tract models with cascading two acoustic tubes from sound spectrogram using CNN

千葉 拓弥; 松﨑 博季; 和田 直史; 竹沢 恵; 真田 博文

Presentation	2021-07-16 Inverse estimation of shapes of vocal-tract models with cascading two acoustic tubes from sound spectrogram using CNN Takuya Chiba, Hiroki Matsuzaki, Naofumi Wada, Megumi Takezawa, Hirofumi Sanada
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	We are attempting to use machine learning to estimate vocal tract shape from speech. In earlier work, we trained a neural network consisting of multiple fully connected layers with the vocal tract transfer function as the input and the vocal tract area function as the output, but could not obtain sufficient estimation accuracy. Another problem was that the voice data itself was not used for training. In this study, we used a convolutional neural network (CNN), a type of network widely used in image processing, and took sound spectrograms obtained from speech as its input data. InceptionV3, VGG16, and ResNet50, which are often used for classification problems, were used as the CNNs after changing the activation function of the output layer from a softmax function to an identity function to suit the regression problem of this study. As a result, we were not able to obtain high accuracy with this implementation for any of the CNN models.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	Sound Spectrogram / Vocal Tract Area / Inverse Estimation / CNN
Paper #	EA2021-19
Date of Issue	2021-07-08 (EA)

Conference Information
Committee	EA / ASJ-H
Conference Date	2021/7/15(2days)
Place (in Japanese)	(See Japanese page)
Place (in English)	Online
Topics (in Japanese)	(See Japanese page)
Topics (in English)	Engineering/Electro Acoustics, Psychological and Physiological Acoustics, Speech, Musical Acoustics, Education in Acoustics, and Related Topics
Chair	Yoshinobu Kajikawa(Kansai Univ.)
Vice Chair	Kenichi Furuya(Oita Univ.) / Shoichi Koyama(Univ. of Tokyo)
Secretary	Kenichi Furuya(NTT) / Shoichi Koyama(Ritsumeikan Univ.)
Assistant	Yukou Wakabayashi(Tokyo Metropolitan Univ.) / Tatsuya Komatsu(LINE)

Paper Information
Registration To	Technical Committee on Engineering Acoustics / Auditory Research Meeting
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Inverse estimation of shapes of vocal-tract models with cascading two acoustic tubes from sound spectrogram using CNN
Sub Title (in English)
Keyword(1)	Sound Spectrogram
Keyword(2)	Vocal Tract Area
Keyword(3)	Inverse Estimation
Keyword(4)	CNN
1st Author's Name	Takuya Chiba
1st Author's Affiliation	Hokkaido University of Science(Hokkaido Univ of Science)
2nd Author's Name	Hiroki Matsuzaki
2nd Author's Affiliation	Hokkaido University of Science(Hokkaido Univ of Science)
3rd Author's Name	Naofumi Wada
3rd Author's Affiliation	Hokkaido University of Science(Hokkaido Univ of Science)
4th Author's Name	Megumi Takezawa
4th Author's Affiliation	Hokkaido University of Science(Hokkaido Univ of Science)
5th Author's Name	Hirofumi Sanada
5th Author's Affiliation	Hokkaido University of Science(Hokkaido Univ of Science)
Date	2021-07-16
Paper #	EA2021-19
Volume (vol)	vol.121
Number (no)	EA-112
Page	pp.89-94(EA)
#Pages	6
Date of Issue	2021-07-08 (EA)
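
The abstract mentions replacing the softmax activation of the output layer with an identity function so that classification CNNs can be used for regression. The following is a minimal sketch of why that change matters, not the authors' implementation. It uses only the Python standard library, and the two-value output, the sample numbers, and the mean squared error loss are all hypothetical assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Classification head: outputs are positive and sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def identity(logits):
    """Regression head: values pass through unchanged."""
    return list(logits)


def mean_squared_error(predicted, target):
    """Assumed regression loss; the abstract does not name the loss used."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


# Hypothetical final-layer pre-activations and hypothetical target values
# (for example, one shape parameter per acoustic tube).
raw_outputs = [1.5, 4.0]
target_values = [1.5, 4.0]

softmax_out = softmax(raw_outputs)
identity_out = identity(raw_outputs)

print("softmax :", softmax_out, "loss:", mean_squared_error(softmax_out, target_values))
print("identity:", identity_out, "loss:", mean_squared_error(identity_out, target_values))
```

Because softmax forces the outputs to sum to 1, it cannot reproduce arbitrary real-valued targets even when the pre-activations match them exactly, whereas the identity output can.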