Presentation 2021-07-16
Inverse estimation of shapes of vocal-tract models with cascading two acoustic tubes from sound spectrogram using CNN
Takuya Chiba, Hiroki Matsuzaki, Naofumi Wada, Megumi Takezawa, Hirofumi Sanada
Abstract(in Japanese) (See Japanese page)
Abstract(in English) We are attempting to use machine learning to estimate vocal tract shape from speech. In previous work, we trained a neural network consisting of multiple fully connected layers, with the vocal tract transfer function as the input and the vocal tract area function as the output, but were not able to obtain sufficient estimation accuracy. Another problem was that the speech data itself was not used for training. In this study, we obtained sound spectrograms from speech and used them as the input to a convolutional neural network (CNN), a type of network widely used in image processing. InceptionV3, VGG16, and ResNet50, which are often used in classification problems, were used as the CNNs after changing the activation function of the output layer from a softmax function to an identity function to fit the regression problem of this study. As a result, we were not able to obtain high accuracy with this implementation method for any of the CNN models.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Sound Spectrogram / Vocal Tract Area / Inverse Estimation / CNN
Paper # EA2021-19
Date of Issue 2021-07-08 (EA)

Conference Information
Committee EA / ASJ-H
Conference Date 2021/7/15(2days)
Place (in Japanese) (See Japanese page)
Place (in English) Online
Topics (in Japanese) (See Japanese page)
Topics (in English) Engineering/Electro Acoustics, Psychological and Physiological Acoustics, Speech, Musical Acoustics, Education in Acoustics, and Related Topics
Chair Yoshinobu Kajikawa(Kansai Univ.)
Vice Chair Kenichi Furuya(Oita Univ.) / Shoichi Koyama(Univ. of Tokyo)
Secretary Kenichi Furuya(NTT) / Shoichi Koyama(Ritsumeikan Univ.)
Assistant Yukou Wakabayashi(Tokyo Metropolitan Univ.) / Tatsuya Komatsu(LINE)

Paper Information
Registration To Technical Committee on Engineering Acoustics / Auditory Research Meeting
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Inverse estimation of shapes of vocal-tract models with cascading two acoustic tubes from sound spectrogram using CNN
Sub Title (in English)
Keyword(1) Sound Spectrogram
Keyword(2) Vocal Tract Area
Keyword(3) Inverse Estimation
Keyword(4) CNN
1st Author's Name Takuya Chiba
1st Author's Affiliation Hokkaido University of Science(Hokkaido Univ of Science)
2nd Author's Name Hiroki Matsuzaki
2nd Author's Affiliation Hokkaido University of Science(Hokkaido Univ of Science)
3rd Author's Name Naofumi Wada
3rd Author's Affiliation Hokkaido University of Science(Hokkaido Univ of Science)
4th Author's Name Megumi Takezawa
4th Author's Affiliation Hokkaido University of Science(Hokkaido Univ of Science)
5th Author's Name Hirofumi Sanada
5th Author's Affiliation Hokkaido University of Science(Hokkaido Univ of Science)
Date 2021-07-16
Paper # EA2021-19
Volume (vol) vol.121
Number (no) EA-112
Page pp.89-94(EA)
#Pages 6
Date of Issue 2021-07-08 (EA)