Presentation | 2021-07-16 Inverse estimation of shapes of vocal-tract models with cascading two acoustic tubes from sound spectrogram using CNN Takuya Chiba, Hiroki Matsuzaki, Naofumi Wada, Megumi Takezawa, Hirofumi Sanada |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | We are attempting to use machine learning to estimate vocal tract shape from speech. In previous work, we trained a neural network consisting of multiple fully connected layers, with the vocal tract transfer function as the input and the vocal tract area function as the output, but this inverse estimation did not reach sufficient accuracy. Another problem was that the voice data itself was not used for training. In this study, we used a convolutional neural network (CNN), a type of network widely used in image processing, and gave it a sound spectrogram obtained from speech as the input data. InceptionV3, VGG16, and ResNet50, which are often used for classification problems, were used as the CNNs after changing the activation function of the output layer from a softmax function to an identity function to fit the regression problem of this study. As a result, we were not able to obtain high accuracy with this implementation method for any of the CNN models. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Sound Spectrogram / Vocal Tract Area / Inverse Estimation / CNN |
Paper # | EA2021-19 |
Date of Issue | 2021-07-08 (EA) |
Conference Information | |
Committee | EA / ASJ-H |
---|---|
Conference Date | 2021/7/15(2days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Online |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | Engineering/Electro Acoustics, Psychological and Physiological Acoustics, Speech, Musical Acoustics, Education in Acoustics, and Related Topics |
Chair | Yoshinobu Kajikawa(Kansai Univ.) |
Vice Chair | Kenichi Furuya(Oita Univ.) / Shoichi Koyama(Univ. of Tokyo) |
Secretary | Kenichi Furuya(NTT) / Shoichi Koyama(Ritsumeikan Univ.) |
Assistant | Yukou Wakabayashi(Tokyo Metropolitan Univ.) / Tatsuya Komatsu(LINE) |
Paper Information | |
Registration To | Technical Committee on Engineering Acoustics / Auditory Research Meeting |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Inverse estimation of shapes of vocal-tract models with cascading two acoustic tubes from sound spectrogram using CNN |
Sub Title (in English) | |
Keyword(1) | Sound Spectrogram |
Keyword(2) | Vocal Tract Area |
Keyword(3) | Inverse Estimation |
Keyword(4) | CNN |
1st Author's Name | Takuya Chiba |
1st Author's Affiliation | Hokkaido University of Science(Hokkaido Univ of Science) |
2nd Author's Name | Hiroki Matsuzaki |
2nd Author's Affiliation | Hokkaido University of Science(Hokkaido Univ of Science) |
3rd Author's Name | Naofumi Wada |
3rd Author's Affiliation | Hokkaido University of Science(Hokkaido Univ of Science) |
4th Author's Name | Megumi Takezawa |
4th Author's Affiliation | Hokkaido University of Science(Hokkaido Univ of Science) |
5th Author's Name | Hirofumi Sanada |
5th Author's Affiliation | Hokkaido University of Science(Hokkaido Univ of Science) |
Date | 2021-07-16 |
Paper # | EA2021-19 |
Volume (vol) | vol.121 |
Number (no) | EA-112 |
Page | pp.89-94(EA) |
#Pages | 6 |
Date of Issue | 2021-07-08 (EA) |
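
The abstract mentions changing the output-layer activation away from a softmax function so that classification networks fit a regression problem. The following is a minimal sketch of why that change matters, not the authors' implementation: softmax forces the outputs to be positive and sum to 1, while an identity (linear) activation passes the final-layer values through unchanged, so they can match arbitrary real-valued targets. The function names, the two-element output, and the numeric values are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Classification-style output: positive values that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def identity(logits):
    """Regression-style output: raw values pass through unchanged."""
    return list(logits)


def mean_squared_error(predicted, target):
    """Average squared difference, a common regression loss."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


# Hypothetical final-layer values and regression targets (illustrative only).
raw_outputs = [2.0, 0.5]
targets = [2.0, 0.5]

softmax_outputs = softmax(raw_outputs)
identity_outputs = identity(raw_outputs)

print("softmax :", softmax_outputs, "sum =", sum(softmax_outputs))
print("identity:", identity_outputs)
print("MSE with softmax :", mean_squared_error(softmax_outputs, targets))
print("MSE with identity:", mean_squared_error(identity_outputs, targets))
```

Even when the final-layer values already equal the targets, the softmax output cannot reproduce them because it is constrained to sum to 1, whereas the identity output gives zero error in this toy case.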