Presentation 2023-11-23
[Invited Talk] Auditory representation effective for extracting speech information: Theory, measurement, estimation, and applications
Toshio Irino
Abstract(in English) Just by listening to a voice on the telephone, we can immediately tell whether the caller is an adult or a child and estimate the speaker's height (size). At the same time, we can recognize the content of the speech regardless of the speaker's size. According to the source-filter theory, speech sounds are generated by combining information about the shape of the vocal tract (filter characteristics) with the vibration of the vocal folds (source characteristics). Based on the speech chain theory, the auditory system can be assumed to solve the inverse problem, that is, to recover this source and filter information from the received sound. To model this mechanism, we proposed the Stabilized Wavelet-Mellin Transform (SWMT). As background, we present the results of size perception experiments and show that they cannot be explained by peripheral auditory models alone, and that the spectral weighting function derived from the SWMT, the SSI weight, is effective. We also discuss how the SSI weight can improve the accuracy of vocal tract length estimates obtained from MRI measurements. Furthermore, we discuss the theoretical optimality of the SWMT and the gammachirp auditory filter. Finally, we mention the experimental measurement and estimation of their characteristics.
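The size-invariance idea above can be illustrated with a minimal sketch of the principle behind a Mellin transform; it is not the SWMT itself, which operates on the output of an auditory model. The sketch assumes a uniform-tube idealization (a shorter vocal tract multiplies every formant frequency by the same factor), a hypothetical Gaussian-bump spectral envelope, illustrative formant values, and a circular shift on a finite grid. Under these assumptions, a change in speaker size becomes a pure translation along the log-frequency axis, so the Fourier magnitude along that axis (a discrete stand-in for the Mellin magnitude) does not change. The helper names `log_axis_envelope` and `mellin_magnitude` are hypothetical.

```python
import numpy as np

# Log-frequency axis u = ln(f) between 100 Hz and 8 kHz (illustrative range).
N = 512
u = np.linspace(np.log(100.0), np.log(8000.0), N, endpoint=False)
du = u[1] - u[0]


def log_axis_envelope(formants_hz, width=0.08):
    """Hypothetical vowel envelope: one Gaussian bump per formant on the ln(f) axis."""
    return sum(np.exp(-0.5 * ((u - np.log(f)) / width) ** 2) for f in formants_hz)


def mellin_magnitude(envelope):
    """Fourier magnitude along ln(f): a discrete stand-in for the Mellin magnitude."""
    return np.abs(np.fft.fft(envelope))


# Illustrative "adult" formants (assumed values, not taken from the talk).
adult_formants = np.array([700.0, 1200.0, 2600.0])

# Uniform-tube idealization: a shorter vocal tract scales all formants by one factor.
# The factor is chosen as exp(k * du) so that the shift is exactly k grid steps.
k = 40
scale = np.exp(k * du)  # about 1.41, i.e. a vocal tract about 0.71 times as long
child_formants = adult_formants * scale

env_adult = log_axis_envelope(adult_formants)
env_child = log_axis_envelope(child_formants)

# On the log-frequency axis, the size change is a translation by k samples...
print(np.max(np.abs(env_child - np.roll(env_adult, k))))
# ...so the Mellin-style magnitude (shape information) is unaffected by size.
print(np.max(np.abs(mellin_magnitude(env_child) - mellin_magnitude(env_adult))))
```

Both printed differences are at floating-point rounding level. The size information is carried by the shift `k` (equivalently `scale`), while the shape information survives in the magnitude; real speech would need a non-circular treatment of the band edges.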
Keyword(in English) Source-filter theory / Speech chain / Size perception / Stabilized Wavelet-Mellin Transform / Gammachirp filter
Paper # EA2023-46,EMM2023-77
Date of Issue 2023-11-16 (EA, EMM)

Conference Information
Committee EMM / EA / ASJ-H
Conference Date 2023/11/23 (2 days)
Topics (in English) [Beginners Session] Engineering/Electro Acoustics, Content Processing, Digital Watermarking, Psychological and Physiological Acoustics, and Related Topics
Chair Michiharu Niimi(Kyushu Inst. of Tech.) / Junki Ono(Tokyo Metropolitan Univ.)
Vice Chair Kotaro Sonoda(Nagasaki Univ.) / Hyunho Kang(NIT, Tokyo) / Takanobu Nishiura(Ritsumeikan Univ.) / Yoshinobu Kajikawa(Kansai Univ.)
Secretary Kotaro Sonoda(Hiroshima City Univ.) / Hyunho Kang(Osaka Inst. of Tech.) / Takanobu Nishiura(NTT) / Yoshinobu Kajikawa(Univ. of Tokyo)
Assistant Naofumi Aoki(Hokkaido Univ.) / Kazuaki Nakamura(Tokyo Univ. of Science) / Masato Nakayama(OSU) / Kouhei Yatabe(TUAT)

Paper Information
Registration To Technical Committee on Enriched MultiMedia / Technical Committee on Engineering Acoustics / Auditory Research Meeting
Language JPN
Title (in English) [Invited Talk] Auditory representation effective for extracting speech information: Theory, measurement, estimation, and applications
Keyword(1) Source-filter theory
Keyword(2) Speech chain
Keyword(3) Size perception
Keyword(4) Stabilized Wavelet-Mellin Transform
Keyword(5) Gammachirp filter
1st Author's Name Toshio Irino
1st Author's Affiliation Wakayama University(Wakayama Univ.)
Date 2023-11-23
Volume (vol) vol.123
Number (no) EA-278,EMM-279
Page pp.98-103 (EA), pp.98-103 (EMM)
#Pages 6