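[Invited Talk] Auditory representation effective for extracting speech information: Theory, measurement, estimation, and applications

Toshio Irino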

Presentation	2023-11-23 [Invited Talk] Auditory representation effective for extracting speech information: Theory, measurement, estimation, and applications Toshio Irino
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	Just by listening to a voice on the telephone, we can immediately tell whether the caller is an adult or a child, and we can estimate the speaker's height (size). At the same time, we can recognize the content of the speech regardless of the speaker's size. According to the source-filter theory, speech sounds are generated by combining information about the shape of the vocal tract (filter characteristics) and the vibration of the vocal folds (source characteristics). Based on the speech chain theory, the auditory system can be assumed to solve the inverse problem. To model this mechanism, we proposed the Stabilized Wavelet-Mellin Transform (SWMT). As background, we present the results of size perception experiments and show that they cannot be explained by peripheral auditory models alone, and that the spectral weight function derived from the SWMT, the SSI weight, is effective. We also discuss how the SSI weight can improve the accuracy of vocal tract length estimates obtained from MRI measurements. Furthermore, we discuss the theoretical optimality of the SWMT and the gammachirp auditory filter. Finally, we describe the experimental measurement and estimation of their characteristics.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	Source-filter theory / Speech chain / Size perception / Stabilized Wavelet-Mellin Transform / Gammachirp filter
Paper #	EA2023-46,EMM2023-77
Date of Issue	2023-11-16 (EA, EMM)

Conference Information
Committee	EMM / EA / ASJ-H
Conference Date	2023/11/23 (2 days)
Place (in Japanese)	(See Japanese page)
Place (in English)
Topics (in Japanese)	(See Japanese page)
Topics (in English)	[Beginners Session] Engineering/Electro Acoustics, Content Processing, Digital Watermarking, Psychological and Physiological Acoustics, and Related Topics
Chair	Michiharu Niimi(Kyushu Inst. of Tech.) / Junki Ono(Tokyo Metropolitan Univ.)
Vice Chair	Kotaro Sonoda(Nagasaki Univ.) / Hyunho Kang(NIT, Tokyo) / Takanobu Nishiura(Ritsumeikan Univ.) / Yoshinobu Kajikawa(Kansai Univ.)
Secretary	Kotaro Sonoda(Hiroshima City Univ.) / Hyunho Kang(Osaka Inst. of Tech.) / Takanobu Nishiura(NTT) / Yoshinobu Kajikawa(Univ. of Tokyo)
Assistant	Naofumi Aoki(Hokkaido Univ.) / Kazuaki Nakamura(Tokyo Univ. of Science) / Masato Nakayama(OSU) / Kouhei Yatabe(TUAT)

Paper Information
Registration To	Technical Committee on Enriched MultiMedia / Technical Committee on Engineering Acoustics / Auditory Research Meeting
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	[Invited Talk] Auditory representation effective for extracting speech information: Theory, measurement, estimation, and applications
Sub Title (in English)
Keyword(1)	Source-filter theory
Keyword(2)	Speech chain
Keyword(3)	Size perception
Keyword(4)	Stabilized Wavelet-Mellin Transform
Keyword(5)	Gammachirp filter
1st Author's Name	Toshio Irino
1st Author's Affiliation	Wakayama University(Wakayama Univ.)
Date	2023-11-23
Paper #	EA2023-46,EMM2023-77
Volume (vol)	vol.123
Number (no)	EA-278,EMM-279
Page	pp.98-103(EA), pp.98-103(EMM)
#Pages	6
Date of Issue	2023-11-16 (EA, EMM)