Paper Abstract and Keywords |
Presentation |
2022-08-26 11:42
Study on Relationship Between Speakers' Physiological Structure and Acoustic Speech Signals: Data-Driven Study Based on Frequency-Wise Attentional Neural Network Li Kai (JAIST), Xugang Lu (NICT), Masato Akagi, Jianwu Dang (JAIST), Sheng Li (NICT), Unoki Masashi (JAIST) SIP2022-68 |
Abstract |
(in English) |
Quantitatively revealing the relationship between speakers’ physiological structure and acoustic speech signals by considering the properties of resonance and antiresonance can help us to extract effective speaker discriminative information (SDI) from speech signals. The conventional quantification method based on the F-ratio considers the power of acoustic speech in each frequency band independently. We propose a novel frequency-wise attentional neural network to learn the nonlinear combined effect of the frequency components on speaker identity. The learned results indicate that the antiresonance frequency induced by the nasal cavity is another essential factor for speaker discrimination that the F-ratio method could not reveal. To further evaluate our findings, we designed a non-uniform subband processing strategy based on the learned results for speaker feature extraction and carried out automatic speaker verification (ASV) experiments. The ASV results confirmed that further emphasizing the spectral structure around the antiresonance frequency region can enhance speaker discrimination. |
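For readers unfamiliar with the F-ratio baseline mentioned in the abstract, the following is a minimal sketch of a per-band F-ratio, assuming the common definition (variance of the speaker means divided by the average within-speaker variance, computed separately for each frequency band). The function name `f_ratio`, the input layout, and the toy data are illustrative assumptions and are not taken from the paper; the exact formulation used by the authors may differ.

```python
import numpy as np

def f_ratio(band_energy, speaker_ids, eps=1e-12):
    """Per-band F-ratio under an assumed, textbook-style definition.

    band_energy: array of shape (n_utterances, n_bands), e.g. log subband energies
    speaker_ids: array of shape (n_utterances,), one speaker label per utterance
    Returns an array of shape (n_bands,).
    """
    band_energy = np.asarray(band_energy, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    speakers = np.unique(speaker_ids)

    # Mean and variance of each band, computed separately for each speaker.
    means = np.stack([band_energy[speaker_ids == s].mean(axis=0) for s in speakers])
    within = np.stack([band_energy[speaker_ids == s].var(axis=0) for s in speakers])

    # Between-speaker variance: spread of the speaker means in each band.
    between = means.var(axis=0)

    # Each band is scored on its own; no interaction between bands is modelled.
    return between / (within.mean(axis=0) + eps)

# Toy data (hypothetical): band 0 differs between speakers, band 1 does not.
demo = np.array([
    [1.0, 5.0], [1.2, 6.0], [0.8, 4.0],   # speaker "A"
    [3.0, 5.0], [3.2, 6.0], [2.8, 4.0],   # speaker "B"
])
ids = np.array(["A", "A", "A", "B", "B", "B"])
scores = f_ratio(demo, ids)
```

Because every band is scored in isolation, a measure of this kind cannot capture the combined effect of several frequency components, which is the limitation the abstract attributes to the F-ratio method.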
Keyword |
(in English) |
physiological feature / non-uniform filterbank / frequency-wise attention / data-driven feature |
Reference Info. |
IEICE Tech. Rep., vol. 122, no. 165, SIP2022-68, pp. 97-102, Aug. 2022. |
Paper # |
SIP2022-68 |
Date of Issue |
2022-08-18 (SIP) |
ISSN |
Online edition: ISSN 2432-6380 |
Copyright and reproduction |
All rights are reserved and no part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Notwithstanding, instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. (License No.: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034) |