Paper Abstract and Keywords |
Presentation |
2010-01-22 14:50
Multimodal speech recognition using multimodal voice activity detection Satoshi Tamura, Masato Ishikawa, Takashi Hashiba, Shin'ichi Takeuchi, Satoru Hayamizu (Gifu Univ.) CQ2009-105 PRMU2009-204 SP2009-145 MVE2009-127 |
Abstract |
(in English) |
Audio-Visual Automatic Speech Recognition (AVASR) has been developed to improve robustness in noisy environments by using visual information in addition to acoustic features. Similarly, Audio-Visual Voice Activity Detection (AVVAD) has been investigated to increase the precision of VAD, since detecting the presence of speech in noisy audio signals contributes to ASR performance. In this paper, we propose a novel speech recognition method that combines AVASR and AVVAD, examining combinations of model-based or model-free methods with feature-fusion-based or decision-fusion-based methods. To evaluate the proposed schemes, recognition experiments were conducted using noisy audio-visual data. The results show that the proposed method using model-free feature-fusion AVVAD outperforms not only audio-only ASR but also conventional AVASR. |
Keyword |
(in English) |
multimodal / speech recognition / voice activity detection / feature fusion / decision fusion |
Reference Info. |
IEICE Tech. Rep., vol. 109, no. 375, SP2009-145, pp. 345-350, Jan. 2010. |
Paper # |
SP2009-145 |
Date of Issue |
2010-01-14 (CQ, PRMU, SP, MVE) |
ISSN |
Print edition: ISSN 0913-5685 Online edition: ISSN 2432-6380 |
Copyright and reproduction |
All rights are reserved and no part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Notwithstanding, instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. (License No.: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034) |
Conference Information |
Committee |
PRMU SP MVE CQ |
Conference Date |
2010-01-21 - 2010-01-22 |
Place (in English) |
Kyoto Univ. |
Paper Information |
Registration To |
SP |
Conference Code |
2010-01-PRMU-SP-MVE-CQ |
Language |
Japanese |
Title (in English) |
Multimodal speech recognition using multimodal voice activity detection |
Keyword(1) |
multimodal |
Keyword(2) |
speech recognition |
Keyword(3) |
voice activity detection |
Keyword(4) |
feature fusion |
Keyword(5) |
decision fusion |
1st Author's Name |
Satoshi Tamura |
1st Author's Affiliation |
Gifu University (Gifu Univ.) |
2nd Author's Name |
Masato Ishikawa |
2nd Author's Affiliation |
Gifu University (Gifu Univ.) |
3rd Author's Name |
Takashi Hashiba |
3rd Author's Affiliation |
Gifu University (Gifu Univ.) |
4th Author's Name |
Shin'ichi Takeuchi |
4th Author's Affiliation |
Gifu University (Gifu Univ.) |
5th Author's Name |
Satoru Hayamizu |
5th Author's Affiliation |
Gifu University (Gifu Univ.) |
Speaker |
Author-1 |
Date Time |
2010-01-22 14:50:00 |
Presentation Time |
30 minutes |
Registration for |
SP |
Paper # |
CQ2009-105, PRMU2009-204, SP2009-145, MVE2009-127 |
Volume (vol) |
vol.109 |
Number (no) |
no.373(CQ), no.374(PRMU), no.375(SP), no.376(MVE) |
Page |
pp.345-350 |
#Pages |
6 |
Date of Issue |
2010-01-14 (CQ, PRMU, SP, MVE) |
|