Paper Abstract and Keywords |
Presentation |
2012-07-21 12:00
WFST-based Structured Classification of Features Extracted by Using Deep Neural Networks Yotaro Kubo, Takaaki Hori, Atsushi Nakamura (NTT) SP2012-57 |
Abstract |
(in English) |
Multilayer perceptrons with more than two hidden layers are known to be efficient for modeling complex classification processes. However, because of local optima and plateaus in their training objective functions, such perceptrons had not been used in practice.
Recently, a heuristic method that initializes the network with values obtained by unsupervised training has enabled the practical use of such perceptrons.
With multiple hidden layers, the total number of units needed to accurately model nonlinear classification processes would become smaller than in single-hidden-layer networks.
Consequently, the main contribution of deep processing can be regarded as an enhancement of feature representations.
On the other hand, an approach called structured classification has been attracting the attention of speech researchers because it realizes direct modeling of sequence-to-sequence classification.
However, feature transformation is known to be important in this approach, since it typically treats sequence classification as a linear classification process.
In this paper, we attempt to combine these two approaches in order to enhance both sides: feature representations and label representations.
Specifically, we introduce a structured classification method based on weighted finite-state transducers into multilayer perceptron-based speech recognition systems. |
Keyword |
(in English) |
Automatic Speech Recognition / Structured Classification / Deep Learning / Temporal Features |
Reference Info. |
IEICE Tech. Rep., vol. 112, no. 141, SP2012-57, pp. 39-44, July 2012. |
Paper # |
SP2012-57 |
Date of Issue |
2012-07-12 (SP) |
ISSN |
Print edition: ISSN 0913-5685 Online edition: ISSN 2432-6380 |
Copyright and reproduction |
All rights are reserved and no part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Notwithstanding, instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. (License No.: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034) |
Conference Information |
Committee |
SP IPSJ-SLP |
Conference Date |
2012-07-19 - 2012-07-21 |
Place (in English) |
Hotel Takinoyu (Yamagata Pref.) |
Topics (in English) |
Speech recognition, understanding, dialog, etc. |
Paper Information |
Registration To |
SP |
Conference Code |
2012-07-SP-SLP |
Language |
Japanese |
Title (in English) |
WFST-based Structured Classification of Features Extracted by Using Deep Neural Networks |
Keyword(1) |
Automatic Speech Recognition |
Keyword(2) |
Structured Classification |
Keyword(3) |
Deep Learning |
Keyword(4) |
Temporal Features |
1st Author's Name |
Yotaro Kubo |
1st Author's Affiliation |
Nippon Telegraph and Telephone Corporation (NTT) |
2nd Author's Name |
Takaaki Hori |
2nd Author's Affiliation |
Nippon Telegraph and Telephone Corporation (NTT) |
3rd Author's Name |
Atsushi Nakamura |
3rd Author's Affiliation |
Nippon Telegraph and Telephone Corporation (NTT) |
Speaker |
Author-1 |
Date Time |
2012-07-21 12:00:00 |
Presentation Time |
30 minutes |
Registration for |
SP |
Paper # |
SP2012-57 |
Volume (vol) |
vol.112 |
Number (no) |
no.141 |
Page |
pp.39-44 |
#Pages |
6 |
|