The Best Paper Award
Hidden Conditional Neural Fields for Continuous Phoneme Speech Recognition
Yasuhisa FUJII, Kazumasa YAMAMOTO,
Seiichi NAKAGAWA (Toyohashi University of Technology)
(English Transactions D, August 2012 issue)
Constructing an accurate acoustic model is one of the key issues in improving the performance of Automatic Speech Recognition (ASR). Current ASR systems generally employ a Hidden Markov Model (HMM) as the acoustic model. Recently, speech recognition methods that use discriminative models instead of HMMs have attracted much attention. The Hidden Conditional Random Field (HCRF) is a representative discriminative model. Although the HCRF is promising, it has a drawback: because it computes the score of a recognition hypothesis by summing linearly weighted feature values, it cannot capture nonlinear relationships among features.
To address this, the paper first proposes "Hidden Conditional Neural Fields (HCNF)", which can represent the nonlinearity between features by introducing the gate function of a Multi-Layer Perceptron (MLP) into the HCRF. The paper further proposes three methods to improve HCRF-based speech recognition: 1) a new training criterion, Hidden Boosted MMI (HB-MMI); 2) a hierarchical model topology using a two-layer HCNF; and 3) the inclusion of a Deep Neural Network (DNN).
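The difference between the two scoring schemes can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a logistic sigmoid as the gate function and scores a single feature vector for a single state, and all function and parameter names (`linear_score`, `gated_score`, `gate_weights`, `gate_params`) are hypothetical.

```python
import math


def linear_score(weights, features):
    """HCRF-style score: a linearly weighted sum of feature values."""
    return sum(w * x for w, x in zip(weights, features))


def sigmoid(z):
    """Logistic sigmoid, assumed here as the MLP gate function."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_score(gate_weights, gate_params, features):
    """HCNF-style score (sketch): each gate applies a sigmoid to a
    linear projection of the features, and the gate outputs are then
    linearly combined. The sigmoid lets the score depend nonlinearly
    on the features."""
    total = 0.0
    for w_g, theta_g in zip(gate_weights, gate_params):
        projection = sum(t * x for t, x in zip(theta_g, features))
        total += w_g * sigmoid(projection)
    return total


# Illustrative values only; they are not taken from the paper.
features = [0.5, -1.0, 2.0]
print(linear_score([1.0, 2.0, 0.5], features))
print(gated_score([1.0, -1.0],
                  [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
                  features))
```

In this sketch the linear score changes proportionally when a feature value is scaled, whereas the gated score saturates, which is the kind of nonlinearity the gate function is meant to introduce.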
The proposed methods are evaluated on both English and Japanese phoneme recognition tasks. The phoneme recognition performance of the proposed method with HB-MMI and a DNN exceeds that of a conventional acoustic model, namely the triphone HMM. The proposed model is expected to become a useful and effective acoustic model in place of the conventional HMM. The contribution of this study is highly regarded because it actively incorporates various ideas to steadily improve HCRF-based recognition performance and carefully evaluates the effectiveness of each idea. Furthermore, the proposed modeling is expected to be applicable to tasks other than speech recognition.
For the above reasons, the paper is very highly regarded as a proposal of a new technique for discriminative modeling.
