Dialogue disfluency detection using context
Hiroto Nakashima, Kazutaka Shimada
Abstract (in English) Automatic speech recognition (ASR) has improved substantially thanks to large amounts of training data and machine learning techniques such as deep learning. However, ASR output contains not only recognition errors but also artifacts of speaker disfluency. These are difficult to remove automatically, and removing them by hand is costly. In this paper, we propose a BERT-based disfluency detection model that uses the context of the target utterance. We introduce two types of context information. The first is the real utterances that appear around the target utterance; we compare several sequence lengths for the preceding and following utterances. The second is an utterance generated by GPT-2: our model appends the utterance generated from the target utterance as the following context. In our experiments, longer context sequences improve disfluency detection accuracy, and real context outperforms generated context.
Keywords (in English) disfluency detection / disfluency / dialogue generation / context complement
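
The context setup described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the model input is formatted, so everything here is an assumption for illustration: the function names `context_window` and `to_model_input`, the three-segment layout, the `[SEP]` separator string, and the sample dialogue are all hypothetical. The sketch only shows how a window of preceding and following utterances could be selected around a target utterance; it does not call BERT or GPT-2.

```python
from typing import List, Tuple

# Assumed separator between segments; the paper's actual input format is not given.
SEP = " [SEP] "


def context_window(
    utterances: List[str], target: int, n_prev: int, n_next: int
) -> Tuple[List[str], str, List[str]]:
    """Return (preceding, target, following) utterances for a dialogue.

    Up to n_prev utterances before and n_next utterances after the target
    are kept; the window is clipped at the dialogue boundaries.
    """
    if not 0 <= target < len(utterances):
        raise IndexError("target index out of range")
    prev = utterances[max(0, target - n_prev):target]
    following = utterances[target + 1:target + 1 + n_next]
    return prev, utterances[target], following


def to_model_input(prev: List[str], target: str, following: List[str]) -> str:
    """Join the three segments into one string (hypothetical layout)."""
    return SEP.join([" ".join(prev), target, " ".join(following)])


# Hypothetical dialogue; the second utterance contains a filler and a repetition.
dialogue = ["hello", "uh I I want a ticket", "to where", "to Tokyo"]
prev, tgt, following = context_window(dialogue, target=1, n_prev=1, n_next=2)
text = to_model_input(prev, tgt, following)
```

Varying `n_prev` and `n_next` corresponds to comparing context lengths. For the generated-context condition, the `following` list would instead hold an utterance produced by a generator from the target utterance; that step is left out here because the abstract gives no details of it.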
