ViTVO: Vision Transformer based Visual Odometry with Attention Supervision

Chiu Chu-Chi; Yang Hsuan-Kung; Chen Hao-Wei; Chen Yu-Wen; Lee Chun-Yi

International Conference on Machine Vision Applications

2023

Session Number:P1

Number:P1-20

Publication Date:2023/07/23

Online ISSN:2188-5079

DOI:10.34385/proc.78.P1-20

Summary:

In this paper, we develop ViTVO, a Vision Transformer-based visual odometry (VO) model that uses an attention mechanism to perform VO. Due to the nature of VO, Transformer-based VO models tend to over-concentrate on a few points, which can degrade accuracy. In addition, noise from dynamic objects usually makes VO tasks more difficult. To overcome these issues, we propose an attention loss, applied during training, that uses ground-truth masks or self-supervision to guide the attention maps to focus more on the static regions of an image. In our experiments, we demonstrate the superior performance of ViTVO on the Sintel validation set and validate the effectiveness of our attention supervision mechanism for VO tasks.
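
The summary does not give the form of the attention loss, so the following is only a minimal sketch of one way mask-guided attention supervision could look, not the formulation used in ViTVO. It assumes a flattened attention map over image patches and a binary ground-truth mask marking static patches, and it penalizes the negative log of the attention mass that falls on static patches. The function name `attention_loss`, the `eps` parameter, and the choice of a log penalty are all illustrative assumptions.

```python
import math


def attention_loss(attention, static_mask, eps=1e-8):
    """Hypothetical mask-guided attention loss (not the paper's exact loss).

    attention   -- non-negative attention weights, one per image patch
    static_mask -- 1 for a static patch, 0 for a dynamic one
    eps         -- small constant that keeps the logarithm finite

    Returns the negative log of the fraction of attention placed on
    static patches, so the loss shrinks as attention moves away from
    dynamic objects.
    """
    if len(attention) != len(static_mask):
        raise ValueError("attention and static_mask must have equal length")
    total = sum(attention)
    if total <= 0:
        raise ValueError("attention weights must sum to a positive value")
    # Normalize so the result does not depend on the scale of the weights.
    static_mass = sum(a for a, m in zip(attention, static_mask) if m) / total
    return -math.log(static_mass + eps)


if __name__ == "__main__":
    mask = [1, 1, 0, 0]  # first two patches are static (assumed example)
    print(attention_loss([0.25, 0.25, 0.25, 0.25], mask))  # uniform attention
    print(attention_loss([0.45, 0.45, 0.05, 0.05], mask))  # mostly static
```

In this sketch, the second call yields a smaller loss than the first because more of the attention lies on static patches. A term of this kind only addresses dynamic-object noise; discouraging over-concentration on a few points would need a separate term, which the summary does not describe.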