Summary

International Conference on Machine Vision Applications

2023

Session Number:P1

Session:

Number:P1-04

Bottleneck Transformer model with Channel Self-Attention for skin lesion classification

Tada Masato, Han Xian-Hua

pp.-

Publication Date:2023/07/23

Online ISSN:2188-5079

DOI:10.34385/proc.78.P1-04


Summary:
Early diagnosis of skin diseases is an important and challenging task for proper treatment; even malignant melanoma, the deadliest skin cancer, whose life expectancy is otherwise less than 5 years, can be cured when diagnosed early, which increases the survival rate. Manual diagnosis of skin lesions by specialists is not only time-consuming but also tends to produce large variation in diagnostic results. Recently, deep learning networks built mainly on convolution operations have been widely employed for visual recognition, including medical image analysis and classification, and have proven highly effective. However, the convolution operation extracts features within a limited receptive field and cannot capture the long-range dependencies needed to model global context. Transformers, which model global features with a self-attention module, have therefore become a prevalent network architecture for improving performance in various vision tasks. This study aims to construct a hybrid skin lesion recognition model that combines convolution operations and self-attention structures. Specifically, we first employ a backbone CNN to extract high-level feature maps and then leverage a transformer block to capture global correlations. Because high-level features carry diverse contexts in the channel domain but reduced information in the spatial domain, we replace the spatial self-attention of the conventional transformer block with a self-attention that models long-range dependencies along the channel direction, and then model spatial relations with a depth-wise convolution block in the feed-forward module. To demonstrate the effectiveness of the proposed method, we conduct experiments on the HAM10000 and ISIC2019 skin lesion datasets and verify its superior performance over the baseline model and state-of-the-art methods.
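
The channel-direction attention and depth-wise convolution described in the summary can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the projection weights `w_q`, `w_k`, `w_v`, the scaling by the square root of the number of spatial positions, the residual connection, and the 3x3 kernel size are all assumptions made for illustration, since the summary does not specify them. The point is only that each channel acts as one token, so the attention matrix is C x C rather than (H*W) x (H*W).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def channel_self_attention(feat, w_q, w_k, w_v):
    """Self-attention across channels of a (C, H, W) feature map.

    w_q, w_k, w_v are hypothetical (C, C) projection weights,
    equivalent to 1x1 convolutions; they are assumptions of this sketch.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)             # each channel is a token of length H*W
    q, k, v = w_q @ x, w_k @ x, w_v @ x    # each (C, H*W)
    attn = softmax(q @ k.T / np.sqrt(h * w), axis=-1)  # (C, C) channel affinities
    out = attn @ v                         # mix channels by their affinities
    return (x + out).reshape(c, h, w), attn  # residual connection (assumed)

def depthwise_conv3x3(feat, kernels):
    """Depth-wise 3x3 convolution: one (3, 3) kernel per channel,
    zero padding, stride 1, so the spatial size is preserved."""
    c, h, w = feat.shape
    padded = np.pad(feat, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(feat)
    for dy in range(3):
        for dx in range(3):
            out += kernels[:, dy, dx][:, None, None] * padded[:, dy:dy + h, dx:dx + w]
    return out

# Toy usage on a random "high-level" feature map (8 channels, 7x7 spatial).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 7, 7))
w_q, w_k, w_v = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))
attended, attn = channel_self_attention(feat, w_q, w_k, w_v)
mixed = depthwise_conv3x3(attended, rng.standard_normal((8, 3, 3)) * 0.1)
```

With a small spatial grid such as 7x7, a spatial attention matrix would be 49 x 49, while the channel variant scales with the number of channels, which is where high-level features hold most of their variety according to the summary.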