Summary

2021

Session Number:PS3

Number:PS3-2

On Detecting Cloud Container Failures from Computing Utility Sequences

Yu-Shao Liu, Hsu-Chao Lai, Jiun-Long Huang, August F. Y. Chao

pp.358-361

Publication Date:2021/9/8

Online ISSN:2188-5079

DOI:10.34385/proc.67.PS3-2

Summary:
As cloud platforms and containers grow rapidly in popularity, managing clouds has become an important issue. For example, a failed container on a cloud platform triggers an automatic restart mechanism. However, containers that fail because of user error cannot be fixed by a restart, and may enter a loop of failure and restart. Such looping failures harm the overall performance of the cloud. In this paper, we propose a machine learning approach to identifying possible container failures that factors in the utility behavior of containers (e.g., CPU usage, GPU usage, I/O throughput). We propose a light-weight neural network, EEGNet-SE, to support fast real-time inference. In addition, EEGNet-SE is able to distinguish the dynamic relations among utilities for different tasks. We conduct experiments on a real cloud container dataset from the Taiwan Cloud Computing (TWCC) platform. Experimental results show that EEGNet-SE improves performance and efficiency simultaneously, and outperforms other state-of-the-art methods in terms of accuracy.
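The summary does not describe the internals of EEGNet-SE, so the following is only an illustrative sketch, not the authors' model. It assumes that "SE" refers to a squeeze-and-excitation style channel gate, and shows how such a gate could weight each utility channel (CPU, GPU, I/O) of a container's utility sequence differently. The function names, the sample window, and the hand-picked parameters are all hypothetical; in a real model the parameters would be learned.

```python
import math

def squeeze(seq):
    """Average each utility channel over time: one number per channel."""
    return [sum(ch) / len(ch) for ch in seq]

def dense(x, weights, bias):
    """Plain fully connected layer: weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def excite(z, w1, b1, w2, b2):
    """Two small dense layers (ReLU, then sigmoid) producing one gate per channel."""
    hidden = [max(0.0, h) for h in dense(z, w1, b1)]
    return [1.0 / (1.0 + math.exp(-g)) for g in dense(hidden, w2, b2)]

def reweight(seq, gates):
    """Scale every time step of a channel by that channel's gate."""
    return [[g * v for v in ch] for ch, g in zip(seq, gates)]

# Hypothetical window: 3 utility channels (CPU, GPU, I/O) x 4 time steps.
window = [
    [0.90, 0.95, 0.92, 0.97],  # CPU usage
    [0.00, 0.00, 0.01, 0.00],  # GPU usage
    [0.30, 0.10, 0.40, 0.20],  # I/O throughput (normalized)
]

# Hypothetical hand-picked parameters; a trained model would learn these.
w1 = [[1.0, -1.0, 0.5], [0.0, 2.0, -0.5]]   # 3 channels -> 2 hidden units
b1 = [0.0, 0.0]
w2 = [[1.0, 0.0], [-1.0, 1.0], [0.5, 0.5]]  # 2 hidden units -> 3 gates
b2 = [0.0, 0.0, 0.0]

gates = excite(squeeze(window), w1, b1, w2, b2)
scaled = reweight(window, gates)
```

Because the gates are computed from the window itself, the same network can emphasize different utilities for different workloads, which is one plausible reading of "dynamic relations among utilities".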