Presentation | 2022-05-20. Proposal for a Method of Estimating IT System Failure Locations Using Alert Scoring. Reiko Kondo, Kazutaka Ogihara, Takashi Shiraishi |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | In systems where services (microservices) composed of many applications are deployed across multiple virtual and physical machines, the devices operate interdependently, so a single failure triggers alerts from a wide range of devices and makes the failure location difficult to identify. Alert thresholds are set to help identify the failure location, but if a threshold is set inappropriately, the affected device cannot be identified as the failure location. Moreover, because the number of devices is large, several different failures can occur within a short period, and alerts originating from different failure locations may be mistakenly investigated as a single failure. We therefore propose a failure location estimation method that scores each device's likelihood of being the failure location based on the alerts raised by the devices. The method has three features. First, by integrating and analyzing the dependencies between applications and infrastructure, it can estimate the failure location from alerts across a wide range of devices, from applications to infrastructure. Second, by reflecting the propagation of alerts between devices in the score, it can identify as a failure location even a device that raises no alert, for example because of a threshold setting error. Third, by grouping related alerts based on configuration information and dependencies between devices, it is expected to be applicable to classifying multiple failures. With the proposed technology, operators can investigate the highest-scoring device first, which is expected to shorten the time required for failure recovery. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Container / Operation / System configuration / Scoring / Multiple failures / Docker / Kubernetes / Istio |
Paper # | ICM2022-8 |
Date of Issue | 2022-05-12 (ICM) |
Conference Information | |
Committee | ICM / IPSJ-CSEC / IPSJ-IOT |
---|---|
Conference Date | 2022/5/19 (2 days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | Kazuhiko Kinoshita(Tokushima Univ.) |
Vice Chair | Haruo Ooishi(NTT) / Eiji Takahashi(NEC) |
Secretary | Haruo Ooishi(Bosco) / Eiji Takahashi(Fujitsu) |
Assistant | Yoshifumi Kato(NTT) |
Paper Information | |
Registration To | Technical Committee on Information and Communication Management / Special Interest Group on Computer Security / Special Interest Group on Internet and Operation Technology |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Proposal for a Method of Estimating IT System Failure Locations Using Alert Scoring |
Sub Title (in English) | |
Keyword(1) | Container |
Keyword(2) | Operation |
Keyword(3) | System configuration |
Keyword(4) | Scoring |
Keyword(5) | Multiple failures |
Keyword(6) | Docker |
Keyword(7) | Kubernetes |
Keyword(8) | Istio |
1st Author's Name | Reiko Kondo |
1st Author's Affiliation | FUJITSU LIMITED(FUJITSU) |
2nd Author's Name | Kazutaka Ogihara |
2nd Author's Affiliation | FUJITSU LIMITED(FUJITSU) |
3rd Author's Name | Takashi Shiraishi |
3rd Author's Affiliation | FUJITSU LIMITED(FUJITSU) |
Date | 2022-05-20 |
Paper # | ICM2022-8 |
Volume (vol) | vol.122 |
Number (no) | ICM-32 |
Page | pp.36-41 (ICM) |
#Pages | 6 |
Date of Issue | 2022-05-12 (ICM) |