Presentation | 2005/7/29 Initial Discussion about Adaptive Checkpointing for HPC cluster in view of fluctuate of Failure-Rate Miwako AZUMA, Masaaki KONDO, Masahi IMAI, Hiroshi NAKAMURA, Takashi NANYA |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Cluster systems are widely used, especially in high performance computing, because of their good cost performance. Such cluster systems need fault-tolerance techniques. Conventional checkpointing techniques assume a failure rate that is common to all nodes and constant throughout execution. In practice, however, the failure rate differs among nodes and fluctuates during computation. We proposed a checkpointing method that adapts to such variations by focusing on the checkpointing interval. This paper presents a performance evaluation of the new checkpointing method. The results show that our method reduces the overhead for both spatial (across nodes) and temporal variations of the failure rate. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Checkpointing / Overhead / Cluster system / variation of failure-rate |
Paper # | DC2005-14 |
Date of Issue |
Conference Information | |
Committee | DC |
---|---|
Conference Date | 2005/7/29 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Dependable Computing (DC) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Initial Discussion about Adaptive Checkpointing for HPC cluster in view of fluctuate of Failure-Rate |
Sub Title (in English) | |
Keyword(1) | Checkpointing |
Keyword(2) | Overhead |
Keyword(3) | Cluster system |
Keyword(4) | variation of failure-rate |
1st Author's Name | Miwako AZUMA |
1st Author's Affiliation | Research Center for Advanced Science and Technology, The University of Tokyo |
2nd Author's Name | Masaaki KONDO |
2nd Author's Affiliation | Research Center for Advanced Science and Technology, The University of Tokyo |
3rd Author's Name | Masahi IMAI |
3rd Author's Affiliation | Research Center for Advanced Science and Technology, The University of Tokyo |
4th Author's Name | Hiroshi NAKAMURA |
4th Author's Affiliation | Research Center for Advanced Science and Technology, The University of Tokyo |
5th Author's Name | Takashi NANYA |
5th Author's Affiliation | Research Center for Advanced Science and Technology, The University of Tokyo |
Date | 2005/7/29 |
Paper # | DC2005-14 |
Volume (vol) | vol.105 |
Number (no) | 227 |
Page | pp.- |
#Pages | 6 |
Date of Issue |