Presentation 2005/7/29
Initial Discussion about Adaptive Checkpointing for HPC cluster in view of fluctuate of Failure-Rate
Miwako AZUMA, Masaaki KONDO, Masahi IMAI, Hiroshi NAKAMURA, Takashi NANYA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Cluster systems are widely used, especially in high-performance computing, because of their good cost performance. Such systems need fault-tolerance techniques. Conventional checkpointing techniques assume a failure rate that is common to all nodes and constant throughout execution. In practice, however, the failure rate differs among nodes and fluctuates during computation. We proposed a checkpointing method that adapts to such variations by focusing on the checkpoint interval. This paper presents a performance evaluation of the new method. The results show that it reduces overhead under both spatial (across nodes) and temporal variations of the failure rate.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Checkpointing / Overhead / Cluster system / variation of failure-rate
Paper # DC2005-14
Date of Issue

Conference Information
Committee DC
Conference Date 2005/7/29 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Dependable Computing (DC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Initial Discussion about Adaptive Checkpointing for HPC cluster in view of fluctuate of Failure-Rate
Sub Title (in English)
Keyword(1) Checkpointing
Keyword(2) Overhead
Keyword(3) Cluster system
Keyword(4) variation of failure-rate
1st Author's Name Miwako AZUMA
1st Author's Affiliation Research Center for Advanced Science and Technology, The University of Tokyo
2nd Author's Name Masaaki KONDO
2nd Author's Affiliation Research Center for Advanced Science and Technology, The University of Tokyo
3rd Author's Name Masahi IMAI
3rd Author's Affiliation Research Center for Advanced Science and Technology, The University of Tokyo
4th Author's Name Hiroshi NAKAMURA
4th Author's Affiliation Research Center for Advanced Science and Technology, The University of Tokyo
5th Author's Name Takashi NANYA
5th Author's Affiliation Research Center for Advanced Science and Technology, The University of Tokyo
Date 2005/7/29
Paper # DC2005-14
Volume (vol) vol.105
Number (no) 227
Page pp.-
#Pages 6
Date of Issue