Presentation 2004/7/23
Skewed Checkpointing for Tolerating Multi-Node Failures in Cluster System
Yuya TAJIMA, Takuro HAYASHIDA, Masaaki KONDO, Masashi IMAI, Hiroshi NAKAMURA, Takashi NANYA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Large cluster systems have become widely used because they achieve a good performance/cost ratio, especially in high-performance computing. However, as the number of computing nodes grows, the probability of failure increases. Such systems should therefore tolerate not only single-node failures but also multi-node failures. In this paper, we propose a new checkpointing scheme called "Skewed Checkpointing" for tolerating multi-node failures and present its performance evaluation.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Checkpointing / Cluster system / multi-node failures
Paper # DC2004-19
Date of Issue

Conference Information
Committee DC
Conference Date 2004/7/23 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Dependable Computing (DC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Skewed Checkpointing for Tolerating Multi-Node Failures in Cluster System
Sub Title (in English)
Keyword(1) Checkpointing
Keyword(2) Cluster system
Keyword(3) multi-node failures
1st Author's Name Yuya TAJIMA
1st Author's Affiliation Research Center for Advanced Science and Technology, The University of Tokyo
2nd Author's Name Takuro HAYASHIDA
2nd Author's Affiliation Research Center for Advanced Science and Technology, The University of Tokyo (present address: NEC Corporation)
3rd Author's Name Masaaki KONDO
3rd Author's Affiliation Research Center for Advanced Science and Technology, The University of Tokyo
4th Author's Name Masashi IMAI
4th Author's Affiliation Research Center for Advanced Science and Technology, The University of Tokyo
5th Author's Name Hiroshi NAKAMURA
5th Author's Affiliation Research Center for Advanced Science and Technology, The University of Tokyo
6th Author's Name Takashi NANYA
6th Author's Affiliation Research Center for Advanced Science and Technology, The University of Tokyo
Date 2004/7/23
Paper # DC2004-19
Volume (vol) vol.104
Number (no) 239
Page pp.-
#Pages 6
Date of Issue