Presentation 2003/7/30
Consideration about Fast Checkpointing Mechanism for High-reliable HPC Cluster System
Takuro HAYASHIDA, Masaaki KONDO, Masashi IMAI, Hiroshi NAKAMURA, Takashi NANYA, Atsushi HORI
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Cluster systems are becoming widely used because of their good performance/cost ratio. However, little attention has been paid to their reliability so far. As the number of commodity components in a cluster system increases, it becomes indispensable to support reliability through system software. In this paper, we propose a hierarchical checkpointing mechanism. We explain the mechanism and show preliminary experimental results. In the experiment, the proposed mechanism is prototyped by modifying the SCore cluster system software, a publicly available parallel programming environment with a checkpoint mechanism.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) checkpointing / SCore Cluster System Software / Dependable system
Paper # DC2003-11
Date of Issue

Conference Information
Committee DC
Conference Date 2003/7/30 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Dependable Computing (DC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Consideration about Fast Checkpointing Mechanism for High-reliable HPC Cluster System
Sub Title (in English)
Keyword(1) checkpointing
Keyword(2) SCore Cluster System Software
Keyword(3) Dependable system
1st Author's Name Takuro HAYASHIDA
1st Author's Affiliation Research Center for Advanced Science and Technology. The University of Tokyo
2nd Author's Name Masaaki KONDO
2nd Author's Affiliation JST:Research Center for Advanced Science and Technology. The University of Tokyo
3rd Author's Name Masashi IMAI
3rd Author's Affiliation Research Center for Advanced Science and Technology. The University of Tokyo
4th Author's Name Hiroshi NAKAMURA
4th Author's Affiliation Research Center for Advanced Science and Technology. The University of Tokyo
5th Author's Name Takashi NANYA
5th Author's Affiliation Research Center for Advanced Science and Technology. The University of Tokyo
6th Author's Name Atsushi HORI
6th Author's Affiliation Swimmy Software, Inc.
Date 2003/7/30
Paper # DC2003-11
Volume (vol) vol.103
Number (no) 250
Page pp.-
#Pages 6
Date of Issue