Presentation 2008/7/29
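Implementation and Evaluation of Fault Tolerance for GPU Memory Using Software ECC (Reliability and Security, SWoPP Saga 2008: 2008 "Saga" Summer Workshop on Parallel/Distributed/Cooperative Processing)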
Naoya MARUYAMA, Satoshi MATSUOKA, Yasuhiko OGATA, Akira NUKADA, Toshio ENDO
Abstract(in Japanese) (See Japanese page)
Abstract(in English) General-purpose processing on GPUs (GPGPU) has rapidly been recognized as a promising HPC technology because of GPUs' much higher peak floating-point processing power. However, GPUs were originally developed for graphics applications, such as 3D games, where reliability is not considered as important an issue as it is in HPC communities. One notable example is the lack of ECC in graphics memory systems. To improve the reliability of GPUs for HPC applications, we propose a software-based technique to generate and check ECC for graphics memory. Our library-based approach allows CUDA-based GPGPU applications to be easily extended with ECC-based error checking with little manual intervention. To evaluate the applicability of our approach, we extended two CUDA applications with our ECC library: a matrix multiplication and an N-body problem. Our performance studies showed that while matrix multiplication can incur up to 300% overhead, the N-body application incurs only 15% overhead. These results suggest that software-based ECC would be a promising approach for computation-intensive applications such as N-body problems.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) GPGPU / dependability / ECC
Paper # DC2008-20
Date of Issue

Conference Information
Committee DC
Conference Date 2008/7/29 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Dependable Computing (DC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English)
Sub Title (in English)
Keyword(1) GPGPU
Keyword(2) dependability
Keyword(3) ECC
1st Author's Name Naoya MARUYAMA
1st Author's Affiliation Tokyo Institute of Technology:Japan Science and Technology Agency, CREST
2nd Author's Name Satoshi MATSUOKA
2nd Author's Affiliation Tokyo Institute of Technology:National Institute of Informatics:Japan Science and Technology Agency, CREST
3rd Author's Name Yasuhiko OGATA
3rd Author's Affiliation Tokyo Institute of Technology:Japan Science and Technology Agency, CREST
4th Author's Name Akira NUKADA
4th Author's Affiliation Tokyo Institute of Technology:Japan Science and Technology Agency, CREST
5th Author's Name Toshio ENDO
5th Author's Affiliation Tokyo Institute of Technology:Japan Science and Technology Agency, CREST
Date 2008/7/29
Paper # DC2008-20
Volume (vol) vol.108
Number (no) 181
Page pp.-
#Pages 7
Date of Issue