Granularity Optimization Method for AES Encryption Implementation on CUDA (Acceleration Techniques, FPGA Applications, and General Topics)

Naoki Nishikawa; Keisuke Iwai; Takakazu Kurokawa

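Presentation	2010-01-27 Granularity Optimization Method for AES Encryption Implementation on CUDA (Acceleration Techniques, FPGA Applications, and General Topics) Naoki Nishikawa, Keisuke Iwai, Takakazu Kurokawa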
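Abstract (translated from Japanese)	GPGPU is attracting attention as a parallel computing platform, and CUDA has come to hold a large share as its development environment. In CUDA, the programmer is responsible for choosing parameters such as the number of threads and the number of thread blocks, and at present the optimal values are found by repeated experiments. We are therefore attempting to build a model that automatically optimizes parameters such as the number of threads for CUDA implementations of cryptographic processing. As a first step, this paper varies the plaintext data type, the memory layout, and the computational granularity of a CUDA implementation of AES, and reports an analysis of how these conditions affect performance. The results show a performance difference of up to 6.6 times depending on the conditions, and yield the following findings: (1) an implementation that prioritizes memory access optimization over securing a thread count close to the upper limit is effective; (2) a granularity of 16Byte/Thread tends to draw out GPU performance more readily than 4Byte/Thread or 1Byte/Thread; (3) the plaintext data type, the plaintext memory layout, and the computational granularity all affect performance. In addition, the GPU reached its maximum performance when AES encryption was performed at 4Byte/Thread granularity on plaintext stored in shared memory as unsigned character and array of structure, and in this case a speedup of about 47 times over a conventional implementation on a Core i7-920 2.66GHz CPU was confirmed.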
Abstract (English)	GPGPU has attracted attention as a parallel computation platform across many research fields, and CUDA in particular holds a large share of the GPGPU development environment. With CUDA, programmers are responsible for deciding parameters such as the number of threads and thread blocks, and in practice the optimum values are found through repeated experiments. For this reason, we are attempting to construct a model that automatically optimizes parameters such as the number of threads. As a first step, this paper presents an analysis of how combinations of the plaintext data type, the plaintext memory allocation style, and the granularity affect GPU performance. The experimental results show up to a 6.6-fold performance difference among implementation methods with such combinations, resulting in the following insights: (1) Prioritizing memory access optimization over securing a near-maximum number of threads is effective. (2) 16Byte/Thread granularity tends to lead to higher GPU performance than 4Byte/Thread and 1Byte/Thread granularity. (3) The plaintext data type, the plaintext memory allocation style, and the granularity all affect GPU performance. In addition, we confirmed that AES encryption with 4Byte/Thread granularity on plaintexts stored in shared memory as both unsigned integer and structure of array leads to the GPU's maximum performance, and that this implementation achieved an approximately 47-fold speedup over a conventional AES implementation on a Core i7-920 2.66GHz CPU.
Keywords (Japanese)	GPGPU / CUDA / 暗号 / AES / 性能予測
Keywords (English)	GPGPU / CUDA / Cipher / AES / Performance prediction
Report number	VLD2009-86,CPSY2009-68,RECONF2009-71
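
The granularity and memory-layout terms used in the abstract can be made concrete with a small sketch. It is not the authors' kernel; it only illustrates, under the usual meaning of these terms, how a Byte/Thread granularity fixes the number of threads for a given plaintext and how array-of-structures and structure-of-arrays layouts place the same plaintext byte at different offsets. The function names and the 1 KiB plaintext size are hypothetical.

```python
AES_BLOCK_BYTES = 16  # AES operates on 128-bit (16-byte) blocks


def threads_needed(plaintext_bytes: int, bytes_per_thread: int) -> int:
    """Threads required when each thread processes `bytes_per_thread` bytes."""
    if plaintext_bytes % AES_BLOCK_BYTES != 0:
        raise ValueError("plaintext length must be a multiple of 16 bytes")
    if bytes_per_thread not in (1, 4, 16):
        raise ValueError("the abstract considers 1, 4, or 16 bytes per thread")
    return plaintext_bytes // bytes_per_thread


def aos_index(block: int, byte_in_block: int) -> int:
    """Array of structures: the 16 bytes of one AES block are contiguous."""
    return block * AES_BLOCK_BYTES + byte_in_block


def soa_index(block: int, byte_in_block: int, num_blocks: int) -> int:
    """Structure of arrays: byte k of every block is stored contiguously."""
    return byte_in_block * num_blocks + block


if __name__ == "__main__":
    size = 1024  # hypothetical plaintext size: 64 AES blocks
    for g in (16, 4, 1):
        print(f"{g:2d} Byte/Thread -> {threads_needed(size, g)} threads")
```

Finer granularity multiplies the thread count for the same plaintext (here 64, 256, and 1024 threads), which is why the choice interacts with thread limits and memory access patterns as the abstract reports.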

Technical committee information
Technical committee	RECONF
Dates	2010/1/19 (one-day meeting)

Paper details
Submitting committee	Reconfigurable Systems (RECONF)
Language of text	JPN
Title (Japanese)	CUDAによるAES実装のための計算粒度最適化手法(高速化技術,FPGA応用及び一般)
Title (English)	Granularity Optimization Method for AES Encryption Implementation on CUDA
Keyword (1) (Japanese/English)	GPGPU / GPGPU
Keyword (2) (Japanese/English)	CUDA / CUDA
Keyword (3) (Japanese/English)	暗号 / Cipher
Keyword (4) (Japanese/English)	AES / AES
Keyword (5) (Japanese/English)	性能予測 / Performance prediction
First author name (Japanese/English)	西川尚紀 / Naoki NISHIKAWA
First author affiliation (Japanese/English)	防衛大学校情報工学科 / Department of Computer Science, National Defense Academy
Second author name (Japanese/English)	岩井啓輔 / Keisuke IWAI
Second author affiliation (Japanese/English)	防衛大学校情報工学科 / Department of Computer Science, National Defense Academy
Third author name (Japanese/English)	黒川恭一 / Takakazu KUROKAWA
Third author affiliation (Japanese/English)	防衛大学校情報工学科 / Department of Computer Science, National Defense Academy
Presentation date	2010-01-27
Report number	VLD2009-86,CPSY2009-68,RECONF2009-71
Volume (vol)	vol.109
Issue (no)	395
Page range	pp.-
Number of pages	6