Asynchronous Memory Machine Models with Barrier Synchronization (Arithmetic Mechanisms; Workshop on Embedded Technology and Networks, ETNET2013)

Koji Nakano

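Presentation title	2013-03-14 Asynchronous Memory Machine Models with Barrier Synchronization (Arithmetic Mechanisms; Workshop on Embedded Technology and Networks, ETNET2013) Koji Nakano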
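Abstract (translated from Japanese)	The Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM) are parallel computing models that capture the essence of access to the shared memory and the global memory of GPUs. In these models, groups of threads called warps are executed in turn in round-robin order. On actual GPUs, however, warps are selected for execution arbitrarily. This paper proposes an asynchronous DMM and an asynchronous UMM in which warps are executed in arbitrary order. In exchange, it assumes that barrier synchronization can be performed with the syncthreads instruction. Because barrier synchronization is costly, algorithms should be evaluated by the number of barrier synchronizations they perform and designed to minimize that number. This paper presents a parallel algorithm that computes the sum of n numbers with few barrier synchronizations.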
Abstract (English)	The Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM) are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs. It has been assumed that warps (i.e., groups of threads) on the DMM and the UMM work synchronously in a round-robin manner. However, warps work asynchronously on actual GPUs, in the sense that warps may be randomly (or arbitrarily) dispatched for execution. The first contribution of this paper is to introduce asynchronous versions of the DMM and the UMM, in which warps are arbitrarily dispatched. In exchange, we assume that threads can execute the "syncthreads" instruction for barrier synchronization. Since the barrier synchronization operation is costly, we should evaluate and minimize the number of barrier synchronization operations performed by parallel algorithms. The second contribution of this paper is to show a parallel algorithm that computes the sum of n numbers in optimal computing time with few barrier synchronization steps.
Keywords (Japanese)	並列計算モデル / 並列アルゴリズム / 非同期モデル / GPU / CUDA
Keywords (English)	parallel computing models / parallel algorithms / asynchronous models / GPU / CUDA
Report number	CPSY2012-93, DC2012-99
Publication date
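
The abstract above describes warps being dispatched in arbitrary order between barrier synchronizations, and counting barriers as a cost measure. The following is a minimal Python sketch of that idea, not the algorithm from the paper: it simulates a plain two-stage sum (per-thread local sums, then a pairwise tree) and counts one barrier per phase. The function name `async_sum` and the parameters `p` (thread count), `w` (warp width), and `seed` are assumptions introduced here for illustration only.

```python
import random


def async_sum(values, p=8, w=4, seed=None):
    """Sum `values` using p simulated threads grouped into warps of w threads.

    Hypothetical illustration only; this is NOT the paper's algorithm.
    Returns (total, barriers).
    """
    assert p > 0 and p & (p - 1) == 0, "p must be a power of two"
    assert p % w == 0, "w must divide p"
    rng = random.Random(seed)
    n = len(values)
    partial = [0] * p  # stands in for the shared memory
    barriers = 0

    def run_phase(step, active):
        """Run step(t) for threads t < active, warp by warp, in arbitrary order."""
        nonlocal barriers
        warps = [range(s, min(s + w, active)) for s in range(0, active, w)]
        rng.shuffle(warps)  # models arbitrary warp dispatch
        for warp in warps:
            for t in warp:
                step(t)
        barriers += 1  # one syncthreads-style barrier closes the phase

    # Phase 1: thread t sums values[t], values[t + p], values[t + 2p], ...
    def local_sum(t):
        partial[t] = sum(values[t:n:p])

    run_phase(local_sum, p)

    # Tree phases: thread t reads partial[t + stride] and writes partial[t].
    # Reads and writes within a phase touch disjoint cells, so any warp
    # order gives the same result.
    stride = p // 2
    while stride >= 1:
        def combine(t, s=stride):
            partial[t] += partial[t + s]

        run_phase(combine, stride)
        stride //= 2

    return partial[0], barriers
```

In this sketch the barrier count is 1 + log2(p) and does not depend on n or on the dispatch order; for example, `async_sum(list(range(100)), p=8, w=4)` uses 4 barriers.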

Technical committee information
Technical committee	CPSY
Dates held	2013/3/6 (one day)

Paper details
Submitted to (technical committee)	Computer Systems (CPSY)
Language of text	ENG
Title (Japanese)	バリア同期付き非同期メモリマシンモデル(演算機構,組込み技術とネットワークに関するワークショップETNET2013)
Title (English)	Asynchronous Memory Machine Models with Barrier Synchronization
Keyword (1) (Japanese/English)	並列計算モデル / parallel computing models
Keyword (2) (Japanese/English)	並列アルゴリズム / parallel algorithms
Keyword (3) (Japanese/English)	非同期モデル / asynchronous models
Keyword (4) (Japanese/English)	GPU / GPU
Keyword (5) (Japanese/English)	CUDA / CUDA
First author name (Japanese/English)	中野浩嗣 / Koji NAKANO
First author affiliation (Japanese/English)	広島大学大学院工学研究院 / School of Engineering, Hiroshima University
Presentation date	2013-03-14
Report number	CPSY2012-93, DC2012-99
Volume	vol.112
Issue number	481
Page range	pp.-
Number of pages	6
Publication date