Date 2021-03-03
Title [Memorial Lecture] Scheduling Sparse Matrix-Vector Multiplication onto Parallel Communication Architecture
Authors Mingfei Yu (Univ. Tokyo), Ruitao Gao (Univ. Tokyo), Masahiro Fujita (Univ. Tokyo)
Abstract There is a clear trend toward using hardware such as many-core CPUs, GPUs, and FPGAs to carry out the computationally intensive tasks of deep learning implementations, a large proportion of which can be formulated as sparse matrix-vector multiplication (SpMV). In contrast with dense matrix-vector multiplication (DMV), scheduling solutions for SpMV targeting parallel processing turn out to be irregular, making the scheduling problem time-consuming or even infeasible, especially as the size of the involved matrix increases. In this paper, the minimum scheduling problem of 4×4 SpMV on a ring-connected architecture is first studied, and two concepts named multi-Input Vector and multi-Output Vector are introduced. We then classify 4×4 sparse matrices, since a parallel schedule for matrices that can be transformed into one another can be obtained simply through mutual transformation rather than time-consuming search. Building on this theory, we put forward a decomposition-based algorithm for larger matrices. With the proposed algorithm, the search space of the minimum schedule is considerably reduced, as the solving process is guided by known sub-scheduling solutions. Through comparison with an exhaustive-search method and a brute-force-based parallel scheduling method, the proposed algorithm is shown to offer high-quality scheduling solutions: on average they exploit 65.27% of the sparseness of the involved matrices and achieve 91.39% of the performance of the solutions generated by exhaustive search, with a remarkable saving in compilation time (250 times less) and the best scalability among the above-mentioned approaches.
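For context, the following is a minimal sketch (not taken from the paper) of the SpMV kernel the abstract refers to, using the common compressed sparse row (CSR) layout. The function name and data layout are illustrative assumptions, not the authors' implementation; the inner loop shows why the work per row, and hence a parallel schedule, is irregular for sparse matrices.

    # Minimal SpMV (y = A @ x) with A stored in CSR form.
    # Illustrative sketch only; names and layout are assumptions.
    def spmv_csr(values, col_idx, row_ptr, x):
        """values  - nonzero entries of A, stored row by row
        col_idx - column index of each stored nonzero
        row_ptr - row_ptr[i]:row_ptr[i+1] spans row i's nonzeros
        x       - dense input vector"""
        n_rows = len(row_ptr) - 1
        y = [0.0] * n_rows
        for i in range(n_rows):
            # Only the stored nonzeros of row i are visited, so the
            # amount of work varies from row to row, unlike DMV.
            for k in range(row_ptr[i], row_ptr[i + 1]):
                y[i] += values[k] * x[col_idx[k]]
        return y

    # 4x4 example, matching the matrix size studied in the paper:
    # A = [[5, 0, 0, 1],
    #      [0, 2, 0, 0],
    #      [0, 0, 3, 0],
    #      [4, 0, 0, 6]]
    values  = [5.0, 1.0, 2.0, 3.0, 4.0, 6.0]
    col_idx = [0, 3, 1, 2, 0, 3]
    row_ptr = [0, 2, 3, 4, 6]
    x = [1.0, 1.0, 1.0, 1.0]
    print(spmv_csr(values, col_idx, row_ptr, x))  # [6.0, 2.0, 3.0, 10.0]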
Keywords sparse matrix-vector multiplication / parallel computing / communication structure / convolutional neural network
Report number VLD2020-71, HWS2020-46
Issue date 2021-02-24 (VLD, HWS)

Technical committee information
Committees HWS / VLD
Dates from 2021/3/3 (2-day meeting)
Venue Online
Topics Design Technology for System-on-Silicon, Hardware Security, etc.
Chairs Makoto Ikeda (Univ. of Tokyo) / Daisuke Fukuda (Fujitsu Labs.)
Vice chairs Yasuhisa Shimazaki (Renesas Electronics) / Makoto Nagata (Kobe Univ.) / Kazutoshi Kobayashi (Kyoto Inst. of Tech.)
Secretaries Takatsugu Ono (Kyushu Univ.) / Junko Takahashi (NTT) / Yuichi Sakurai (Hitachi) / Daisuke Kanemoto (Osaka Univ.)
Assistant secretary Takuma Nishimoto (Hitachi)

Paper information details
Submitted to Technical Committee on Hardware Security / Technical Committee on VLSI Design Technologies
Language English
Title (English) [Memorial Lecture] Scheduling Sparse Matrix-Vector Multiplication onto Parallel Communication Architecture
Keyword (1) sparse matrix-vector multiplication
Keyword (2) parallel computing
Keyword (3) communication structure
Keyword (4) convolutional neural network
Author 1 name Mingfei Yu
Author 1 affiliation The University of Tokyo (abbr.: Univ. Tokyo)
Author 2 name Ruitao Gao
Author 2 affiliation The University of Tokyo (abbr.: Univ. Tokyo)
Author 3 name Masahiro Fujita
Author 3 affiliation The University of Tokyo (abbr.: Univ. Tokyo)
Presentation date 2021-03-03
Report number VLD2020-71, HWS2020-46
Volume vol.120
Number VLD-400, HWS-401
Page range pp.24-29 (VLD), pp.24-29 (HWS)
Number of pages 6
Issue date 2021-02-24 (VLD, HWS)