Talk title | 2021-03-03 [Memorial Lecture] Scheduling Sparse Matrix-Vector Multiplication onto Parallel Communication Architecture Mingfei Yu (Univ. Tokyo), Ruitao Gao (Univ. Tokyo), Masahiro Fujita (Univ. Tokyo) |
---|---|
Abstract (Japanese) | There is a clear trend toward using hardware such as many-core CPUs, GPUs and FPGAs to carry out the computationally intensive tasks of deep learning implementations, a large proportion of which can be formulated as sparse matrix-vector multiplication (SpMV). In contrast with dense matrix-vector multiplication (DMV), scheduling solutions for SpMV targeting parallel processing turn out to be irregular, leading to the dilemma that scheduling problems become time-consuming or even infeasible, especially as the size of the involved matrix increases. In this paper, the minimum scheduling problem of 4*4 SpMV on a ring-connected architecture is first studied, and two concepts named multi-Input Vector and multi-Output Vector are introduced. We then classify 4*4 sparse matrices, since a parallel schedule for matrices that can be transformed into one another can be obtained simply through mutual transformation rather than a time-consuming search. Based on this theory, we put forward a decomposition-based algorithm for larger matrices. With the proposed algorithm, the search space of the minimum schedule is considerably reduced, as the search is guided by known sub-scheduling solutions. Through comparison with an exhaustive-search method and a brute-force-based parallel scheduling method, the proposed algorithm is shown to offer scheduling solutions of high quality: on average they exploit 65.27% of the sparseness of the involved matrices and achieve 91.39% of the performance of the solutions generated by exhaustive search, with a remarkable saving in compilation time (250 times less) and the best scalability among the above-mentioned approaches. |
Abstract (English) | (identical to the abstract above) |
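For context on the operation the abstract refers to, below is a minimal illustrative sketch (not taken from the paper) of the sparse matrix-vector product y = A·x in the common CSR (compressed sparse row) representation, so that only nonzero entries contribute work. The example matrix is 4*4, the size studied first in the paper; the function name `spmv_csr` and the concrete matrix are assumptions for illustration only.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Multiply a CSR-encoded sparse matrix by a dense vector x.

    values  -- nonzero entries, row by row
    col_idx -- column index of each nonzero entry
    row_ptr -- row_ptr[r]:row_ptr[r+1] slices the nonzeros of row r
    """
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        # Only the nonzeros of this row are visited -- this is the
        # "sparseness" a parallel schedule can exploit.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y

# A 4*4 sparse matrix (hypothetical example):
# [[5, 0, 0, 1],
#  [0, 2, 0, 0],
#  [0, 0, 3, 0],
#  [4, 0, 0, 6]]
values = [5.0, 1.0, 2.0, 3.0, 4.0, 6.0]
col_idx = [0, 3, 1, 2, 0, 3]
row_ptr = [0, 2, 3, 4, 6]
print(spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0, 1.0]))
# -> [6.0, 2.0, 3.0, 10.0]
```

The scheduling problem the paper addresses is how to map these per-nonzero multiply-accumulate operations onto processing elements connected in a ring so that the irregular data movement completes in as few steps as possible.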
Keywords (Japanese) | sparse matrix-vector multiplication / parallel computing / communication structure / convolutional neural network |
Keywords (English) | sparse matrix-vector multiplication / parallel computing / communication structure / convolutional neural network |
Document number | VLD2020-71, HWS2020-46 |
Issue date | 2021-02-24 (VLD, HWS) |
Technical group information | |
Technical group | HWS / VLD |
---|---|
Event period | From 2021/3/3 (held over 2 days) |
Venue (Japanese) | Online |
Venue (English) | Online |
Theme (Japanese) | Design technology for system-on-silicon, hardware security, and general topics |
Theme (English) | Design Technology for System-on-Silicon, Hardware Security, etc. |
Chair (Japanese) | 池田 誠(東大) / 福田 大輔(富士通研) |
Chair (English) | Makoto Ikeda (Univ. of Tokyo) / Daisuke Fukuda (Fujitsu Labs.) |
Vice chair (Japanese) | 島崎 靖久(ルネサスエレクトロニクス) / 永田 真(神戸大) / 小林 和淑(京都工繊大) |
Vice chair (English) | Yasuhisa Shimazaki (Renesas Electronics) / Makoto Nagata (Kobe Univ.) / Kazutoshi Kobayashi (Kyoto Inst. of Tech.) |
Secretary (Japanese) | 小野 貴継(九大) / 高橋 順子(NTT) / 桜井 祐市(日立) / 兼本 大輔(大阪大学) |
Secretary (English) | Takatsugu Ono (Kyushu Univ.) / Junko Takahashi (NTT) / Yuichi Sakurai (Hitachi) / Daisuke Kanemoto (Osaka Univ.) |
Assistant secretary (Japanese) | / 西元 琢真(日立) |
Assistant secretary (English) | / Takuma Nishimoto (Hitachi) |
Paper information details | |
Registered technical committee | Technical Committee on Hardware Security / Technical Committee on VLSI Design Technologies |
---|---|
Language of text | ENG |
Title (Japanese) | |
Subtitle (Japanese) | |
Title (English) | [Memorial Lecture] Scheduling Sparse Matrix-Vector Multiplication onto Parallel Communication Architecture |
Subtitle (English) | |
Keyword (1) (Japanese/English) | sparse matrix-vector multiplication / sparse matrix-vector multiplication |
Keyword (2) (Japanese/English) | parallel computing / parallel computing |
Keyword (3) (Japanese/English) | communication structure / communication structure |
Keyword (4) (Japanese/English) | convolutional neural network / convolutional neural network |
Author 1 name (Japanese/English) | Mingfei Yu / Mingfei Yu |
Author 1 affiliation (Japanese/English) | The University of Tokyo (abbr.: Univ. Tokyo) / The University of Tokyo (abbr.: Univ. Tokyo) |
Author 2 name (Japanese/English) | Ruitao Gao / Ruitao Gao |
Author 2 affiliation (Japanese/English) | The University of Tokyo (abbr.: Univ. Tokyo) / The University of Tokyo (abbr.: Univ. Tokyo) |
Author 3 name (Japanese/English) | Masahiro Fujita / Masahiro Fujita |
Author 3 affiliation (Japanese/English) | The University of Tokyo (abbr.: Univ. Tokyo) / The University of Tokyo (abbr.: Univ. Tokyo) |
Date of presentation | 2021-03-03 |
Document number | VLD2020-71, HWS2020-46 |
Volume (vol) | vol.120 |
Number (no) | VLD-400, HWS-401 |
Page range | pp.24-29 (VLD), pp.24-29 (HWS) |
Number of pages | 6 |
Issue date | 2021-02-24 (VLD, HWS) |