Presentation | 2021-03-03 [Memorial Lecture] Scheduling Sparse Matrix-Vector Multiplication onto Parallel Communication Architecture Mingfei Yu, Ruitao Gao, Masahiro Fujita |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | There is an obvious trend to make use of hardware, including many-core CPUs, GPUs and FPGAs, to conduct the computationally intensive tasks of deep learning implementations, a large proportion of which can be formulated as sparse matrix-vector multiplication (SpMV). In contrast with dense matrix-vector multiplication (DMV), scheduling solutions for SpMV targeting parallel processing turn out to be irregular, leading to the dilemma that scheduling problems are time-consuming or even infeasible, especially as the size of the involved matrix increases. In this paper, the minimum scheduling problem of 4*4 SpMV on a ring-connected architecture is first studied, with two concepts named Multi-Input Vector and Multi-Output Vector introduced. Then, we classify 4*4 sparse matrices, since a parallel schedule for matrices that can be transformed into each other can be obtained simply through mutual transformation, rather than through a time-consuming search. Based on this theory, we put forward a decomposition-based algorithm for larger matrices. With the proposed algorithm, the search space of the minimum schedule is considerably reduced, as the search is guided by known sub-scheduling solutions. Through comparison with an exhaustive search method and a brute force-based parallel scheduling method, the proposed algorithm is shown to offer scheduling solutions of high quality: on average, they utilize 65.27% of the sparseness of the involved matrices and achieve 91.39% of the performance of the solutions generated by exhaustive search, with a remarkable saving in compilation time (250 times less) and the best scalability among the above-mentioned approaches. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | sparse matrix-vector multiplication / parallel computing / communication structure / convolutional neural network |
Paper # | VLD2020-71,HWS2020-46 |
Date of Issue | 2021-02-24 (VLD, HWS) |
Conference Information | |
Committee | HWS / VLD |
---|---|
Conference Date | 2021/3/3 (2 days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Online |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | Design Technology for System-on-Silicon, Hardware Security, etc. |
Chair | Makoto Ikeda(Univ. of Tokyo) / Daisuke Fukuda(Fujitsu Labs.) |
Vice Chair | Yasuhisa Shimazaki(Renesas Electronics) / Makoto Nagata(Kobe Univ.) / Kazutoshi Kobayashi(Kyoto Inst. of Tech.) |
Secretary | Yasuhisa Shimazaki(Kyushu Univ.) / Makoto Nagata(NTT) / Kazutoshi Kobayashi(Hitachi) |
Assistant | Takuma Nishimoto(Hitachi) |
Paper Information | |
Registration To | Technical Committee on Hardware Security / Technical Committee on VLSI Design Technologies |
---|---|
Language | ENG |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | [Memorial Lecture] Scheduling Sparse Matrix-Vector Multiplication onto Parallel Communication Architecture |
Sub Title (in English) | |
Keyword(1) | sparse matrix-vector multiplication |
Keyword(2) | parallel computing |
Keyword(3) | communication structure |
Keyword(4) | convolutional neural network |
1st Author's Name | Mingfei Yu |
1st Author's Affiliation | The University of Tokyo(Univ. Tokyo) |
2nd Author's Name | Ruitao Gao |
2nd Author's Affiliation | The University of Tokyo(Univ. Tokyo) |
3rd Author's Name | Masahiro Fujita |
3rd Author's Affiliation | The University of Tokyo(Univ. Tokyo) |
Date | 2021-03-03 |
Paper # | VLD2020-71,HWS2020-46 |
Volume (vol) | vol.120 |
Number (no) | VLD-400,HWS-401 |
Page | pp.24-29(VLD), pp.24-29(HWS) |
#Pages | 6 |
Date of Issue | 2021-02-24 (VLD, HWS) |
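The abstract formulates deep-learning kernels as sparse matrix-vector multiplication (SpMV) and contrasts it with the dense case. As background, here is a minimal sketch of an SpMV kernel over a 4*4 matrix in the common CSR (compressed sparse row) storage format; this is only an illustration of the operation being scheduled, not the paper's ring-connected scheduling algorithm, and the function name and example matrix are our own.

```python
# Illustrative SpMV in CSR form: y = A @ x, touching only the nonzeros of A.
# The per-row irregularity of the inner loop is what makes parallel
# scheduling of SpMV harder than dense matrix-vector multiplication.

def spmv_csr(values, col_idx, row_ptr, x):
    """Multiply a CSR-stored sparse matrix by a dense vector x."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # row i's nonzeros live in values[row_ptr[i]:row_ptr[i+1]]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# A 4*4 sparse matrix with 5 nonzeros:
# [[5, 0, 0, 0],
#  [0, 8, 0, 2],
#  [0, 0, 3, 0],
#  [0, 6, 0, 0]]
values = [5.0, 8.0, 2.0, 3.0, 6.0]
col_idx = [0, 1, 3, 2, 1]
row_ptr = [0, 1, 3, 4, 5]
x = [1.0, 2.0, 3.0, 4.0]
print(spmv_csr(values, col_idx, row_ptr, x))  # [5.0, 24.0, 9.0, 12.0]
```

Only 5 multiply-accumulate operations are performed instead of 16, which is the "sparseness" a parallel schedule tries to exploit.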