Presentation 2020-03-17
Acceleration of Deep Learning Inference by Model Cascading
Shohei Enomoto, Takeharu Eda,
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In recent years, various applications have emerged with the development of deep learning and the spread of IoT devices. It is desirable for these applications to complete their processing on IoT devices, but it is difficult to create DNN models that achieve high accuracy while satisfying the resource constraints of IoT devices. Therefore, an approach called model cascading has been studied, which realizes high-accuracy, high-speed inference by deploying a lightweight model on the IoT device and a high-accuracy model on the cloud, and offloading inference to the high-accuracy model only when the lightweight model's prediction is not credible. In model cascading, the confidence score, which estimates how confident the lightweight model is in its prediction, is important. Several previous studies obtained the confidence score from the prediction probability values, but it is known that these probability values are not accurate estimates of confidence. In this paper, we propose a method that, when training a lightweight model, optimizes the loss functions for the normal task and for model cascading simultaneously, thereby obtaining prediction probabilities suited to confidence estimation for model cascading. The proposed method achieved a reduction in computational cost of up to 36% and a reduction in communication cost of 41% compared to inference using only ResNet152.
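The cascading decision described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function and model names are hypothetical, and the confidence score here is simply the maximum class probability (the paper proposes a training method to make such probabilities better suited to this decision).

```python
# Minimal sketch of model cascading at inference time (hypothetical
# names; not the paper's actual code). The lightweight model runs on
# the device; the input is offloaded to the cloud model only when the
# lightweight model's confidence score falls below a threshold.

def cascade_predict(x, light_model, heavy_model, threshold=0.9):
    """Return (predicted_label, offloaded) for input x."""
    probs = light_model(x)                     # class-probability vector
    confidence = max(probs)                    # confidence = max probability
    if confidence >= threshold:
        return probs.index(confidence), False  # accept on-device prediction
    heavy_probs = heavy_model(x)               # offload to the cloud model
    return heavy_probs.index(max(heavy_probs)), True

# Toy stand-ins for the lightweight and high-accuracy models:
light = lambda x: [0.2, 0.7, 0.1] if x == "easy" else [0.4, 0.35, 0.25]
heavy = lambda x: [0.05, 0.05, 0.9]

print(cascade_predict("easy", light, heavy, threshold=0.6))  # (1, False)
print(cascade_predict("hard", light, heavy, threshold=0.6))  # (2, True)
```

Raising the threshold offloads more inputs (higher accuracy, higher communication and cloud compute cost); lowering it keeps more inferences on-device, which is the computational/communication trade-off the paper quantifies.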
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Deep Learning / DNN / Inference / Model Cascading / IoT / Edge Device
Paper # PRMU2019-98
Date of Issue 2020-03-09 (PRMU)

Conference Information
Committee PRMU / IPSJ-CVIM
Conference Date 2020/3/16(2days)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair Yoichi Sato(Univ. of Tokyo)
Vice Chair Toru Tamaki(Hiroshima Univ.) / Akisato Kimura(NTT)
Secretary Toru Tamaki(NTT) / Akisato Kimura(OMRON SINICX)
Assistant Yusuke Uchida(DeNA) / Takayoshi Yamashita(Chubu Univ.)

Paper Information
Registration To Technical Committee on Pattern Recognition and Media Understanding / Special Interest Group on Computer Vision and Image Media
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Acceleration of Deep Learning Inference by Model Cascading
Sub Title (in English)
Keyword(1) Deep Learning
Keyword(2) DNN
Keyword(3) Inference
Keyword(4) Model Cascading
Keyword(5) IoT
Keyword(6) Edge Device
1st Author's Name Shohei Enomoto
1st Author's Affiliation NIPPON TELEGRAPH AND TELEPHONE CORPORATION(NTT)
2nd Author's Name Takeharu Eda
2nd Author's Affiliation NIPPON TELEGRAPH AND TELEPHONE CORPORATION(NTT)
Date 2020-03-17
Paper # PRMU2019-98
Volume (vol) vol.119
Number (no) PRMU-481
Page pp.203-208 (PRMU)
#Pages 6
Date of Issue 2020-03-09 (PRMU)