Presentation 2020-03-17
Acceleration of Deep Learning Inference by Model Cascading
Shohei Enomoto, Takeharu Eda,
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In recent years, various applications have emerged with the development of deep learning and the spread of IoT devices. It is desirable for these applications to complete their processing on IoT devices, but it is difficult to create DNN models that achieve high accuracy while satisfying the resource constraints of IoT devices. Therefore, an approach called model cascading has been studied, which realizes high-accuracy, high-speed inference by deploying a lightweight model on the IoT device and a high-accuracy model on the cloud, and offloading inference to the high-accuracy model only when the lightweight model's prediction is not credible. In model cascading, the confidence score, which estimates how confident the lightweight model is in its prediction, is important. Several previous studies obtained the confidence score from the prediction probability values, but it is known that these probability values are not accurate estimates of confidence. In this paper, we propose a method that, when training a lightweight model, optimizes the loss functions for the normal task and for model cascading simultaneously, thereby obtaining prediction probabilities suited to confidence estimation for model cascading. The proposed method achieved a reduction in computational cost of up to 36% and a reduction in communication cost of 41% compared to inference using only ResNet152.
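The cascading decision described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function and model names are hypothetical, and the confidence score here is simply the maximum class probability (the paper proposes a training method to make such probabilities better suited to this decision).

```python
# Minimal sketch of model cascading at inference time (hypothetical
# names; not the paper's actual code). The lightweight model runs on
# the device; the input is offloaded to the cloud model only when the
# lightweight model's confidence score falls below a threshold.

def cascade_predict(x, light_model, heavy_model, threshold=0.9):
    """Return (predicted_label, offloaded) for input x."""
    probs = light_model(x)                     # class-probability vector
    confidence = max(probs)                    # confidence = max probability
    if confidence >= threshold:
        return probs.index(confidence), False  # accept on-device prediction
    heavy_probs = heavy_model(x)               # offload to the cloud model
    return heavy_probs.index(max(heavy_probs)), True

# Toy stand-ins for the lightweight and high-accuracy models:
light = lambda x: [0.2, 0.7, 0.1] if x == "easy" else [0.4, 0.35, 0.25]
heavy = lambda x: [0.05, 0.05, 0.9]

print(cascade_predict("easy", light, heavy, threshold=0.6))  # (1, False)
print(cascade_predict("hard", light, heavy, threshold=0.6))  # (2, True)
```

Raising the threshold offloads more inputs (higher accuracy, higher communication and cloud compute cost); lowering it keeps more inferences on-device, which is the computational/communication trade-off the paper quantifies.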
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Deep Learning / DNN / Inference / Model Cascading / IoT / Edge Device
Paper # PRMU2019-98
Date of Issue 2020-03-09 (PRMU)

Conference Information
Committee PRMU / IPSJ-CVIM
Conference Date 2020/3/16(2days)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair Yoichi Sato(Univ. of Tokyo)
Vice Chair Toru Tamaki(Hiroshima Univ.) / Akisato Kimura(NTT)
Secretary Toru Tamaki(NTT) / Akisato Kimura(OMRON SINICX)
Assistant Yusuke Uchida(DeNA) / Takayoshi Yamashita(Chubu Univ.)

Paper Information
Registration To Technical Committee on Pattern Recognition and Media Understanding / Special Interest Group on Computer Vision and Image Media
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Acceleration of Deep Learning Inference by Model Cascading
Sub Title (in English)
Keyword(1) Deep Learning
Keyword(2) DNN
Keyword(3) Inference
Keyword(4) Model Cascading
Keyword(5) IoT
Keyword(6) Edge Device
1st Author's Name Shohei Enomoto
1st Author's Affiliation NIPPON TELEGRAPH AND TELEPHONE CORPORATION(NTT)
2nd Author's Name Takeharu Eda
2nd Author's Affiliation NIPPON TELEGRAPH AND TELEPHONE CORPORATION(NTT)
Date 2020-03-17
Paper # PRMU2019-98
Volume (vol) vol.119
Number (no) PRMU-481
Page pp.203-208 (PRMU)
#Pages 6
Date of Issue 2020-03-09 (PRMU)