Prompt Learning for Object Detection with Vision-Language Model

Presentation	2023-05-19 Prompt Learning for Object Detection with Vision-Language Model Mariko Tomariguchi,
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	Two-stage object detection models crop features from the regions most likely to contain objects and classify the objects from those features. In this work, we investigate how information about an object's surroundings affects its classification, and we improve the prompt learning method for object detection with Vision-Language models. Using augmented data, we train learnable vectors that correspond to the input prompts of CLIP, creating prompts with and without surrounding information. We then train the object detection model, replacing the usual classification-score computation with one based on the language embeddings obtained by passing the learned prompts through the CLIP language encoder. Our method achieves 20.3% $\mathrm{AP}$ on the LVIS dataset with prompts that include surroundings and 21.6% $\mathrm{AP}$ with prompts that do not. In particular, it achieves 27.9% $\mathrm{AP}_f$ and 29.1% $\mathrm{AP}_f$, respectively, on the frequent classes of LVIS.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	deep learning / object detection / Mask R-CNN / prompt learning / CLIP
Paper #	PRMU2023-12
Date of Issue	2023-05-11 (PRMU)
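
The abstract describes computing the classification score from language embeddings of learned prompts. A minimal sketch of one common way to do this is shown here: the score for each class is a softmax over temperature-scaled cosine similarities between a cropped region feature and per-class text embeddings. This is an illustration under stated assumptions, not the authors' implementation. The function names, the temperature value, and the use of plain Python lists in place of real CLIP language-encoder outputs are all hypothetical, and background-class handling is omitted.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def classification_scores(region_feature, class_text_embeddings, temperature=0.01):
    """Return softmax scores over classes for one region.

    region_feature: feature vector cropped for a region proposal.
    class_text_embeddings: one embedding per class; in the setting of the
        abstract these would come from passing learned prompts through a
        language encoder (assumed here, plain lists are used instead).
    temperature: assumed scaling factor, not a value taken from the paper.
    """
    region = l2_normalize(region_feature)
    logits = []
    for embedding in class_text_embeddings:
        text = l2_normalize(embedding)
        cosine = sum(a * b for a, b in zip(region, text))
        logits.append(cosine / temperature)

    # Numerically stable softmax: subtract the maximum logit first.
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Toy example: the region feature points in the direction of class 0.
    scores = classification_scores([1.0, 0.0], [[2.0, 0.0], [0.0, 3.0]])
    print(scores.index(max(scores)))
```

In this form, swapping prompts with surroundings for prompts without them would only change the contents of `class_text_embeddings`; the scoring function itself stays the same.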

Conference Information
Committee	PRMU / IPSJ-CVIM
Conference Date	2023/5/18(2days)
Place (in Japanese)	(See Japanese page)
Place (in English)
Topics (in Japanese)	(See Japanese page)
Topics (in English)
Chair	Seiichi Uchida(Kyushu Univ.)
Vice Chair	Takuya Funatomi(NAIST) / Mitsuru Anpai(Denso IT Lab.)
Secretary	Takuya Funatomi(CyberAgent) / Mitsuru Anpai(Univ. of Tokyo)
Assistant	Nakamasa Inoue(Tokyo Inst. of Tech.) / Yasutomo Kawanishi(Riken)

Paper Information
Registration To	Technical Committee on Pattern Recognition and Media Understanding / Special Interest Group on Computer Vision and Image Media
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Prompt Learning for Object Detection with Vision-Language Model
Sub Title (in English)
Keyword(1)	deep learning
Keyword(2)	object detection
Keyword(3)	Mask R-CNN
Keyword(4)	prompt learning
Keyword(5)	CLIP
1st Author's Name	Mariko Tomariguchi
1st Author's Affiliation	Oki Electric Industry Co., Ltd.(OKI)
Date	2023-05-19
Paper #	PRMU2023-12
Volume (vol)	vol.123
Number (no)	PRMU-30
Page	pp.62-67(PRMU)
#Pages	6
Date of Issue	2023-05-11 (PRMU)