Presentation 2023-05-19
Prompt Learning for Object Detection with Vision-Language Model
Mariko Tomariguchi
Abstract(in English) Two-stage object detection models classify objects by cropping features from the regions where objects are most likely to be located. In this work, we investigate how the information surrounding an object influences its classification, and we improve a prompt learning method for object detection that uses Vision-Language models. We train learnable vectors that serve as the input prompts to CLIP on augmented data, producing prompts with and without surrounding information. We then train the object detection model, replacing the computation of the classification score with one based on the language embeddings obtained by passing the learned prompts through the CLIP language encoder. Our method achieves 20.3% $\mathrm{AP}$ on the LVIS dataset with prompts that include surrounding information and 21.6% $\mathrm{AP}$ with prompts that do not. In particular, it achieves 27.9% and 29.1% $\mathrm{AP}_f$, respectively, on the LVIS frequent classes.
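The abstract describes the classifier only at a high level, so the following is a minimal pure-Python sketch of the general idea under stated assumptions, not the author's implementation. Learnable context vectors are prepended to a class-name token (a CoOp-style prompt), each prompt is encoded into one text embedding per class, and a region feature is scored by temperature-scaled cosine similarity against those embeddings, followed by a softmax. `toy_text_encoder` is a hypothetical mean-pooling stand-in for CLIP's language encoder; `expand_box` is a hypothetical way of producing crops with or without surroundings (the abstract does not say how its augmentation works); the temperature of 0.01 and the omission of a background class are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def expand_box(box: Box, scale: float, img_w: float, img_h: float) -> Box:
    """Hypothetical augmentation: scale=1.0 keeps a tight crop (no surroundings);
    scale>1.0 enlarges the box about its centre to include surrounding context.
    The result is clipped to the image bounds."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = (x2 - x1) * scale / 2, (y2 - y1) * scale / 2
    return (max(0.0, cx - half_w), max(0.0, cy - half_h),
            min(img_w, cx + half_w), min(img_h, cy + half_h))


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def build_prompt(context: Sequence[Vector], class_token: Vector) -> List[Vector]:
    """CoOp-style prompt [V_1]...[V_M][CLASS]: the context vectors are shared
    and learnable; the class-name token embedding stays fixed."""
    return [list(v) for v in context] + [list(class_token)]


def toy_text_encoder(prompt: Sequence[Vector]) -> Vector:
    """Hypothetical stand-in for CLIP's language encoder: mean-pools the tokens."""
    dim = len(prompt[0])
    return [sum(tok[d] for tok in prompt) / len(prompt) for d in range(dim)]


def class_embeddings(context: Sequence[Vector],
                     class_tokens: Sequence[Vector]) -> List[Vector]:
    """One unit-length text embedding per class, used in place of classifier weights."""
    return [l2_normalize(toy_text_encoder(build_prompt(context, t)))
            for t in class_tokens]


def classification_scores(region_feature: Sequence[float],
                          text_embeds: Sequence[Vector],
                          temperature: float = 0.01) -> List[float]:
    """Softmax over temperature-scaled cosine similarities between a region
    feature and each class embedding."""
    r = l2_normalize(region_feature)
    logits = [sum(a * b for a, b in zip(r, t)) / temperature for t in text_embeds]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    context = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]]       # M = 2 learnable vectors (toy values)
    class_tokens = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # two toy class-name tokens
    embeds = class_embeddings(context, class_tokens)
    print(classification_scores([0.9, 0.1, 0.0], embeds))
    print(expand_box((10, 10, 20, 20), 2.0, 100, 100))
```

In a real detector the region feature would come from the RoI head of a model such as Mask R-CNN and the embeddings from CLIP itself; plain lists are used here only to keep the sketch dependency-free. During prompt learning only `context` would be updated, which is what lets one set of prompts be learned from tight crops and another from crops that include surroundings.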
Keyword(in English) deep learning / object detection / Mask R-CNN / prompt learning / CLIP
Paper # PRMU2023-12
Date of Issue 2023-05-11 (PRMU)

Conference Information
Committee PRMU / IPSJ-CVIM
Conference Date 2023/5/18 (2 days)
Chair Seiichi Uchida(Kyushu Univ.)
Vice Chair Takuya Funatomi(NAIST) / Mitsuru Anpai(Denso IT Lab.)
Secretary Takuya Funatomi(CyberAgent) / Mitsuru Anpai(Univ. of Tokyo)
Assistant Nakamasa Inoue(Tokyo Inst. of Tech.) / Yasutomo Kawanishi(Riken)

Paper Information
Registration To Technical Committee on Pattern Recognition and Media Understanding / Special Interest Group on Computer Vision and Image Media
Language JPN
Title (in English) Prompt Learning for Object Detection with Vision-Language Model
Keyword(1) deep learning
Keyword(2) object detection
Keyword(3) Mask R-CNN
Keyword(4) prompt learning
Keyword(5) CLIP
1st Author's Name Mariko Tomariguchi
1st Author's Affiliation Oki Electric Industry Co., Ltd.(OKI)
Date 2023-05-19
Volume (vol) vol.123
Number (no) PRMU-30
Page pp. 62-67 (PRMU)
#Pages 6