Presentation | 2023-05-19 Prompt Learning for Object Detection with Vision-Language Model, Mariko Tomariguchi |
---|---|
Abstract (in English) | Two-stage object detection models crop features from the regions most likely to contain objects and use those features to classify the objects. In this work, we investigate how the information surrounding an object influences its classification, and we improve a prompt learning method for object detection with vision-language models. We train learnable vectors corresponding to the input prompts of CLIP on augmented data to create prompts with and without surrounding information. We then train the object detection model, computing its classification scores from the language embeddings obtained by passing the learned prompts through the CLIP language encoder. Our method achieves 20.3% $\mathrm{AP}$ on the LVIS dataset with prompts that include surroundings and 21.6% $\mathrm{AP}$ with prompts that do not. On the LVIS frequent classes in particular, it achieves 27.9% $\mathrm{AP}_f$ and 29.1% $\mathrm{AP}_f$, respectively. |
Keyword (in English) | deep learning / object detection / Mask R-CNN / prompt learning / CLIP |
Paper # | PRMU2023-12 |
Date of Issue | 2023-05-11 (PRMU) |
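The abstract describes computing the detector's classification scores from CLIP language embeddings of learned prompts. The following is a minimal sketch of that scoring step, not the author's implementation. It assumes a CoOp-style prompt (a few learnable context vectors followed by a class-name token) and a softmax over cosine similarities scaled by a temperature. `toy_text_encoder` is a hypothetical mean-pooling stand-in for the CLIP text encoder, and the region feature is assumed to be a RoI feature already projected into the same embedding space. How the context vectors are trained on augmented crops (with or without surroundings) is not shown, and every name below is illustrative.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def build_prompt(context: List[Vector], class_token: Vector) -> List[Vector]:
    """Assumed CoOp-style prompt: M learnable context vectors, then the class-name token."""
    return context + [class_token]


def toy_text_encoder(prompt: List[Vector]) -> Vector:
    """HYPOTHETICAL stand-in for the CLIP text encoder: mean-pool tokens, then normalize."""
    dim = len(prompt[0])
    pooled = [sum(tok[i] for tok in prompt) / len(prompt) for i in range(dim)]
    return l2_normalize(pooled)


def classification_scores(region_feat: Vector, class_embeds: List[Vector],
                          temperature: float = 0.01) -> List[float]:
    """Softmax over cosine similarities between one region feature and each class embedding.

    The temperature value is an assumption; class_embeds are expected to be unit-normalized.
    """
    r = l2_normalize(region_feat)
    logits = [sum(a * b for a, b in zip(r, e)) / temperature for e in class_embeds]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]


if __name__ == "__main__":
    # Hypothetical 2-D token embeddings for two class names.
    class_tokens = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
    # M = 2 context vectors; in prompt learning these would be trained, here they are fixed.
    context = [[0.2, 0.2], [0.1, -0.1]]
    embeds = [toy_text_encoder(build_prompt(context, tok)) for tok in class_tokens.values()]
    scores = classification_scores([0.9, 0.1], embeds)
    print(dict(zip(class_tokens, scores)))
```

Under this sketch, the prompts with and without surroundings would differ only in their learned context vectors, so the same scoring function applies to both variants.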
Conference Information | |
Committee | PRMU / IPSJ-CVIM |
---|---|
Conference Date | 2023-05-18 (2 days) |
Place (in English) | |
Topics (in English) | |
Chair | Seiichi Uchida (Kyushu Univ.) |
Vice Chair | Takuya Funatomi (NAIST) / Mitsuru Anpai (Denso IT Lab.) |
Secretary | Takuya Funatomi (CyberAgent) / Mitsuru Anpai (Univ. of Tokyo) |
Assistant | Nakamasa Inoue (Tokyo Inst. of Tech.) / Yasutomo Kawanishi (Riken) |
Paper Information | |
Registration To | Technical Committee on Pattern Recognition and Media Understanding / Special Interest Group on Computer Vision and Image Media |
---|---|
Language | JPN |
Title (in English) | Prompt Learning for Object Detection with Vision-Language Model |
Sub Title (in English) | |
Keyword(1) | deep learning |
Keyword(2) | object detection |
Keyword(3) | Mask R-CNN |
Keyword(4) | prompt learning |
Keyword(5) | CLIP |
1st Author's Name | Mariko Tomariguchi |
1st Author's Affiliation | Oki Electric Industry Co., Ltd. (OKI) |
Date | 2023-05-19 |
Volume (vol) | vol.123 |
Number (no) | PRMU-30 |
Page | pp. 62-67 (PRMU) |
#Pages | 6 |