Presentation | 2023-09-22 Probing the ability to accurately understand and utilize the ordinal numbers by visual language models Ryuto Masuda, Hisashi Miyamori |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | In this paper, we investigate the extent to which visual language models can accurately grasp and utilize the concept of ordinal numbers. Although Transformer-based large-scale pre-trained models show high correct response rates on tasks such as simple arithmetic operations, it is still unclear how these models capture and utilize the concept of numbers. In this study, we focus on ordinal numbers as one such concept and investigate to what extent Transformer-based visual language models can grasp and utilize it. Specifically, we construct a new dataset for referring expression comprehension that focuses on counting via ordinal numbers. CG images are generated with multiple objects placed in each image, and the objects are annotated with referring expressions that require understanding inter-object relations and counting them up. In the experiments, we evaluate the performance of typical visual language models on referring expression comprehension tasks using the constructed dataset and analyze their ability to accurately grasp and utilize ordinal numbers. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | ordinal numbers / concept understanding / visual language model / counting operation / reasoning |
Paper # | DE2023-20 |
Date of Issue | 2023-09-14 (DE) |
Conference Information | |
Committee | DE / IPSJ-DBS / IPSJ-IFAT |
---|---|
Conference Date | 2023/9/21(2days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Kitakyushu International Conference Center |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | Bigdata management, information retrieval, knowledge discovery, etc. |
Chair | Masashi Toyoda(Univ. of Tokyo) |
Vice Chair | Kosuke Takano(Kanagawa Inst. of Tech.) / Chiemi Watanabe(Tsukuba Univ. of Technology) |
Secretary | Kosuke Takano(Univ. of Tsukuba) / Chiemi Watanabe(Komazawa Univ.) |
Assistant | Takahiro Komamizu(Nagoya Univ.) |
Paper Information | |
Registration To | Technical Committee on Data Engineering / Special Interest Group on Database System / Special Interest Group on Information Fundamentals and Access Technologies |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Probing the ability to accurately understand and utilize the ordinal numbers by visual language models |
Sub Title (in English) | |
Keyword(1) | ordinal numbers |
Keyword(2) | concept understanding |
Keyword(3) | visual language model |
Keyword(4) | counting operation |
Keyword(5) | reasoning |
1st Author's Name | Ryuto Masuda |
1st Author's Affiliation | Kyoto Sangyo University(Kyoto Sangyo Univ.) |
2nd Author's Name | Hisashi Miyamori |
2nd Author's Affiliation | Kyoto Sangyo University(Kyoto Sangyo Univ.) |
Date | 2023-09-22 |
Paper # | DE2023-20 |
Volume (vol) | vol.123 |
Number (no) | DE-192 |
Page | pp.54-59(DE) |
#Pages | 6 |
Date of Issue | 2023-09-14 (DE) |