Presentation 2023-09-22
Probing the ability to accurately understand and utilize the ordinal numbers by visual language models
Ryuto Masuda, Hisashi Miyamori
Abstract (in English) In this paper, we investigate the extent to which visual language models can accurately grasp and utilize the concept of ordinal numbers. Although large-scale pre-trained Transformer-based models achieve high accuracy on tasks such as simple arithmetic, it is still unclear how these models capture and use the concept of number. In this study, we focus on ordinal numbers as one aspect of the concept of number and investigate to what extent Transformer-based visual language models can grasp and utilize them. Specifically, we construct a new dataset for referring expression comprehension that focuses on counting with ordinal numbers. We generate CG images containing multiple objects and annotate the objects with referring expressions whose resolution requires understanding inter-object relations and counting objects. In the experiments, we evaluate typical visual language models on the referring expression comprehension task using the constructed dataset and analyze their ability to accurately grasp and utilize ordinal numbers.
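The abstract does not describe the dataset format, so the following is only a minimal, hypothetical sketch of what resolving an ordinal referring expression involves in a synthetic scene, e.g. "the second red cube from the left" or "the second cube to the right of the red sphere". The `SceneObject` fields, the `resolve_ordinal` function, and the left-to-right counting rule are illustrative assumptions, not the authors' data schema or implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SceneObject:
    """One object in a hypothetical synthetic (CG) scene (assumed schema)."""
    obj_id: int
    shape: str   # e.g. "cube", "sphere"
    color: str   # e.g. "red", "blue"
    x: float     # horizontal position; smaller values are further left


def resolve_ordinal(objects: List[SceneObject], ordinal: int, shape: str,
                    color: Optional[str] = None,
                    anchor_id: Optional[int] = None) -> Optional[int]:
    """Return the id of the `ordinal`-th (1-based) matching object counted
    left to right, or None if there are fewer matches than `ordinal`.

    If `anchor_id` is given, only objects to the right of the anchor are
    counted, starting from the one nearest to it (an inter-object relation).
    """
    if ordinal < 1:
        raise ValueError("ordinal must be >= 1")
    by_id = {o.obj_id: o for o in objects}
    min_x = by_id[anchor_id].x if anchor_id is not None else float("-inf")
    # Filter by attributes and by the spatial relation to the anchor.
    candidates = [o for o in objects
                  if o.shape == shape
                  and (color is None or o.color == color)
                  and o.x > min_x
                  and o.obj_id != anchor_id]
    # Count along the horizontal axis (the assumed ordering convention).
    candidates.sort(key=lambda o: o.x)
    if ordinal > len(candidates):
        return None
    return candidates[ordinal - 1].obj_id


scene = [
    SceneObject(0, "cube", "red", 0.70),
    SceneObject(1, "sphere", "red", 0.10),
    SceneObject(2, "cube", "red", 0.20),
    SceneObject(3, "cube", "blue", 0.40),
    SceneObject(4, "cube", "red", 0.90),
]

# "the second red cube from the left": red cubes ordered by x are ids 2, 0, 4
print(resolve_ordinal(scene, 2, "cube", color="red"))  # → 0
# "the second cube to the right of the red sphere (id 1)": cubes are ids 2, 3, 0, 4
print(resolve_ordinal(scene, 2, "cube", anchor_id=1))  # → 3
```

Sorting the filtered candidates by a single coordinate is just one simple way to make "n-th" unambiguous; the actual dataset may define counting order, relations, and object attributes differently.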
Keyword (in English) ordinal numbers / concept understanding / visual language model / counting operation / reasoning
Paper # DE2023-20
Date of Issue 2023-09-14 (DE)

Conference Information
Committee DE / IPSJ-DBS / IPSJ-IFAT
Conference Date 2023-09-21 (2 days)
Place (in English) Kitakyushu International Conference Center
Topics (in English) Bigdata management, information retrieval, knowledge discovery, etc.
Chair Masashi Toyoda(Univ. of Tokyo)
Vice Chair Kosuke Takano(Kanagawa Inst. of Tech.) / Chiemi Watanabe(Tsukuba Univ. of Technology)
Secretary Kosuke Takano(Univ. of Tsukuba) / Chiemi Watanabe(Komazawa Univ.)
Assistant Takahiro Komamizu(Nagoya Univ.)

Paper Information
Registration To Technical Committee on Data Engineering / Special Interest Group on Database System / Special Interest Group on Information Fundamentals and Access Technologies
Language JPN
Title (in English) Probing the ability to accurately understand and utilize the ordinal numbers by visual language models
Sub Title (in English)
Keyword(1) ordinal numbers
Keyword(2) concept understanding
Keyword(3) visual language model
Keyword(4) counting operation
Keyword(5) reasoning
1st Author's Name Ryuto Masuda
1st Author's Affiliation Kyoto Sangyo University(Kyoto Sangyo Univ.)
2nd Author's Name Hisashi Miyamori
2nd Author's Affiliation Kyoto Sangyo University(Kyoto Sangyo Univ.)
Date 2023-09-22
Volume (vol) vol.123
Number (no) DE-192
Page pp. 54-59 (DE)
#Pages 6