Presentation 2022-12-16
On a difficulty of TextVQA
Koki Nakamura, Seiichi Uchida
Abstract(in English) Text Visual Question Answering (TextVQA) is a task in which textual information in an image is used to generate a natural-language answer to a question about that image. Various methods have been proposed for this task, but even the current state of the art achieves an accuracy of only about 70%. To understand which factors make TextVQA difficult, we attempt to analyze binary classification predictions of whether the model can produce a correct answer, taking the TextVQA model's answers and the dataset used for its training as input.
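As a rough illustration of the analysis described in the abstract, the sketch below trains a binary classifier that predicts whether a TextVQA model answers a given sample correctly. The feature choice (question text concatenated with the model's predicted answer), the toy data, and all names are illustrative assumptions, not the paper's actual setup.

# Minimal, hypothetical sketch: predict whether a TextVQA model's answer is correct.
# Features and data below are illustrative assumptions, not the paper's method.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Each record: question text, the TextVQA model's answer, and whether that
# answer matched the ground truth (1 = correct, 0 = incorrect). Toy examples.
records = [
    ("what is written on the sign", "stop", 1),
    ("what brand is the laptop", "unanswerable", 0),
    ("what time does the clock show", "10:30", 1),
    ("what is the license plate number", "abc 123", 0),
]

texts = [question + " [ANS] " + answer for question, answer, _ in records]
labels = [correct for _, _, correct in records]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, random_state=0, stratify=labels
)

# Bag-of-words features + logistic regression as a simple correctness predictor.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print("predicted correctness:", clf.predict(X_test))

Inspecting which features drive such a classifier (question wording, answer type, etc.) is one way to surface the factors that make TextVQA samples difficult.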
Keyword(in English) TextVQA / multimodal / binary classification
Paper # PRMU2022-51
Date of Issue 2022-12-08 (PRMU)

Conference Information
Committee PRMU
Conference Date 2022/12/15 (2 days)
Place (in English) Toyama International Conference Center
Topics (in English)
Chair Seiichi Uchida(Kyushu Univ.)
Vice Chair Takuya Funatomi(NAIST) / Mitsuru Anpai(Denso IT Lab.)
Secretary Takuya Funatomi(CyberAgent) / Mitsuru Anpai(Univ. of Tokyo)
Assistant Nakamasa Inoue(Tokyo Inst. of Tech.) / Yasutomo Kawanishi(RIKEN)

Paper Information
Registration To Technical Committee on Pattern Recognition and Media Understanding
Language JPN
Title (in English) On a difficulty of TextVQA
Sub Title (in English)
Keyword(1) TextVQA
Keyword(2) multimodal
Keyword(3) binary classification
1st Author's Name Koki Nakamura
1st Author's Affiliation Kyushu University(Kyushu Univ.)
2nd Author's Name Seiichi Uchida
2nd Author's Affiliation Kyushu University(Kyushu Univ.)
Date 2022-12-16
Paper # PRMU2022-51
Volume (vol) vol.122
Number (no) PRMU-314
Page pp.100-105 (PRMU)
#Pages 6
Date of Issue 2022-12-08 (PRMU)