Presentation 2022-12-16
Pose-aware Disentangled Multiscale Transformer for Pose Guided Person Image Generation
Kei Shibasaki, Masaaki Ikehara
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Pose Guided Person Image Generation (PGPIG) is the task of transforming the pose of a person image, given the source image, its pose information, and the target pose information. Most existing PGPIG methods require additional pose information or auxiliary tasks, which limits their applicability. In addition, they combine all input information before feeding it into the network and use CNNs as the feature extractor. However, CNNs can only extract features from neighboring pixels and cannot account for the consistency of the entire image. Furthermore, because the inputs are combined before sufficient features have been extracted, it is unclear which task the network should learn, which degrades performance. This paper proposes a PGPIG network that addresses the image-consistency problem and clarifies which task the network should learn. The proposed method disentangles the PGPIG task into two subtasks: “rough pose transformation” and “detailed texture generation”. In the former, low-resolution feature maps are transformed by blocks containing Axial Transformers with a large receptive field. These blocks employ an encoder-decoder structure, which allows the network to exploit the pose information effectively and improves the stability and performance of training. The latter task uses a CNN with Adaptive Instance Normalization. Experiments show that the proposed method is competitive with other state-of-the-art methods. Furthermore, despite this performance, the proposed network has significantly fewer parameters than existing methods.
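The two-stage design described in the abstract (an Axial-Transformer encoder-decoder performing rough pose transformation on low-resolution features, followed by an AdaIN-equipped CNN generating detailed texture) can be illustrated with a minimal sketch. The following is an assumption-laden toy example in PyTorch, not the authors' implementation: module names, channel sizes, the 18-channel keypoint-heatmap pose encoding, and the 128x128 resolution are all assumptions made for illustration.

```python
# Minimal sketch of the two-stage idea from the abstract (not the authors' code).
# Assumptions: PyTorch, 18-channel keypoint heatmaps as pose input, 128x128 images.
import torch
import torch.nn as nn


class AxialAttention(nn.Module):
    """Self-attention applied independently along one spatial axis (H or W)."""
    def __init__(self, dim, heads=4, axis="h"):
        super().__init__()
        self.axis = axis
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                         # x: (B, C, H, W)
        b, c, h, w = x.shape
        if self.axis == "h":                      # attend along the height axis
            seq = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        else:                                     # attend along the width axis
            seq = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        out, _ = self.attn(seq, seq, seq)
        if self.axis == "h":
            out = out.reshape(b, w, h, c).permute(0, 3, 2, 1)
        else:
            out = out.reshape(b, h, w, c).permute(0, 3, 1, 2)
        return x + out                            # residual connection


class RoughPoseTransformer(nn.Module):
    """Encoder-decoder over low-resolution features with axial attention,
    standing in for the 'rough pose transformation' stage."""
    def __init__(self, in_ch=3 + 18 + 18, dim=128):
        super().__init__()
        self.encoder = nn.Sequential(             # downsample to a coarse grid
            nn.Conv2d(in_ch, dim, 4, stride=4), nn.ReLU(inplace=True))
        self.axial = nn.Sequential(
            AxialAttention(dim, axis="h"), AxialAttention(dim, axis="w"))
        self.decoder = nn.Sequential(              # back to input resolution
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, src_img, src_pose, tgt_pose):
        x = torch.cat([src_img, src_pose, tgt_pose], dim=1)
        return self.decoder(self.axial(self.encoder(x)))


def adain(content, style_feat):
    """Adaptive Instance Normalization: re-style content features with the
    channel-wise statistics of the style features."""
    c_mean = content.mean((2, 3), keepdim=True)
    c_std = content.std((2, 3), keepdim=True) + 1e-5
    s_mean = style_feat.mean((2, 3), keepdim=True)
    s_std = style_feat.std((2, 3), keepdim=True)
    return s_std * (content - c_mean) / c_std + s_mean


class TextureGenerator(nn.Module):
    """CNN stage that adds detailed texture, guided by AdaIN statistics
    extracted from the source image."""
    def __init__(self, dim=128, out_ch=3):
        super().__init__()
        self.style_enc = nn.Sequential(nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(inplace=True))
        self.refine = nn.Sequential(nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True))
        self.to_rgb = nn.Conv2d(dim, out_ch, 3, padding=1)

    def forward(self, coarse_feat, src_img):
        styled = adain(self.refine(coarse_feat), self.style_enc(src_img))
        return torch.tanh(self.to_rgb(styled))


if __name__ == "__main__":
    src = torch.randn(1, 3, 128, 128)             # source person image
    src_pose = torch.randn(1, 18, 128, 128)        # source keypoint heatmaps
    tgt_pose = torch.randn(1, 18, 128, 128)        # target keypoint heatmaps
    coarse = RoughPoseTransformer()(src, src_pose, tgt_pose)
    out = TextureGenerator()(coarse, src)
    print(out.shape)                               # torch.Size([1, 3, 128, 128])
```

The split mirrors the abstract's motivation: attention on a coarse grid keeps the quadratic cost of axial self-attention manageable while giving a global receptive field for pose transformation, and the AdaIN-based CNN stage only has to transfer texture statistics, so each stage has a clearly defined task.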
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Deep learning / Image Processing / Pose Guided Person Image Generation / Transformer / Multi-scale Network
Paper # PRMU2022-44
Date of Issue 2022-12-08 (PRMU)

Conference Information
Committee PRMU
Conference Date 2022/12/15 (2 days)
Place (in Japanese) (See Japanese page)
Place (in English) Toyama International Conference Center
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair Seiichi Uchida(Kyushu Univ.)
Vice Chair Takuya Funatomi(NAIST) / Mitsuru Anpai(Denso IT Lab.)
Secretary Takuya Funatomi(CyberAgent) / Mitsuru Anpai(Univ. of Tokyo)
Assistant Nakamasa Inoue(Tokyo Inst. of Tech.) / Yasutomo Kawanishi(Riken)

Paper Information
Registration To Technical Committee on Pattern Recognition and Media Understanding
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Pose-aware Disentangled Multiscale Transformer for Pose Guided Person Image Generation
Sub Title (in English)
Keyword(1) Deep learning
Keyword(2) Image Processing
Keyword(3) Pose Guided Person Image Generation
Keyword(4) Transformer
Keyword(5) Multi-scale Network
1st Author's Name Kei Shibasaki
1st Author's Affiliation Keio University(Keio Univ.)
2nd Author's Name Masaaki Ikehara
2nd Author's Affiliation Keio University(Keio Univ.)
Date 2022-12-16
Paper # PRMU2022-44
Volume (vol) vol.122
Number (no) PRMU-314
Page pp.63-69 (PRMU)
#Pages 7
Date of Issue 2022-12-08 (PRMU)