Presentation 2022-12-16
Pose-aware Disentangled Multiscale Transformer for Pose Guided Person Image Generation
Kei Shibasaki, Masaaki Ikehara
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Pose Guided Person Image Generation (PGPIG) is the task of transforming the pose of a person image, given the source image, its pose information, and the target pose information. Most existing PGPIG methods require additional pose information or auxiliary tasks, which limits their applicability. In addition, they combine all input information before feeding it into the network and use CNNs as the feature extractor. However, CNNs can only extract features from neighboring pixels and cannot account for the consistency of the entire image. Furthermore, because the inputs are combined before sufficient features have been extracted, it is unclear which task the network should learn, which degrades performance. This paper proposes a PGPIG network that addresses the image-consistency problem and clarifies which task the network should learn. The proposed method disentangles the PGPIG task into two subtasks: “rough pose transformation” and “detailed texture generation”. In the former, low-resolution feature maps are transformed by blocks containing Axial Transformers with a large receptive field. These blocks employ an encoder-decoder structure, which allows the network to exploit the pose information effectively and improves the stability and performance of training. The latter task uses a CNN with Adaptive Instance Normalization. Experiments show that the proposed method is competitive with other state-of-the-art methods. Furthermore, despite this performance, the proposed network has significantly fewer parameters than existing methods.
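The two-stage design described in the abstract (an Axial-Transformer encoder-decoder performing rough pose transformation on low-resolution features, followed by an AdaIN-equipped CNN generating detailed texture) can be illustrated with a minimal sketch. The following is an assumption-laden toy example in PyTorch, not the authors' implementation: module names, channel sizes, the 18-channel keypoint-heatmap pose encoding, and the 128x128 resolution are all assumptions made for illustration.

```python
# Minimal sketch of the two-stage idea from the abstract (not the authors' code).
# Assumptions: PyTorch, 18-channel keypoint heatmaps as pose input, 128x128 images.
import torch
import torch.nn as nn


class AxialAttention(nn.Module):
    """Self-attention applied independently along one spatial axis (H or W)."""
    def __init__(self, dim, heads=4, axis="h"):
        super().__init__()
        self.axis = axis
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                         # x: (B, C, H, W)
        b, c, h, w = x.shape
        if self.axis == "h":                      # attend along the height axis
            seq = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        else:                                     # attend along the width axis
            seq = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        out, _ = self.attn(seq, seq, seq)
        if self.axis == "h":
            out = out.reshape(b, w, h, c).permute(0, 3, 2, 1)
        else:
            out = out.reshape(b, h, w, c).permute(0, 3, 1, 2)
        return x + out                            # residual connection


class RoughPoseTransformer(nn.Module):
    """Encoder-decoder over low-resolution features with axial attention,
    standing in for the 'rough pose transformation' stage."""
    def __init__(self, in_ch=3 + 18 + 18, dim=128):
        super().__init__()
        self.encoder = nn.Sequential(             # downsample to a coarse grid
            nn.Conv2d(in_ch, dim, 4, stride=4), nn.ReLU(inplace=True))
        self.axial = nn.Sequential(
            AxialAttention(dim, axis="h"), AxialAttention(dim, axis="w"))
        self.decoder = nn.Sequential(              # back to input resolution
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, src_img, src_pose, tgt_pose):
        x = torch.cat([src_img, src_pose, tgt_pose], dim=1)
        return self.decoder(self.axial(self.encoder(x)))


def adain(content, style_feat):
    """Adaptive Instance Normalization: re-style content features with the
    channel-wise statistics of the style features."""
    c_mean = content.mean((2, 3), keepdim=True)
    c_std = content.std((2, 3), keepdim=True) + 1e-5
    s_mean = style_feat.mean((2, 3), keepdim=True)
    s_std = style_feat.std((2, 3), keepdim=True)
    return s_std * (content - c_mean) / c_std + s_mean


class TextureGenerator(nn.Module):
    """CNN stage that adds detailed texture, guided by AdaIN statistics
    extracted from the source image."""
    def __init__(self, dim=128, out_ch=3):
        super().__init__()
        self.style_enc = nn.Sequential(nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(inplace=True))
        self.refine = nn.Sequential(nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True))
        self.to_rgb = nn.Conv2d(dim, out_ch, 3, padding=1)

    def forward(self, coarse_feat, src_img):
        styled = adain(self.refine(coarse_feat), self.style_enc(src_img))
        return torch.tanh(self.to_rgb(styled))


if __name__ == "__main__":
    src = torch.randn(1, 3, 128, 128)             # source person image
    src_pose = torch.randn(1, 18, 128, 128)        # source keypoint heatmaps
    tgt_pose = torch.randn(1, 18, 128, 128)        # target keypoint heatmaps
    coarse = RoughPoseTransformer()(src, src_pose, tgt_pose)
    out = TextureGenerator()(coarse, src)
    print(out.shape)                               # torch.Size([1, 3, 128, 128])
```

The split mirrors the abstract's motivation: attention on a coarse grid keeps the quadratic cost of axial self-attention manageable while giving a global receptive field for pose transformation, and the AdaIN-based CNN stage only has to transfer texture statistics, so each stage has a clearly defined task.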
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Deep learning / Image Processing / Pose Guided Person Image Generation / Transformer / Multi-scale Network
Paper # PRMU2022-44
Date of Issue 2022-12-08 (PRMU)

Conference Information
Committee PRMU
Conference Date 2022/12/15 (2 days)
Place (in Japanese) (See Japanese page)
Place (in English) Toyama International Conference Center
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair Seiichi Uchida(Kyushu Univ.)
Vice Chair Takuya Funatomi(NAIST) / Mitsuru Anpai(Denso IT Lab.)
Secretary Takuya Funatomi(CyberAgent) / Mitsuru Anpai(Univ. of Tokyo)
Assistant Nakamasa Inoue(Tokyo Inst. of Tech.) / Yasutomo Kawanishi(Riken)

Paper Information
Registration To Technical Committee on Pattern Recognition and Media Understanding
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Pose-aware Disentangled Multiscale Transformer for Pose Guided Person Image Generation
Sub Title (in English)
Keyword(1) Deep learning
Keyword(2) Image Processing
Keyword(3) Pose Guided Person Image Generation
Keyword(4) Transformer
Keyword(5) Multi-scale Network
1st Author's Name Kei Shibasaki
1st Author's Affiliation Keio University(Keio Univ.)
2nd Author's Name Masaaki Ikehara
2nd Author's Affiliation Keio University(Keio Univ.)
Date 2022-12-16
Paper # PRMU2022-44
Volume (vol) vol.122
Number (no) PRMU-314
Page pp.63-69 (PRMU)
#Pages 7
Date of Issue 2022-12-08 (PRMU)