Multi-Modal Conditional Image Generation: A Comparative Study

Razan Bayoumi; Marco Alfonse; Abdel-Badeeh M. Salem

IEICE Information and Communication Technology Forum

2020

Session Number:ICTF_5

Number:ICTF2020_paper_19

Publication Date:2021/03/24

Online ISSN:2188-5079

DOI:10.34385/proc.64.ICTF2020_paper_19

Summary:

Text-to-image synthesis refers to converting textual features into pixels, which requires a full understanding of the relation between visual features and natural language. Most existing text-to-image methods ignore the information in the original images and generate images from the input text alone; in contrast, some models take into account both the text descriptions and the original images. This paper reviews the work presented in this domain, specifically during the last four years, and presents a comparative study to give a clear overview.