IEICE Information and Communication Technology Forum
Multi-Modal Conditional Image Generation: A Comparative Study
Razan Bayoumi, Marco Alfonse, Abdel-Badeeh M. Salem
Text-to-image synthesis converts textual features into pixels, which requires a full understanding of the relation between visual features and natural language. Most existing text-to-image methods ignore the information in the original images and generate images from the input text alone; in contrast, some models take into account both the text descriptions and the original images. This paper reviews the work presented in this domain, specifically during the last four years. It also presents a comparative study to give a clear overview.