How to Make a Pizza: Learning a Compositional Layer-Based GAN Model

Papadopoulos, Dim P; Tamaazousti, Youssef; Ofli, Ferda; Weber, Ingmar; Torralba, Antonio

dc.contributor.author	Papadopoulos, Dim P
dc.contributor.author	Tamaazousti, Youssef
dc.contributor.author	Ofli, Ferda
dc.contributor.author	Weber, Ingmar
dc.contributor.author	Torralba, Antonio
dc.date.accessioned	2021-09-27T17:17:37Z
dc.date.available	2021-09-27T17:17:37Z
dc.date.issued	2020-01
dc.date.submitted	2019-06
dc.identifier.isbn	978-1-7281-3293-8
dc.identifier.issn	2575-7075
dc.identifier.uri	https://hdl.handle.net/1721.1/132649
dc.description.abstract	A food recipe is an ordered set of instructions for preparing a particular dish. From a visual perspective, every instruction step can be seen as a way to change the visual appearance of the dish by adding extra objects (e.g., adding an ingredient) or changing the appearance of the existing ones (e.g., cooking the dish). In this paper, we aim to teach a machine how to make a pizza by building a generative model that mirrors this step-by-step procedure. To do so, we learn composable module operations which are able to either add or remove a particular ingredient. Each operator is designed as a Generative Adversarial Network (GAN). Given only weak image-level supervision, the operators are trained to generate a visual layer that needs to be added to or removed from the existing image. The proposed model is able to decompose an image into an ordered sequence of layers by applying sequentially in the right order the corresponding removing modules. Experimental results on synthetic and real pizza images demonstrate that our proposed model is able to: (1) segment pizza toppings in a weakly-supervised fashion, (2) remove them by revealing what is occluded underneath them (i.e., inpainting), and (3) infer the ordering of the toppings without any depth ordering supervision. Code, data, and models are available online.	en_US
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en_US
dc.relation.isversionof	http://dx.doi.org/10.1109/cvpr.2019.00819	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	arXiv	en_US
dc.title	How to Make a Pizza: Learning a Compositional Layer-Based GAN Model	en_US
dc.type	Article	en_US
dc.identifier.citation	Papadopoulos, Dim P. et al. "How to Make a Pizza: Learning a Compositional Layer-Based GAN Model." 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019, Long Beach, CA, USA, Institute of Electrical and Electronics Engineers, January 2020. © 2019 IEEE	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.relation.journal	2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dspace.date.submission	2021-01-28T13:53:51Z
mit.license	OPEN_ACCESS_POLICY
mit.metadata.status	Complete	en_US

Files in this item

Name: 1906.02839(1).pdf
Size: 7.707Mb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
