Hello everyone,
Postdoc proposition - learning robust and navigable latent spaces for
generative networks, in the context of image and texture synthesis
14 September 2018
Scientific axis of the affiliated working group: DataSense
— Saïd Ladjal, LTCI
— Télécom Paristech
46 Rue Barrault,
01 45 81 81 45
— Alasdair Newson, LTCI
— Télécom Paristech
46 Rue Barrault,
01 45 81 73 82
— Guillaume Charpiat, INRIA Saclay
office 2054, Claude Shannon building (other names: 660, "Digitéo"), Orsay
Tel: 01 69 15 39 91
— Lab: LTCI
— Digicosme group: Réseaux Profonds et Représentations Distribuées
— Teams involved: IMAGES (LTCI), TAU (INRIA Saclay)
— Duration: one year
— Application deadline: 15 October 2018
The subject of this project is to explore the use of deep learning for the purposes of image synthesis. A central
component of image synthesis and restoration algorithms is knowledge about the particular properties of natural
images, so that meaningful models and regularisations can be established. However, this is a very difficult goal
to achieve by “hand-designing” such models (Tikhonov regularisation, total variation, patch-based GMM models),
as was common until recently. A more flexible and powerful approach is to train neural networks for these restoration
and synthesis tasks, more particularly convolutional neural networks (CNNs). Examples of these networks
include denoising networks, auto-encoders and generative adversarial networks (GANs). A common, underlying
theme of these networks is the idea that natural images lie on some lower-dimensional latent space. A core objective
of this project is to design a convolutional neural network (CNN) that is able to capture the underlying
structure of the image data we are analysing, in a robust and generalisable manner. Several central questions
appear here:
1. Can the neural networks learn the underlying structure of the space on which natural images lie?
2. Can the networks learn how to project both to and from that space?
3. What architectures and/or regularisations are necessary to ensure good generalisation capacities of the network?
4. Is it possible to fit workable, possibly parametric, probabilistic models to the latent space?
If this crucial objective is attained, a wide array of synthesis possibilities is opened up, from texture synthesis
and interpolation to image denoising. A specific application of this work would be image texture synthesis. In
this context, we would like to create a network which is able to transform to and from a space where our data is
represented in a manner which is amenable to probabilistic modelling.
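As a minimal sketch of question 4 above, assuming latent codes produced by a hypothetical trained encoder, one could fit a full-covariance Gaussian to the codes and sample new ones for synthesis (all names and dimensions below are illustrative, not part of the project's actual pipeline):

```python
import numpy as np

# Toy latent codes, standing in for encoder outputs on a training set
# (the encoder itself is hypothetical here).
rng = np.random.default_rng(0)
codes = rng.standard_normal((500, 4)) * np.array([3.0, 1.0, 0.5, 0.1])

# Fit a full-covariance Gaussian to the latent space...
mu = codes.mean(axis=0)
cov = np.cov(codes, rowvar=False)

# ...and sample new codes, to be fed to a decoder for synthesis.
new_codes = rng.multivariate_normal(mu, cov, size=10)
```

Whether such a simple parametric model suffices, or a richer density model is needed, is precisely one of the questions the project addresses.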
The relevance of these research directions is argued in the following.
2 Scientific positioning of the project and state of the art
2.1 Robust, navigable latent spaces
Generative networks are currently a very hot topic, due to the spectacular image synthesis possibilities that they
offer for a wide variety of complex and abstract objects. They have been used, for example, by Zhu et al. and
Ha and Eck for the purpose of producing examples of images of specific objects with only a rough sketch as
a guide to the algorithm. Once again, the key question here is how to discover the underlying manifold of these
images. In the work of Zhu et al, this is done using a GAN, whose adversarial component seeks to distinguish
between “real” and “false” images. Once this is achieved, it is possible to navigate in this space, interpolating
between image examples. Very often, this is done by simple linear interpolation (as in Zhu et al.), even if there
is no guarantee that the space is indeed linear. Furthermore, there remain serious questions as to the generalisation
capacity of GANs, in other words, to what extent can GANs interpolate in a data region which was unobserved in
the training data. We illustrate this in Figure 1 by showing the results of the DCGAN trained on a database of disks
with certain radii which are not observed in the database. The DCGAN learns to produce the disks it has seen, but
cannot synthesise unobserved data.
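The controlled experiment of Figure 1 can be reproduced in spirit with a toy dataset of centred disks whose radius, the known one-dimensional latent parameter, is partly held out at training time; the function and radius values below are illustrative, not those of the actual experiment:

```python
import numpy as np

def disk_image(radius, size=32):
    """Binary image of a centred disk; a toy dataset whose latent space
    is a known one-dimensional parametrisation (the radius)."""
    yy, xx = np.mgrid[:size, :size]
    c = (size - 1) / 2.0
    return ((xx - c) ** 2 + (yy - c) ** 2 <= radius ** 2).astype(np.float32)

# Hold out a band of radii at training time, then test whether a
# generative model can synthesise disks with the unseen radii.
train_radii = [r for r in range(3, 14) if r not in (7, 8, 9)]
held_out_radii = [7, 8, 9]
train_set = np.stack([disk_image(r) for r in train_radii])
```

A model that has truly captured the underlying parametrisation should interpolate to the held-out radii; one that has merely memorised the training disks will not.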
An existing approach that encourages smooth latent spaces, and thus better generalisation, is the contractive
autoencoder, which encourages robustness of the latent space to small changes in the network input. We shall
first investigate such an approach in the context of image synthesis and generation. However, if we consider that the
data is well parameterised by the latent space, then the output should also vary smoothly w.r.t. any small changes in
the parametrisation. Thus, a first approach, similar to that of Rifai et al., would be to
— Minimise the ℓ2 norm of the Jacobian of the network output w.r.t. the code z,
which specifies that the output should not greatly change as the code moves a small amount. We note here that
while different regularisation techniques play a central role in modern architectures, it is rare to find work which
analyses the generalisation capabilities of the network in a clear, controlled manner; the only criterion is successful
classification rates. In the case of image synthesis, it is important that the latent space should be meaningful, so that
we can rely on its generalisation capabilities and, ultimately, produce meaningful interpolations in the latent space.
For this, we propose to study examples where the underlying space of the images is known, and parametrisable. This
approach is quite uncommon in the literature, and we consider that it will deliver new insights into generalisation
and interpolation, or at the very least provide a minimum performance requirement for generative networks (i.e.
correctly finding the latent space of images with known parametrisations, in a sufficiently robust manner).
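As an illustration of the Jacobian penalty above, the sketch below computes the squared ℓ2 (Frobenius) norm of the Jacobian of a toy decoder with respect to its code by central finite differences; the two-layer decoder is a stand-in for any differentiable generator, and in practice this quantity would be obtained by automatic differentiation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer "decoder" mapping a code z in R^2 to an output in R^3;
# purely illustrative, standing in for a real generative network.
W1 = rng.standard_normal((8, 2))
W2 = rng.standard_normal((3, 8))

def decoder(z):
    return W2 @ np.tanh(W1 @ z)

def jacobian_sq_norm(f, z, eps=1e-5):
    """Squared Frobenius (l2) norm of the Jacobian of f at z,
    estimated column by column with central finite differences."""
    total = 0.0
    for i in range(len(z)):
        dz = np.zeros_like(z)
        dz[i] = eps
        column = (f(z + dz) - f(z - dz)) / (2.0 * eps)
        total += float(np.sum(column ** 2))
    return total

z = rng.standard_normal(2)
penalty = jacobian_sq_norm(decoder, z)
```

Added to a reconstruction loss with a small weight, this term penalises outputs that change abruptly as the code moves a small amount.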
If this generalisation problem is sufficiently well addressed, we shall also investigate the best approaches to
interpolation itself. Some very recent work proposes alternatives to simple linear interpolation
[16, 10], and we shall investigate these avenues, which are linked with finding geodesics in the latent space. However, contrary
to these approaches, we also propose again to study cases where the parametrisation is known, in order to design
autoencoders and interpolation techniques that provide meaningful latent-space interpolation.
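For instance, straight-line interpolation can be compared with spherical interpolation, which follows a great circle rather than a chord and is often better suited to codes drawn from a high-dimensional Gaussian; the implementation below is a standard sketch, not taken from the cited works:

```python
import numpy as np

def lerp(z0, z1, t):
    """Straight-line (chordal) interpolation between two latent codes."""
    return (1.0 - t) * z0 + t * z1

def slerp(z0, z1, t):
    """Spherical interpolation, following a great circle between the codes."""
    cos_omega = np.dot(z0, z1) / (np.linalg.norm(z0) * np.linalg.norm(z1))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return lerp(z0, z1, t)  # parallel or identical codes
    return (np.sin((1.0 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)
```

Note that the chordal midpoint of two unit-norm codes has norm below one, i.e. it falls in a region of the latent space that a Gaussian prior rarely visits, whereas the spherical midpoint stays on the sphere.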
If these goals are attained, we hope that it will be possible to obtain
— Robust, generalisable generative networks for image synthesis
— Meaningful interpolation in the resulting latent spaces.
An interesting application of such a network is texture synthesis, which we discuss now. This entails several
additional challenges, such as modelling the latent space with a probabilistic model, and sampling textures of arbitrary size.
2.2 Texture synthesis
In the specific case of texture synthesis, a recent approach proposed by Gatys et al. is to iteratively modify
an input noise such that the response of the different filters of a CNN share some statistics with the responses to the
example texture. The authors search for a local minimum of the following energy, starting from a random point :
x* := argmin_x Σ_ℓ ‖ Cov(F_ℓ(x)) − Cov(F_ℓ(u)) ‖², where F_ℓ(x) denotes the responses of the filters of layer ℓ to the image x, and u is the example texture.
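A minimal sketch of this feature-statistics loss, assuming the feature maps have already been extracted from a CNN; the uncentred second-order statistics (Gram matrices) below stand in for the filter-response covariances, and the helper names are illustrative:

```python
import numpy as np

def gram(features):
    """Second-order statistics (Gram matrix) of CNN feature maps,
    given as an array of shape (channels, height, width)."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)

def texture_loss(feat_x, feat_ref):
    """Squared Frobenius distance between the feature statistics of the
    synthesised image and those of the example texture (one layer)."""
    return float(np.sum((gram(feat_x) - gram(feat_ref)) ** 2))
```

In the full method this loss is summed over several layers and minimised by gradient descent on the input image, starting from noise.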
(c) GdR 720 ISIS - CNRS - 2011-2018.