14 September 2018

Category: Post-doctoral position

Hello everyone,

Here is a postdoc proposition sent by Alasdair James Newson and Saïd Ladjal.

Best regards,

Edoardo.

**Postdoc proposition - learning robust and navigable latent spaces for generative networks, in the context of image and texture synthesis**

14 September 2018

Scientific axis of the affiliated working group (GT): DataSense

Supervisors

— Saïd Ladjal, LTCI, Télécom Paristech, 46 Rue Barrault, 75013 Paris
said.ladjal@telecom-paristech.fr
01 45 81 81 45

— Alasdair Newson, LTCI, Télécom Paristech, 46 Rue Barrault, 75013 Paris
alasdair.newson@telecom-paristech.fr
01 45 81 73 82

— Guillaume Charpiat, INRIA Saclay
Guillaume.Charpiat@inria.fr
Office 2054, Claude Shannon building (other names: 660, "Digitéo"), Orsay
Tel.: 01 69 15 39 91

— Lab: LTCI

— Digicosme group: Réseaux Profonds et Représentations Distribuées

— Teams involved: IMAGES (LTCI), TAU (INRIA Saclay)

— Duration: one year

— Application deadline: 15 October 2018

Scientific project

1 Context

The subject of this project is to explore the use of deep learning for the purposes of image synthesis. A central component of image synthesis and restoration algorithms is knowledge about the particular properties of natural images, so that meaningful models and regularisations can be established. However, this is a very difficult goal to achieve by "hand-designing" such models (Tikhonov regularisation, total variation, patch-based GMM models [19]), as was common until recently. A more flexible and powerful approach is to train neural networks for these restoration and synthesis tasks, in particular convolutional neural networks (CNNs). Examples of such networks include denoising networks, auto-encoders and generative adversarial networks (GANs). A common underlying theme of these networks is the idea that natural images lie on some lower-dimensional, latent space. A core objective of this project is to design a convolutional neural network (CNN) which is able to capture the underlying structure of the image data which we are analysing, in a robust and generalisable manner. Several central questions appear here:

1. Can the neural networks learn the underlying structure of the space on which natural images lie?

2. Can the networks learn how to project both to and from that space?

3. What architectures and/or regularisations are necessary to ensure good generalisation capacities of the network?

4. Is it possible to fit workable, possibly parametric, probabilistic models to the latent space?

If this crucial objective is attained, a wide array of synthesis possibilities is opened up, from texture synthesis and interpolation to image denoising. A specific application of this work would be image texture synthesis. In this context, we would like to create a network which is able to transform to and from a space where our data is represented in a manner which is amenable to probabilistic modelling.
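As a minimal sketch of what "fitting a workable, possibly parametric, probabilistic model to the latent space" could look like, the snippet below fits a multivariate Gaussian to a set of latent codes and samples new codes from it. The codes here are random placeholders standing in for the outputs of a trained encoder; the dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder latent codes: in practice these would be produced by
# running the training images through a trained encoder.
codes = rng.normal(size=(500, 8))          # 500 samples in an 8-D latent space

# Fit a simple parametric (Gaussian) model to the codes.
mu = codes.mean(axis=0)                    # empirical mean, shape (8,)
cov = np.cov(codes, rowvar=False)          # empirical covariance, shape (8, 8)

# Sample new codes from the fitted model; decoding them through the
# generator would then yield newly synthesised images.
new_codes = rng.multivariate_normal(mu, cov, size=10)
print(new_codes.shape)                     # (10, 8)
```

This is the simplest parametric choice; part of the project is precisely to determine which latent-space models are adequate for image and texture data.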

The relevance of these research directions is argued in the following.

2 Scientific positioning of the project and state-of-art

2.1 Robust, navigable latent spaces

Generative networks are currently a very hot topic, due to the spectacular image synthesis possibilities that they offer for a wide variety of complex and abstract objects. They have been used, for example, by Zhu et al. [18] and Ha and Eck [7] to produce images of specific objects with only a rough sketch as a guide to the algorithm. Once again, the key question here is how to discover the underlying manifold of these images. In the work of Zhu et al., this is done using a GAN, whose adversarial component seeks to distinguish between "real" and "false" images. Once this is achieved, it is possible to navigate in this space, interpolating between image examples. Very often, this is done by simple linear interpolation (as in Zhu et al. [18]), even though there is no guarantee that the space is indeed linear. Furthermore, there remain serious questions as to the generalisation capacity of GANs; in other words, to what extent can GANs interpolate in a data region which was unobserved in the training data? We illustrate this in Figure 1 by showing the results of a DCGAN trained on a database of disks from which certain radii are absent. The DCGAN learns to produce the disks it has seen, but cannot synthesise unobserved data.
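A controlled experiment of this kind is easy to reproduce: build a toy dataset of disk images whose radii deliberately avoid a held-out band, train a generative network on it, and test whether the network can synthesise the unseen radii. A sketch of the dataset construction (image size and radius band are illustrative choices, not those of Figure 1):

```python
import numpy as np

def disk_image(radius, size=32):
    """Binary image of a disk of the given radius, centred in the frame."""
    yy, xx = np.mgrid[:size, :size]
    c = (size - 1) / 2.0
    return ((xx - c) ** 2 + (yy - c) ** 2 <= radius ** 2).astype(np.float32)

# Training radii deliberately exclude the band [8, 11]: a generative
# network trained on these images never observes such disks, so the
# held-out set directly measures its interpolation capacity.
train_radii = [r for r in range(3, 15) if not 8 <= r <= 11]
train_set = np.stack([disk_image(r) for r in train_radii])

held_out = np.stack([disk_image(r) for r in (8, 9, 10, 11)])
print(train_set.shape, held_out.shape)
```

Because the true latent parametrisation (the radius) is known, success or failure on the held-out band can be assessed exactly, which is the point of using such synthetic data.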

An existing approach that encourages smooth latent spaces, and thus better generalisation, is the contractive autoencoder [14], which encourages robustness of the latent space to small changes in the network input. We shall first investigate such an approach in the context of image synthesis and generation. However, if we consider that the data is well parameterised by the latent space, then the output should also vary smoothly w.r.t. any small changes in the parametrisation. Thus, a first approach, similar to that of Rifai et al., would be to

— Minimise the ℓ2 norm of the Jacobian of the network output w.r.t. the code z,

which specifies that the output should not change greatly as the code moves a small amount. We note here that while different regularisation techniques play a central role in modern architectures, it is rare to find work which analyses the generalisation capabilities of the network in a clear, controlled manner; the only criterion is successful classification rates. In the case of image synthesis, it is important that the latent space should be meaningful, so that we can rely on its generalisation capabilities and, ultimately, produce meaningful interpolations in the latent space. For this, we propose to study examples where the underlying space of the images is known, and parametrisable. This approach is quite uncommon in the literature, and we consider that it will deliver new insights into generalisation and interpolation, or at the very least provide a minimum performance requirement for generative networks (i.e., correctly finding the latent space of images with known parametrisations, in a sufficiently robust manner).
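To make the proposed regulariser concrete, the sketch below computes the squared ℓ2 (Frobenius) norm of the Jacobian of a decoder output with respect to the code z. A toy linear "decoder" is used so that the Jacobian is known in closed form (it is simply the weight matrix); in a real network the Jacobian would come from automatic differentiation rather than the finite differences used here for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "decoder": a linear map from a 4-D code z to a 16-D output.
W = rng.normal(size=(16, 4))
decoder = lambda z: W @ z

def jacobian_fd(f, z, eps=1e-5):
    """Finite-difference Jacobian of f at z (illustration only)."""
    f0 = f(z)
    J = np.zeros((f0.size, z.size))
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (f(z + dz) - f0) / eps
    return J

z = rng.normal(size=4)
J = jacobian_fd(decoder, z)

# The regulariser: squared ℓ2 norm of the Jacobian of the decoder
# output w.r.t. the code z, to be added to the training loss.
penalty = np.sum(J ** 2)

# For a linear map the Jacobian is W itself, so the penalty equals ||W||_F^2.
print(np.isclose(penalty, np.sum(W ** 2), atol=1e-3))  # True
```

Penalising this quantity is the output-side analogue of the contractive penalty of Rifai et al., which instead penalises the Jacobian of the code w.r.t. the input.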

If this generalisation problem is sufficiently well addressed, we shall also investigate the best approaches to interpolation itself. Some very recent work proposes alternatives to simple linear interpolation [16, 10], and we shall investigate these avenues, linked with finding geodesics in the latent space. However, contrary to these approaches, we again propose to study cases where the parametrisation is known, in order to design autoencoders and interpolation techniques that provide meaningful latent-space interpolation.
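For concreteness, here is one widely used alternative to straight-line interpolation between latent codes: spherical linear interpolation (slerp), which follows a great-circle arc and better respects the norm distribution of Gaussian latent codes. This is only an illustrative baseline, not the geodesic methods of [16, 10].

```python
import numpy as np

def lerp(z0, z1, t):
    """Straight-line interpolation between two latent codes."""
    return (1 - t) * z0 + t * z1

def slerp(z0, z1, t):
    """Spherical linear interpolation between two latent codes."""
    cos_omega = np.dot(z0, z1) / (np.linalg.norm(z0) * np.linalg.norm(z1))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return lerp(z0, z1, t)  # codes nearly parallel: fall back to lerp
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

rng = np.random.default_rng(2)
z0, z1 = rng.normal(size=64), rng.normal(size=64)

# A 9-step interpolation path; each code would be decoded into an image.
path = [slerp(z0, z1, t) for t in np.linspace(0, 1, 9)]
print(np.allclose(path[0], z0), np.allclose(path[-1], z1))  # True True
```

Whether such a fixed scheme, or a learned geodesic in the latent space, yields the most meaningful interpolations is exactly the question the project proposes to study on parametrically known data.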

If these goals are attained, we hope to obtain

— Robust, generalisable generative networks for image synthesis,

— Meaningful interpolation in the resulting latent spaces.

An interesting application of such a network is texture synthesis, which we discuss now. This entails several additional challenges, such as modelling the latent space with a probabilistic model, and sampling textures of arbitrary sizes.

2.2 Texture synthesis

In the specific case of texture synthesis, a recent approach proposed by Gatys et al. [5] is to iteratively modify an input noise image such that the responses of the different filters of a CNN share some statistics with the responses to the example texture. The authors search for a local minimum of the following energy, starting from a random point:

x̂ := argmin_x Σ_l ‖Cov(φ_l(x)) − Cov(φ_l(x_ex))‖²,

where φ_l(x) denotes the responses of the filters at layer l of the CNN, x_ex is the example texture, and Cov(·) gathers the second-order statistics of these responses.
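A sketch of this statistic-matching energy is given below, using the Gram matrix of filter responses (inner products between channels with spatial positions flattened) as the second-order statistic. Random arrays stand in for the responses φ_l; a real implementation would extract them from a pretrained CNN such as the VGG network used by Gatys et al.

```python
import numpy as np

def gram(features):
    """Second-order statistic of a feature map of shape (channels, h, w):
    channel-by-channel inner products, normalised by spatial size."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)

def texture_loss(feats_x, feats_ref):
    """Sum over layers of squared Frobenius distances between statistics."""
    return sum(np.sum((gram(fx) - gram(fr)) ** 2)
               for fx, fr in zip(feats_x, feats_ref))

rng = np.random.default_rng(3)
# Random stand-ins for the filter responses of two CNN layers.
feats_ref = [rng.normal(size=(8, 16, 16)), rng.normal(size=(16, 8, 8))]
feats_x = [rng.normal(size=(8, 16, 16)), rng.normal(size=(16, 8, 8))]

print(texture_loss(feats_ref, feats_ref))    # 0.0: identical statistics
print(texture_loss(feats_x, feats_ref) > 0)  # True: mismatched statistics
```

In the full method, this loss is minimised by gradient descent on the pixels of x, which is what "iteratively modifying an input noise" refers to above.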

(c) GdR 720 ISIS - CNRS - 2011-2018.