9 May 2018

Category: PhD student

Subject: Optimal transport for deep learning and deep learning for optimal transport

Supervision: Nicolas Courty and Rémi Flamary

Locations: Vannes and Nice, France

**PhD position to be filled in September/October 2018**

**Key-words: machine learning, optimal transport, deep learning**

The Wasserstein distance is a powerful tool, grounded in the theory of optimal transport, for comparing data distributions, with wide applications in image processing, computer vision and machine learning [1]. In machine learning, it has recently found numerous applications, e.g. domain adaptation [2] or word embedding [3]. In deep learning, the Wasserstein distance has recently proved to be a powerful loss in generative models [4] and in multi-label classification [5]. Its strength comes from two properties: i) it operates on empirical data distributions in a non-parametric way; ii) the geometry of the underlying space can be leveraged to compare the distributions in a geometrically sound way. Yet the deployment of Wasserstein distances in a wider class of applications remains limited, mainly because of a heavy computational burden. Even recent strategies based on entropic regularization [6] struggle with large-scale datasets. Remarkably, the problem is amenable to stochastic programming thanks to its dual (and possibly regularized) formulation [7, 8]. These recent advances pave the way for a large number of applications in learning with deep networks, as soon as the Wasserstein distance serves as a loss function.
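As a concrete illustration of the entropic regularization mentioned above [6], here is a minimal NumPy sketch of Sinkhorn iterations between two small empirical distributions; all sample sizes, the data and the regularization strength are illustrative, and are not part of the project description.

```python
import numpy as np

# Toy entropic-regularized OT between two small empirical distributions,
# computed with Sinkhorn fixed-point iterations (Cuturi, 2013 [6]).
rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, (5, 2))      # source samples
xt = rng.normal(1.0, 1.0, (6, 2))      # target samples
a = np.full(5, 1 / 5)                  # uniform source weights
b = np.full(6, 1 / 6)                  # uniform target weights

# Squared Euclidean ground cost, normalized for numerical stability
M = ((xs[:, None, :] - xt[None, :, :]) ** 2).sum(-1)
M = M / M.max()

reg = 0.5                              # entropic regularization strength
K = np.exp(-M / reg)                   # Gibbs kernel
u = np.ones(5)
for _ in range(1000):                  # Sinkhorn scaling iterations
    v = b / (K.T @ u)
    u = a / (K @ v)

P = u[:, None] * K * v[None, :]        # regularized transport plan
cost = (P * M).sum()                   # approximate transport cost
```

At convergence the plan P has the prescribed marginals a and b, which is what makes the distance usable as a loss on empirical data.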

**Scientific objectives and expected achievements.** The objective of the PhD will be to contribute in this direction, by simultaneously examining the two following questions:

- How can large-scale optimal transport contribute to deep learning? The applicant will consider modern large-scale problems for which using the Wasserstein distance is, or can be, beneficial. For example, new directions in large-scale domain adaptation will be considered, as well as learning with noisy labels or noisy data samples, or reducing bias in predictive modeling.
- How can deep learning contribute to optimal transport? Recent works have shown that learning the Kantorovich potentials in the dual formulation of optimal transport can be efficiently carried out by neural networks [4, 8]. The PhD candidate will delve into this direction, by trying to enforce specific constraints in the architecture of the networks or by considering new optimization schemes. Alternative paths such as [9] can also be considered.
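To make the second question concrete, the following is a minimal NumPy sketch of stochastic ascent on the entropic dual of a discrete OT problem, in the spirit of [7, 8]; in the continuous setting the finite potential vectors u and v would be replaced by neural networks [4, 8]. All sizes, data, step sizes and iteration counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 9
M = rng.random((n, m))                 # ground cost between samples
a = np.full(n, 1 / n)                  # source weights
b = np.full(m, 1 / m)                  # target weights
reg = 1.0                              # entropic regularization strength

def dual_obj(u, v):
    # Entropic dual: <a,u> + <b,v> - reg * E_{a x b}[exp((u+v-M)/reg)]
    P = a[:, None] * b[None, :] * np.exp((u[:, None] + v[None, :] - M) / reg)
    return a @ u + b @ v - reg * P.sum()

u = np.zeros(n)                        # Kantorovich potentials
v = np.zeros(m)
d0 = dual_obj(u, v)                    # objective before optimization
lr = 0.1
for _ in range(20000):
    i = rng.integers(n)                # draw a source point x_i ~ a
    j = rng.integers(m)                # draw a target point y_j ~ b
    # stochastic gradient of the dual objective w.r.t. u[i] and v[j]
    g = 1.0 - np.exp((u[i] + v[j] - M[i, j]) / reg)
    u[i] += lr * g
    v[j] += lr * g
```

Because the dual is a concave expectation over pairs of samples, it can be maximized from minibatches alone, which is precisely what makes the large-scale (and neural) formulations of [7, 8] possible.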

In the end, the contributions of the PhD student will target cutting-edge research in machine learning, and the expected outcomes will hopefully be published in top-tier machine learning conferences and journals. From an application point of view, particular attention will be given to remote sensing and astronomical imaging datasets, on which the two teams specialize.

**OATMIL project/supervision.** The PhD will take place in the context of the ANR OATMIL project (http://people.irisa.fr/Nicolas.Courty/OATMIL/), which provides the funding for this research. As such, the candidate will be expected to develop strong interactions with the other participants and to contribute to its success. The supervision will be carried out by Nicolas Courty and Rémi Flamary.

The research will take place in the context of a collaboration between IRISA and laboratoire Lagrange.

- The IRISA laboratory (http://www.irisa.fr/) is a joint research unit between CNRS, INRIA and several universities and engineering schools. IRISA conducts research in computer science, applied mathematics, and signal and image processing. More specifically, the PhD student will be located in the OBELIX team (Environment Observation by Complex Imagery, https://www-obelix.irisa.fr/), which focuses on image analysis, machine learning and data mining, mostly for environmental data and remote sensing, and which is colocated between Vannes and Rennes (France).
- The Lagrange laboratory (https://www.oca.eu/en/welcome-lagrange) is a joint laboratory of CNRS, Observatoire de la Côte d’Azur (OCA) and Université Nice Sophia Antipolis (UNS). The PhD student will be hosted in Nice by the Signal & Image Processing group. The group has strong methodological expertise in statistical inference, machine learning, optimisation and inverse problems, and extensive experience in applications from astronomy, Earth science, acoustics and biomedical engineering.

The PhD will take place both in Vannes, a beautiful medieval city of medium size close to the sea (2h30 by train from Paris), and in Nice. The final split of the time spent in the two locations will be discussed and decided with the student during the PhD.

**Technical aspects.** The applied part of the PhD will lead to developments in Python. The candidate will build upon the Python toolbox for optimal transport (POT: https://github.com/rflamary/POT), developed by members of the team among others. He/she will benefit from the expertise of the other members of the team, as well as from ongoing collaborations with other academic partners on this subject.

**Candidate profile and application.** Applicants are expected to hold a degree in computer science, machine learning, signal & image processing, or applied mathematics/statistics, and to show an excellent academic record. In addition, good programming skills are expected. To apply, send a resume, along with the grades obtained during the last two years and, if possible, recommendation letters, to Nicolas Courty (nicolas.courty@irisa.fr) and Rémi Flamary (remi.flamary@unice.fr).

[1] G. Peyré and M. Cuturi, Computational Optimal Transport. To be published in Foundations and Trends in Computer Science, 2018. [Online]. Available: https://optimaltransport.github.io

[2] N. Courty, R. Flamary, D. Tuia, and A. Rakotomamonjy, “Optimal transport for domain adaptation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.

[3] G. Huang, C. Guo, M. Kusner, Y. Sun, F. Sha, and K. Weinberger, “Supervised word mover’s distance,” in Advances in Neural Information Processing Systems, 2016, pp. 4862–4870.

[4] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proceedings of the 34th International Conference on Machine Learning, vol. 70, Sydney, Australia, 06–11 Aug 2017, pp. 214–223.

[5] C. Frogner, C. Zhang, H. Mobahi, M. Araya, and T. Poggio, “Learning with a Wasserstein loss,” in Advances in Neural Information Processing Systems (NIPS), 2015.

[6] M. Cuturi, “Sinkhorn distances: Lightspeed computation of optimal transport,” in Advances in Neural Information Processing Systems (NIPS), 2013, pp. 2292–2300.

[7] A. Genevay, M. Cuturi, G. Peyré, and F. Bach, “Stochastic optimization for large-scale optimal transport,” in Advances in Neural Information Processing Systems, 2016, pp. 3432–3440.

[8] V. Seguy, B. Bhushan Damodaran, R. Flamary, N. Courty, A. Rolet, and M. Blondel, “Large-scale optimal transport and mapping estimation,” in International Conference on Learning Representations (ICLR), 2018.

[9] N. Courty, R. Flamary, and M. Ducoffe, “Learning Wasserstein embeddings,” in International Conference on Learning Representations (ICLR), 2018.

(c) GdR 720 ISIS - CNRS - 2011-2018.