# Meeting

## Optimal Transport and Statistical Learning

Date: 18 November 2021
Venue: Amphithéâtre Hermite, Institut Henri Poincaré, 11 Rue Pierre et Marie Curie, 75005 Paris

Scientific themes:
• A - Methods and models in signal processing
• T - Learning for signal and image analysis

Please note that registration is free but mandatory, in order to guarantee room access for all registered participants.

### Registration

49 GdR ISIS members and 52 non-members are registered for this meeting.

Room capacity: 100 people.

### Overview

Optimal Transport (OT) has attracted considerable interest in the machine learning community. Its inherent ability to compare probability measures is now exploited in many areas, such as classification, data generation, and adversarial robustness. More recently, variants of OT have found striking applications to generative flows, missing data, and graphs. This GdR ISIS meeting, organized in partnership with the GdR MIA, aims to present the latest advances in optimal transport: its theory, its variants (unbalanced transport, the Gromov-Wasserstein distance, etc.), and its applications (generative models, graphs, etc.). The different approaches will also connect to other problem areas, such as medical imaging, represented by the GdR MIA.

This one-day meeting, organized in partnership with the GdR MIA, will take place at the Institut Henri Poincaré. It aims to survey ongoing work on these fundamental problems and calls for contributions on the following (non-exhaustive) topics:

• Fundamental results in transport (convergence, convexity)
• Algorithms for solving OT and regularized OT problems
• Wasserstein barycenters, distribution modeling
• Multi-task learning, transfer, domain adaptation, and graphs
• Wasserstein distance, Gromov-Wasserstein, unbalanced OT, and Sliced Wasserstein
• Generative models, such as Wasserstein GANs, Wasserstein autoencoders, or normalizing flows
• Optimal transport in Bayesian estimation
• Applications to statistical learning theory
• Applications to signal processing
• Applications to computer vision
• Applications to time series

### Call for contributions

Prospective speakers are invited to notify the organizers before Friday 22 October 2021, by emailing a title, an abstract, and the list of authors, with the subject "GDR ISIS OTML", to the following addresses: thibault.sejourne@ens.fr, remi.flamary@polytechnique.edu, nicolas.courty@irisa.fr

### Invited speakers

• Alain Rakotomamonjy, Université de Rouen, Criteo AI Lab
• Julie Delon, MAP5, Université de Paris
• Quentin Berthet, Google Brain, Paris

### Organizers

• Thibault Séjourné, CNRS, DMA ENS, PSL
• Kilian Fatras, IRISA/INRIA, Université Bretagne-Sud
• Rémi Flamary, CMAP, École Polytechnique
• Nicolas Courty, IRISA/INRIA, Université Bretagne-Sud
• Gabriel Peyré, CNRS, DMA ENS, PSL

### Programme

• 9h00-9h30 : Welcome
• 9h30-10h15 : Alain Rakotomamonjy - Optimal Transport in machine learning with application to Domain Adaptation
• 10h15-10h35 : Alice Le Brigant - Fisher information geometry of beta and Dirichlet distributions
• 10h35-10h55 : Javier González-Delgado - Two-sample goodness-of-fit tests on the flat torus based on Wasserstein distance and their relevance to structural biology
• 10h55-11h25 : Break
• 11h25-12h10 : Quentin Berthet - Stochastic optimization for entropic optimal transport
• 12h10-12h30 : Alberto González-Sanz - Central Limit Theorems for General Transportation Costs
• 12h30-14h00 : Lunch
• 14h00-14h45 : Julie Delon - Generalized Wasserstein barycenters between probability measures living on different subspaces
• 14h45-15h05 : Nicolas Keriven - Entropic Optimal Transport in Random Graphs
• 15h05-15h25 : Titouan Vayer - Controlling Wasserstein distances by Kernel norms with application to Compressive Statistical Learning
• 15h25-16h00 : Break
• 16h00-16h20 : Clément Bonet - Sliced-Wasserstein Gradient Flows
• 16h20-16h40 : Théo Lacombe - A homogeneous regularized transport model with applications to transport with a boundary
• 16h40-17h00 : Kimia Nadjahi - Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections

### Contribution abstracts

##### Alain Rakotomamonjy, Optimal Transport in machine learning with application to Domain Adaptation
In this talk, I will present the role that optimal transport (OT) can play in machine learning, as OT theory provides geometric tools to compare probability measures. After a brief introduction to the basics of OT, I will describe how these tools can be applied in domain adaptation and representation learning contexts.

##### Quentin Berthet, Stochastic optimization for entropic optimal transport

Optimal transport is a foundational problem in optimization that makes it possible to compare probability distributions while taking into account geometric aspects. Its optimal objective value, the Wasserstein distance, provides an important loss between distributions that has been used in many applications throughout machine learning and statistics. Recent algorithmic progress on this problem and its regularized versions has made these tools increasingly popular. However, existing techniques require solving an optimization problem to obtain a single gradient of the loss, thus slowing down first-order methods to minimize the sum of losses, which require many such gradient computations. In this work, we introduce an algorithm to solve a regularized version of this problem of Wasserstein estimators, with a time per step which is sublinear in the natural dimensions of the problem. We introduce a dual formulation, and optimize it with stochastic gradient steps that can be computed directly from samples, without solving additional optimization problems at each step. Doing so, the estimation and computation tasks are performed jointly. We show that this algorithm can be extended to other tasks, including estimation of Wasserstein barycenters. We provide theoretical guarantees and illustrate the performance of our algorithm with experiments on synthetic data.
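As a hedged illustration of the general idea (not the authors' algorithm; all parameter values are made up), the dual of discrete entropic OT can be maximized by stochastic gradient ascent, where each update is computed from a single sampled index pair rather than by solving an inner optimization problem:

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps, lr, steps = 5, 0.5, 0.05, 5000
a = np.full(n, 1 / n)                  # source weights
b = np.full(n, 1 / n)                  # target weights
C = rng.random((n, n))                 # illustrative cost matrix
f = np.zeros(n)                        # dual potentials
g = np.zeros(n)

def dual(f, g):
    # Dual objective of entropic OT:
    # <a,f> + <b,g> - eps * sum_ij a_i b_j (exp((f_i + g_j - C_ij)/eps) - 1)
    E = np.exp((f[:, None] + g[None, :] - C) / eps) - 1
    return a @ f + b @ g - eps * (np.outer(a, b) * E).sum()

d0 = dual(f, g)
for _ in range(steps):
    i = rng.choice(n, p=a)             # sample indices from the marginals
    j = rng.choice(n, p=b)
    r = np.exp((f[i] + g[j] - C[i, j]) / eps)
    f[i] += lr * (1 - r)               # unbiased stochastic ascent step
    g[j] += lr * (1 - r)
```

Because the objective is concave in (f, g), these cheap per-sample updates steadily improve the dual value from its initialization.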

##### Julie Delon, Generalized Wasserstein barycenters between probability measures living on different subspaces

We introduce a generalization of the Wasserstein barycenter to the case where the initial probability measures live on different subspaces of R^d. We study the existence and uniqueness of this barycenter, show how it is related to a larger multi-marginal optimal transport problem, and propose a dual formulation. Finally, we explain how to compute this generalized barycenter numerically on discrete distributions, and we give an explicit solution for Gaussian distributions.
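In the much simpler classical setting of one-dimensional Gaussians (not the different-subspace setting of the talk), the W2 barycenter has a closed form that is easy to check: the barycenter of N(m_i, s_i^2) with weights w_i is N(sum_i w_i m_i, (sum_i w_i s_i)^2). A minimal sketch:

```python
import numpy as np

def gaussian_w2_barycenter_1d(means, stds, weights):
    """Closed-form W2 barycenter of 1D Gaussians: means and stds average linearly."""
    means, stds, weights = map(np.asarray, (means, stds, weights))
    return weights @ means, weights @ stds

# Barycenter of N(0, 1) and N(4, 9) with equal weights: N(2, 4)
m, s = gaussian_w2_barycenter_1d([0.0, 4.0], [1.0, 3.0], [0.5, 0.5])
```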

##### Alice Le Brigant, Fisher information geometry of beta and Dirichlet distributions

The Fisher information metric provides a Riemannian framework to compare probability distributions inside a parametric family. The best-known example is the univariate Gaussian family, on which the Fisher information geometry amounts to hyperbolic geometry. In this talk we will investigate the Fisher information geometry of Dirichlet distributions and show that it is negatively curved and geodesically complete. This guarantees the uniqueness of the notion of mean and makes it a suitable geometry for the Riemannian K-means algorithm. We illustrate the use of the Fisher information geometry of beta distributions, a particular case of Dirichlet distributions, to compare and classify histograms of medical data.
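For the beta family mentioned above, the Fisher information matrix has a closed form in terms of the trigamma function psi_1, since the beta distribution is an exponential family with log-partition log B(alpha, beta); the sketch below (parameter values are illustrative) computes it and lets one check that it is symmetric positive definite, as a Riemannian metric must be:

```python
import numpy as np
from scipy.special import polygamma

def beta_fisher_info(alpha, beta):
    """Fisher information of Beta(alpha, beta), i.e. the Hessian of log B(alpha, beta)."""
    t = polygamma(1, alpha + beta)      # trigamma psi_1 at alpha + beta
    return np.array([[polygamma(1, alpha) - t, -t],
                     [-t, polygamma(1, beta) - t]])

G = beta_fisher_info(2.0, 3.0)          # metric tensor at the point (2, 3)
```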

##### Alberto González-Sanz, Central Limit Theorems for General Transportation Costs

One of the main ways to quantify the distance between distributions is the well-known Wasserstein metric. In statistics and machine learning applications it is increasingly common to deal with measures supported on a high-dimensional space. Some recent results show that the Wasserstein metric suffers from the curse of dimensionality, which means that its empirical approximation becomes worse as dimension grows. We will explain a new method, based on the Efron-Stein inequality and on the sequential compactness of the closed unit ball in $L^2(P)$ for the weak topology, that improves a result of del Barrio and Loubes (2019) and states that, even if the empirical Wasserstein metric converges at a slow rate, its oscillations around its mean are asymptotically Gaussian with rate $\sqrt{n}$, $n$ being the sample size, which means that the curse of dimensionality is avoided in such a case. Finally, we will present some applications of these results to statistical and data science problems.

Joint work with Tasio del Barrio and Jean-Michel Loubes.
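A one-dimensional toy illustration of such fluctuation statements (an assumed setup, not the paper's general-cost setting): repeatedly draw samples, compute the empirical 1-Wasserstein distance to a fixed reference sample, and inspect the spread of the values around their mean.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
ref = rng.normal(size=2000)                        # fixed reference sample
vals = np.array([
    wasserstein_distance(rng.normal(size=2000), ref)
    for _ in range(200)
])                                                 # empirical W1 over repeated draws
centered = (vals - vals.mean()) / vals.std()       # standardized fluctuations
```

Plotting a histogram of `centered` (or running a normality test) would illustrate the approximately Gaussian shape of the oscillations around the mean.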

##### Javier González-Delgado, Two-sample goodness-of-fit tests on the flat torus based on Wasserstein distance and their relevance to structural biology

This work is motivated by the study of local protein structure, which is defined by two variable dihedral angles that take values from probability distributions on the flat torus. Our goal is to provide the space $\mathcal{P}(\mathbb{R}^2/\mathbb{Z}^2)$ with a metric that quantifies local structural modifications due to changes in the protein sequence, and to define associated two-sample goodness-of-fit testing approaches. Due to its adaptability to the space geometry, we focus on the Wasserstein distance as a metric between distributions.

We extend existing results of the theory of Optimal Transport to the d-dimensional flat torus $\mathbb{T}^d=\mathbb{R}^d/\mathbb{Z}^d$, in particular a Central Limit Theorem. Moreover, we assess different techniques for two-sample goodness-of-fit testing for the two-dimensional case, based on the Wasserstein distance. We provide an implementation of these approaches in R. Their performance is illustrated by numerical experiments on synthetic data and protein structure data. The full work is available at https://arxiv.org/pdf/2108.00165.pdf.

Authors: Javier González-Delgado, Alberto González-Sanz, Juan Cortés, and Pierre Neuvial.

##### Théo Lacombe, A homogeneous regularized transport model with applications to transport with a boundary

The standard formulation of optimal transport between probability measures can be generalized in many ways; here we focus on models that include an entropic regularization (notably the "Sinkhorn divergences") on the one hand, and on transport models between measures of different masses on the other. A framework unifying these two variants was recently proposed by Séjourné et al. in a paper naturally titled Sinkhorn divergences for Unbalanced Optimal Transport. Building on this work, we will present a regularized partial transport model with a homogeneity property that is useful, in particular numerically, and we will see how this model applies to "optimal transport with boundary", a variant of partial optimal transport.

##### Titouan Vayer, Controlling Wasserstein distances by Kernel norms with application to Compressive Statistical Learning

Comparing probability distributions is at the crux of many machine learning algorithms. Maximum Mean Discrepancies (MMD) and Optimal Transport distances (OT) are two classes of distances between probability measures that have attracted abundant attention in recent years. This paper establishes some conditions under which the Wasserstein distance can be controlled by MMD norms. Our work is motivated by the compressive statistical learning (CSL) theory, a general framework for resource-efficient large scale learning in which the training data is summarized in a single vector (called sketch) that captures the information relevant to the considered learning task. Inspired by existing results in CSL, we introduce the Hölder Lower Restricted Isometric Property (Hölder LRIP) and show that this property comes with interesting guarantees for compressive statistical learning. Based on the relations between the MMD and the Wasserstein distance, we provide guarantees for compressive statistical learning by introducing and studying the concept of Wasserstein learnability of the learning task, that is when some task-specific metric between probability distributions can be bounded by a Wasserstein distance.

Joint work with Rémi Gribonval (ENS Lyon).
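Since the abstract relates Wasserstein distances to MMD norms, a minimal (biased, V-statistic) estimator of the squared Gaussian-kernel MMD may help fix notation; the bandwidth and the sample distributions below are arbitrary illustrative choices:

```python
import numpy as np

def mmd2_gaussian(X, Y, sigma=1.0):
    """Biased estimator of MMD^2 with kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (200, 2))
near = mmd2_gaussian(X, rng.normal(0.0, 1.0, (200, 2)))   # same distribution: small
far = mmd2_gaussian(X, rng.normal(3.0, 1.0, (200, 2)))    # shifted distribution: larger
```

The biased estimator is the squared RKHS norm of the difference of empirical mean embeddings, so it is always nonnegative.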

##### Kimia Nadjahi, Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections

The Sliced-Wasserstein distance (SW) is being increasingly used in machine learning applications as an alternative to the Wasserstein distance and offers significant computational and statistical benefits. Since it is defined as an expectation over random projections, SW is commonly approximated by Monte Carlo. We adopt a new perspective to approximate SW by making use of the concentration of measure phenomenon: under mild assumptions, one-dimensional projections of a high-dimensional random vector are approximately Gaussian. Based on this observation, we develop a simple deterministic approximation for SW. Our method does not require sampling a number of random projections, and is therefore both accurate and easy to use compared to the usual Monte Carlo approximation. We derive nonasymptotic guarantees for our approach, and show that the approximation error goes to zero as the dimension increases, under a weak dependence condition on the data distribution. We validate our theoretical findings on synthetic datasets, and illustrate the proposed approximation on a generative modeling problem.

Authors: Kimia Nadjahi (Télécom Paris), Alain Durmus (ENS Paris-Saclay), Pierre E. Jacob (Harvard University), Roland Badeau (Télécom Paris), Umut Şimşekli (Inria, ENS)
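For reference, the standard Monte Carlo baseline that the paper improves upon can be sketched as follows, averaging 1D Wasserstein distances over random projection directions (the projection count and data are illustrative):

```python
import numpy as np
from scipy.stats import wasserstein_distance

def sliced_wasserstein_mc(X, Y, n_proj=200, rng=None):
    """Monte Carlo estimate of the sliced 1-Wasserstein distance between point clouds."""
    if rng is None:
        rng = np.random.default_rng(0)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)             # uniform direction on the sphere
        total += wasserstein_distance(X @ theta, Y @ theta)
    return total / n_proj

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
```

The paper's contribution replaces this sampling loop with a deterministic approximation based on the near-Gaussianity of the 1D projections.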

##### Clément Bonet, Sliced-Wasserstein Gradient Flows

Minimizing functionals in the space of probability distributions can be done with Wasserstein gradient flows. To solve them numerically, a possible approach is to rely on the Jordan-Kinderlehrer-Otto (JKO) scheme, which is analogous to the proximal scheme in Euclidean spaces. However, this bilevel optimization problem is known for its computational challenges, especially in high dimension. To alleviate this, very recent works propose to approximate the JKO scheme by leveraging Brenier's theorem and using gradients of Input Convex Neural Networks to parameterize the density (JKO-ICNN). However, this method comes with a high computational cost and stability issues. Instead, this work proposes to use gradient flows in the space of probability measures endowed with the sliced-Wasserstein (SW) distance. We argue that this method is more flexible than JKO-ICNN, since SW enjoys a closed-form differentiable approximation. Thus, the density at each step can be parameterized by any generative model, which alleviates the computational burden and makes it tractable in higher dimensions. Interestingly, we also show empirically that these gradient flows are strongly related to the usual Wasserstein gradient flows, and that they can be used to efficiently minimize diverse machine learning functionals.

Authors: Clément Bonet, Nicolas Courty, Lucas Drumetz, and François Septier
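As a hedged toy version of such a flow (in 1D, where the sliced and the full Wasserstein distances coincide, and far from the paper's generative-model parameterization), one can run a particle gradient flow on F(x) = W2^2(x, y) / 2 by matching sorted samples; all values are illustrative:

```python
import numpy as np

def w2_gradient(x, y):
    """Gradient of (1/2n) * sum_i (x_(i) - y_(i))^2, the halved squared W2
    between two empirical 1D measures with n points each."""
    ix, iy = np.argsort(x), np.argsort(y)
    g = np.zeros_like(x)
    g[ix] = (x[ix] - y[iy]) / len(x)   # each particle is pulled toward its sorted match
    return g

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 300)          # source particles
y = rng.normal(4.0, 0.5, 300)          # target particles
start_gap = np.abs(np.sort(x) - np.sort(y)).max()
for _ in range(100):
    x -= 0.5 * len(x) * w2_gradient(x, y)   # explicit Euler step on particle positions
end_gap = np.abs(np.sort(x) - np.sort(y)).max()
```

Each step moves every particle halfway toward its sorted match, so the particle cloud converges geometrically to the target.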

##### Nicolas Keriven, Entropic Optimal Transport in Random Graphs

In graph analysis, a classic task consists in computing similarity measures between (groups of) nodes. In latent space random graphs, nodes are associated to unknown latent variables. One may then seek to compute distances directly in the latent space, using only the graph structure. In this paper, we show that it is possible to consistently estimate entropic-regularized Optimal Transport (OT) distances between groups of nodes in the latent space. We provide a general stability result for entropic OT with respect to perturbations of the cost matrix. We then apply it to several examples of random graphs, such as graphons or epsilon-graphs on manifolds. Along the way, we prove new concentration results for the so-called Universal Singular Value Thresholding estimator, and for the estimation of geodesic distances on a manifold, that are of independent interest.
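The entropic-regularized OT referred to above is typically computed with Sinkhorn iterations; a minimal sketch between two discrete measures follows (the cost matrix and parameters are illustrative and unrelated to the graph estimators in the paper):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iter=500):
    """Sinkhorn iterations for entropic OT: returns a transport plan P
    whose marginals converge to a (rows) and b (columns)."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # scale columns to match b
        u = a / (K @ v)                  # scale rows to match a
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
n = 6
a = np.full(n, 1 / n)
b = np.full(n, 1 / n)
C = rng.random((n, n))                   # illustrative cost matrix
P = sinkhorn(a, b, C)
```

The entropic OT cost is then `(P * C).sum()`; smaller `eps` approaches unregularized OT at the price of slower, less stable iterations.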