Sampling with negative dependence
Scientific themes:
- A - Methods and models in signal processing
We remind you that, in order to guarantee access to the meeting rooms for all registrants, registration for the meetings is free but mandatory.
17 GdR ISIS members and 12 non-members are registered for this meeting.
Room capacity: 60 people.
The slides of the presentations are available at this link:
Thanks to the speakers and the participants!
Negatively dependent sampling consists in selecting objects "with repulsion", that is, while avoiding the inclusion of objects that are too similar to one another. It can be applied to numerical integration, interpolation, signal discretization, experimental design, etc. The goal of this one-day meeting is to review recent progress on repulsive sampling in various fields neighboring signal processing.
- Patrice Bertail, MODAL'X, Paris Nanterre
- David Coeurjolly, LIRIS
- Agnès Desolneux, Centre Borelli, ENS
- Gilles Pagès, LPSM, Sorbonne
- François Portier, CREST, ENSAI
- Luc Pronzato, i3S, Université de Nice
- Antoine Souloumiac, LIST, CEA
TOTEM / Institut des Systèmes Complexes Paris IdF, 11 Place Nationale 75013 Paris
Date and time:
16 September 2022, 9:00-18:00
Rémi Bardenet (CRIStAL, CNRS), Simon Barthelmé (Gipsa-lab, CNRS).
With the support of the ERC Blackjack.
Registration deadline: Thursday 8 September 2022
9:00-10:00 G. Pagès, From optimal to random vector quantization of distributions in high dimension
10:30-11:30 D. Coeurjolly, Sample Correlation for Monte Carlo Rendering
11:30-12:30 F. Portier, Speeding up Monte Carlo: Nearest Neighbors estimates as Control Variates
12:30-13:30 Lunch (catered)
13:30-14:30 P. Bertail, Exponential bounds for survey sampling
14:30-15:30 L. Pronzato, Nested sampling designs with good covering properties
16:00-17:00 A. Desolneux, Determinantal point processes in image processing
17:00-18:00 A. Souloumiac, Blind reconstruction of an analog band-pass signal
Abstracts of the contributions
Exponential bounds for survey sampling: from Poisson to negatively associated sampling plans
The purpose of this talk is to show how to obtain Bernstein bounds for negatively associated (NA) random variables and to apply them to some specific time series and to some NA survey sampling plans, in order to obtain, for instance, non-asymptotic confidence intervals in these frameworks. Only recently have functional asymptotic results and exponential inequalities for the empirical cdf, or for empirical processes indexed by classes of functions, been obtained for some specific survey sampling plans (see Breslow and Wellner (2007), Boistard et al. (2017), Bertail et al. (2016)). These last two works show that the notion of negative association is very important to control, for instance, the subgaussianity of the process.
Indeed, the theory of negative dependence (see Joag-Dev and Proschan (1983)) has gained considerable attention in recent years for the study of dependent variables in survey sampling. A particularly important breakthrough was made by Borcea and Brändén (2009) and applied to survey sampling in Brändén and Jonasson (2012). Unfortunately, the available CLTs and inequalities for NA random variables cannot be applied directly to survey sampling plans, because of the non-stationarity of the sampling plan and the complicated covariance structure of the process.
We first show how to obtain crude inequalities for NA random variables and how to obtain improved inequalities when it is possible to control the sum of covariances. We finally show how one can get even better inequalities for specific NA sampling plans close to rejective sampling.
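As an illustration of the kind of non-asymptotic interval the talk targets, here is a minimal sketch for the simplest NA design, Poisson sampling, where the inclusion indicators are independent and a classical Bernstein bound applies directly to the Horvitz-Thompson estimator. The population, inclusion probabilities, and confidence level are all hypothetical toy choices, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy population and Poisson sampling design.
N = 10_000
y = rng.uniform(0.0, 1.0, size=N)        # study variable, here y_i in [0, 1]
pi = np.full(N, 0.3)                     # first-order inclusion probabilities

# Poisson sampling: independent Bernoulli inclusion indicators.
S = rng.random(N) < pi

# Horvitz-Thompson estimator of the population total.
t_hat = np.sum(y[S] / pi[S])
t_true = y.sum()

# Bernstein bound for the centered terms Z_i = y_i * (1{i in S}/pi_i - 1):
# |Z_i| <= c_i = y_i * (1/pi_i - 1)  and  Var(Z_i) = y_i^2 * (1 - pi_i)/pi_i.
c_max = np.max(y * (1.0 / pi - 1.0))
v = np.sum(y**2 * (1.0 - pi) / pi)

# Invert P(|sum Z_i| > t) <= 2 exp(-t^2 / (2 v + 2 c_max t / 3)) at level delta,
# which is a quadratic equation in t.
delta = 0.05
log_term = np.log(2.0 / delta)
t_eps = c_max * log_term / 3.0 + np.sqrt((c_max * log_term / 3.0) ** 2
                                         + 2.0 * v * log_term)
# [t_hat - t_eps, t_hat + t_eps] is a non-asymptotic confidence interval.
```

For genuinely NA plans (e.g. rejective sampling) the indicators are no longer independent, and controlling the covariance sum, as discussed in the talk, is what makes a comparable bound possible.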
Sample Correlation for Monte Carlo Rendering
When simulating light transport for physically based image rendering, complex integrals of high-dimensional functions representing the radiant energy need to be evaluated at each optical interaction. Over the past decade, Monte Carlo integrators have become the cornerstone of many rendering pipelines. They rely on sampling the integrand at sample locations, and when samples are correlated, the sample distribution has a direct impact on the convergence rate and on the noise in the final renderings. In this presentation, we will first review several strategies that have been used in computer graphics to evaluate the quality of a point process in this MC rendering context (spectral properties, discrepancy, optimal transport). We will also discuss the design of sampling algorithms that achieve the best variance reduction for effective rendering workflows.
Determinantal point processes in image processing
In this work, carried out in collaboration with Bruno Galerne (Université d'Orléans) and Claire Launay (Albert Einstein College of Medicine, New York), we are interested in the use of determinantal point processes (DPPs) for two image processing tasks: on the one hand, texture synthesis with a spot-noise model, where the points are the pixels of the image, and on the other hand, the subsampling of the set of patches of an image. In both cases, we will examine the important role played by the kernel of the DPP.
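A discrete DPP of the kind used for patch subsampling can be sampled exactly with the classical spectral algorithm of Hough et al. The sketch below is generic and not the authors' implementation; the "patch descriptors" are hypothetical 1D positions, and the Gaussian kernel (an assumed choice) makes nearby items unlikely to be selected together.

```python
import numpy as np

def sample_dpp(L, rng):
    """Exact sampling from the discrete DPP with L-ensemble kernel L
    (spectral algorithm of Hough et al.)."""
    vals, vecs = np.linalg.eigh(L)
    # Phase 1: keep eigenvector k independently with prob lambda_k/(1+lambda_k).
    keep = rng.random(len(vals)) < vals / (1.0 + vals)
    V = vecs[:, keep]
    items = []
    while V.shape[1] > 0:
        # Phase 2: pick item i with probability ||row_i(V)||^2 / #columns.
        p = (V ** 2).sum(axis=1)
        p /= p.sum()
        i = int(rng.choice(len(p), p=p))
        items.append(i)
        # Project the columns of V onto the complement of e_i:
        # zero out one column, drop it, and re-orthonormalize.
        j = int(np.argmax(np.abs(V[i])))
        V = V - np.outer(V[:, j], V[i] / V[i, j])
        V = np.delete(V, j, axis=1)
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)
    return sorted(items)

rng = np.random.default_rng(2)
# Hypothetical "patches": 60 positions on a line; the kernel bandwidth
# controls how strongly similar patches repel each other.
x = np.linspace(0.0, 1.0, 60)
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * 0.05 ** 2))
subset = sample_dpp(K + 1e-9 * np.eye(60), rng)
```

The role of the kernel emphasized in the abstract shows up directly here: changing the bandwidth or the similarity measure changes both the expected subset size and the diversity of the selected patches.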
From optimal to random vector quantization of distributions in high dimension
Optimal vector quantization theory goes back to the 1950s, motivated by signal processing and transmission. It has strong connections with, and applications to, unsupervised learning and classification (k-means), and has more recently been applied and developed as a tool in numerical probability and optimal transport (Wasserstein distance). For all these purposes, its main feature is that it produces an approximation, or a compression, of a probability distribution (typically an empirical measure in data science). However, when the dimension increases, access to optimal quantizers of a given distribution $\mu$ becomes harder and harder, whereas their "efficiency" is reduced by the curse of dimensionality. A natural idea to circumvent this problem is to produce the codewords of the quantizers as i.i.d. samples from a distribution $\nu$, possibly different from $\mu$. We will investigate the properties of such random quantizers in terms of the $L^p$-quantization rate in various ways, including a sharp a.s. rate of quantization. We will also discuss the choice of the distribution $\nu$. Such results have recently found applications in federated learning.
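The contrast between a random quantizer and a (locally) optimal one can be sketched numerically. Below, assumed purely for illustration, $\mu$ is a 2D standard Gaussian, $\nu = \mu$, and a few Lloyd (k-means) iterations stand in for an optimal quantizer; the empirical $L^2$-quantization error compares the two codebooks.

```python
import numpy as np

rng = np.random.default_rng(3)

def quantization_error(codebook, points):
    """Empirical L2-quantization error: mean squared distance from each
    point to its nearest codeword."""
    d2 = ((points[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.min(axis=1).mean()

d, n_codes = 2, 64
test_pts = rng.standard_normal((20_000, d))   # test sample ~ mu = N(0, I_d)

# Random quantizer: codewords drawn i.i.d. from nu = mu itself.
random_codes = rng.standard_normal((n_codes, d))
err_random = quantization_error(random_codes, test_pts)

# Lloyd iterations towards a locally optimal quantizer of mu.
train = rng.standard_normal((20_000, d))
codes = random_codes.copy()
for _ in range(20):
    lab = ((train[:, None, :] - codes[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
    for k in range(n_codes):
        cell = train[lab == k]
        if len(cell):
            codes[k] = cell.mean(axis=0)      # centroid update
err_lloyd = quantization_error(codes, test_pts)
# err_lloyd < err_random, but in high dimension the gap narrows while the
# random codebook stays trivial to produce, which is the talk's motivation.
```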
Speeding up Monte Carlo: Nearest Neighbors estimates as Control Variates
I will first give an overview, with particular interest in their convergence rates, of several popular alternatives to Monte Carlo, including repulsive sampling, importance sampling, adaptive volume calculation, and control variates. Then I will present a novel integration rule based on nearest neighbor estimates acting as control variates. The main result is a convergence rate for this new estimate, which achieves the best possible rate for Lipschitz integrands. Several numerical experiments validate this complexity bound and highlight the good performance of the proposed estimator.
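A minimal 1D sketch of the nearest-neighbor control variate idea (not the talk's actual estimator): a 1-NN surrogate of the integrand is built from a first batch of points; in 1D its Voronoi cells are intervals, so its integral is exact and cheap, and a fresh Monte Carlo batch only has to estimate the small residual. The integrand is a hypothetical toy choice.

```python
import numpy as np

rng = np.random.default_rng(4)

def f(x):
    return np.exp(-x) * np.sin(5.0 * x)      # toy integrand on [0, 1]

n = 200
# 1-NN surrogate from a first batch: in 1D the Voronoi cell of each sorted
# point is an interval, so the integral of the interpolant is exact.
x_nn = np.sort(rng.random(n))
y_nn = f(x_nn)
mid = (x_nn[:-1] + x_nn[1:]) / 2.0
edges = np.concatenate(([0.0], mid, [1.0]))
integral_nn = np.sum(y_nn * np.diff(edges))

# Fresh MC batch; the surrogate acts as a control variate:
# I_hat = integral(f_nn) + mean(f(X) - f_nn(X)).
x = rng.random(n)
cell = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n - 1)
est_cv = integral_nn + np.mean(f(x) - y_nn[cell])
est_mc = np.mean(f(x))                       # plain MC, for comparison

# Exact value of the integral, for checking the error.
I_true = (np.exp(-1.0) * (-np.sin(5.0) - 5.0 * np.cos(5.0)) + 5.0) / 26.0
```

Because the residual f - f_nn shrinks with the NN mesh size for Lipschitz f, the control-variate estimate is typically far closer to I_true than plain MC at the same budget, which is the mechanism behind the improved rate in the talk.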
Nested sampling designs with good covering properties
The covering radius and the Lr-quantization error (r > 1) of a sampling design are key factors in the derivation of error bounds for function approximation or integration; see for instance [2, Prop. 3.2] or [9, Chap. 11]. Constructions of designs with small covering radius or small quantization error have received a lot of attention, in particular those forming regular patterns such as lattices. Incremental constructions, although of major practical interest, have received less attention. There exist bounds on the covering radius (also called dispersion) of the low-discrepancy sequences used in quasi-Monte Carlo methods [3, Chap. 6], but they are extremely pessimistic and the performance of these constructions is rather disappointing. Three incremental constructions will be presented in the talk, based on the minimization of a maximum mean discrepancy by kernel herding [7, 6], on the greedy maximization of an integrated covering measure that defines a submodular set function, or on geometrical considerations leading to the greedy-packing algorithm and its boundary-phobic variants. In the latter case, performance guarantees can be provided. This work is partly supported by the ANR project INDEX (INcremental Design of EXperiments), nb. ANR-18-CE91-0007.
[1] J.H. Conway and N.J.A. Sloane. Sphere Packings, Lattices and Groups. Springer, New York, 3rd edition, 1999.
[2] F.J. Narcowich, J.D. Ward, and H. Wendland. Sobolev bounds on functions with scattered zeros, with applications to radial basis function surface fitting. Mathematics of Computation, 74(250):743-763, 2005.
[3] H. Niederreiter. Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia, 1992.
[4] A. Nogales Gómez, L. Pronzato, and M.-J. Rendas. Incremental space-filling design based on coverings and spacings: improving upon low discrepancy sequences. Journal of Statistical Theory and Practice, 2021. (to appear, HAL preprint hal-02987983).
[5] G. Pagès. A space quantization method for numerical integration. Journal of Computational and Applied Mathematics, 89(1):1-38, 1997.
[6] L. Pronzato. Performance analysis of greedy algorithms for minimising a maximum mean discrepancy. Statistics and Computing, 2022. (accepted, hal-03114891, arXiv:2101.07564).
[7] L. Pronzato and A.A. Zhigljavsky. Bayesian quadrature, energy minimization and space-filling design. SIAM/ASA Journal on Uncertainty Quantification, 8(3):959-1011, 2020.
[8] L. Pronzato and A.A. Zhigljavsky. Quasi-uniform designs with asymptotically optimal and near-optimal uniformity constant. hal-03494864, arXiv:2112.10401, 2022.
[9] H. Wendland. Scattered Data Approximation. Cambridge University Press, 2005.
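The greedy-packing construction mentioned in the abstract can be sketched as farthest-point selection over a candidate set; the design space, grid, and design size below are hypothetical choices, and boundary-phobic variants are not implemented. The sketch also exhibits the nested property of the title: the size-m design is a prefix of the size-n design.

```python
import numpy as np

def greedy_packing(candidates, n):
    """Incremental design: each new point maximizes its distance to the
    points already selected (farthest-point / greedy-packing heuristic).
    By construction, the first m points of the size-n design form the
    size-m design: the designs are nested."""
    chosen = [0]                                   # start from the first candidate
    d = np.linalg.norm(candidates - candidates[0], axis=1)
    while len(chosen) < n:
        i = int(np.argmax(d))                      # farthest remaining candidate
        chosen.append(i)
        d = np.minimum(d, np.linalg.norm(candidates - candidates[i], axis=1))
    return candidates[chosen]

# Dense candidate grid standing in for the design space [0, 1]^2.
g = np.linspace(0.0, 1.0, 40)
cand = np.array([(a, b) for a in g for b in g])
X = greedy_packing(cand, 20)

# Covering radius (dispersion) of the design over the candidate set.
cover = np.linalg.norm(cand[:, None, :] - X[None, :, :], axis=2).min(axis=1).max()
```

The integrated-covering-measure and kernel-herding constructions of the talk replace the max-min criterion with submodular or MMD objectives, but follow the same one-point-at-a-time pattern.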
Blind reconstruction of an analog band-pass signal using only the crossing times of a few thresholds of UNKNOWN amplitudes
A classical analog-to-digital converter (ADC) measures the signal amplitude at fixed time instants (an a priori time grid); conversely, a level-crossing ADC (LC-ADC) measures the time instants at which the signal crosses a few fixed threshold levels (an a priori amplitude grid). The LC-ADC principle allows a very efficient analog circuit, but the resulting irregular sampling in time makes a digital reconstruction of the signal spectrum and/or of uniform samples necessary. We show via numerical simulations that a very accurate reconstruction of the signal samples is possible, up to a harmless scale (and possibly offset) indetermination, even if the threshold amplitudes are UNKNOWN, which is the case in practice. This blind reconstruction is achieved by computing the intersection of two linear subspaces with standard linear algebra techniques. Many challenging issues remain open, ranging from theoretical performance analysis to practical implementation.
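The non-blind half of the problem can be sketched as follows: level-crossing sampling produces irregular (time, level) pairs, from which the signal is recovered by least squares on a harmonic basis. The signal, thresholds, and known frequency support are all hypothetical simplifications; in particular the thresholds are assumed KNOWN here, unlike in the blind setting of the talk, where the subspace-intersection step removes that assumption.

```python
import numpy as np

# Band-limited test signal on [0, 1): a few harmonics with known frequencies.
freqs = np.array([3.0, 5.0, 8.0])
amp = np.array([1.0, 0.6, 0.4])
pha = np.array([0.3, 1.1, 2.0])

def sig(t):
    return sum(a * np.sin(2 * np.pi * f * t + p)
               for a, f, p in zip(amp, freqs, pha))

# Level-crossing "ADC": record the times at which the signal crosses a few
# fixed thresholds (crossings located by linear interpolation on a fine grid).
levels = [-0.8, -0.3, 0.2, 0.7]
t_fine = np.linspace(0.0, 1.0, 20_000)
s = sig(t_fine)
times, vals = [], []
for lev in levels:
    for i in np.nonzero(np.diff(np.sign(s - lev)))[0]:
        w = (lev - s[i]) / (s[i + 1] - s[i])      # interpolate inside the step
        times.append(t_fine[i] + w * (t_fine[i + 1] - t_fine[i]))
        vals.append(lev)
times, vals = np.array(times), np.array(vals)

# Reconstruction: least squares for the harmonic coefficients from the
# irregular (crossing time, threshold level) pairs.
def basis(t):
    return np.hstack([np.sin(2 * np.pi * freqs * t[:, None]),
                      np.cos(2 * np.pi * freqs * t[:, None])])

coef, *_ = np.linalg.lstsq(basis(times), vals, rcond=None)

# Uniform resampling of the reconstructed signal and its max error.
t_unif = np.linspace(0.0, 1.0, 100)
err = np.max(np.abs(basis(t_unif) @ coef - sig(t_unif)))
```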