Sparse-domain data (signal/image) processing for classification and learning: basis/frame influence and selection in designing and training scattering networks with experimental data (geosciences, chemistry)
IFP Energies nouvelles (IFPEN) is a major research and training player in the fields of energy, transport, and environment. From research to industry, technological innovation is central to all its activities, structured around three strategic priorities: sustainable mobility, new energies, and responsible oil and gas. As part of the public-interest mission with which it has been tasked by the public authorities, IFPEN focuses on: (a) providing solutions to the challenges facing society in terms of energy and the climate, promoting the transition towards sustainable mobility and the emergence of a more diversified energy mix; (b) creating wealth and jobs by supporting French and European economic activity, and the competitiveness of related industrial sectors.
We wish to study large datasets of experimental data (e.g. physico-chemical spectral signals, microscopy or geophysical subsurface images) with a view to clustering, classification and learning. When data satisfy regularity properties, they often admit sparse or compressible representations in a well-chosen transformed domain: a few transformed coefficients provide an accurate approximation of the data. Such representations, like multiscale or wavelet transforms, benefit subsequent processing, and they form the core of novel data processing methodologies, such as scattering networks/transforms (SN) or Functional Data Analysis (FDA). Given the variety of such transforms, and without prior knowledge, it is not obvious how to find the most suitable representation for a given dataset. The aim of this subject is to investigate potential relations between transform properties and data compressibility on the one hand, and classification/clustering performance on the other hand, especially with respect to robustness to shifts/translations or noise in data features, which matter in experimental applications. Building on recent work, the first objective is to develop a framework that allows different sparsifying transformations (bases or frames of wavelets and multiscale transformations) to be used at the input of reference SN algorithms. This will make it possible to evaluate the latter on a variety of experimental datasets, with the aim of choosing the most appropriate transformation, both in terms of performance and usability, since redundancy in transformations may hinder their application to large datasets. Particular attention could be paid to complex-like transformations, which may improve either the sparsification or the "invariance properties" of the transformed data. Their importance has recently been underlined for deep convolutional networks.
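
To make the notion of compressibility concrete, here is a minimal sketch, not part of the subject itself, that compares a k-term approximation in an orthonormal Haar wavelet basis with a k-term approximation in the sample domain. The Haar basis, the piecewise-constant test signal, the value of k and the helper names (haar_forward, haar_inverse, keep_largest, relative_error) are all illustrative assumptions; real spectral or image data and other bases or frames would behave differently.

```python
import numpy as np

def haar_forward(x):
    """Orthonormal multilevel Haar transform of a signal of length 2**J."""
    approx = np.asarray(x, dtype=float)
    bands = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        bands.append((even - odd) / np.sqrt(2.0))   # detail coefficients
        approx = (even + odd) / np.sqrt(2.0)        # coarser approximation
    bands.append(approx)
    return np.concatenate(bands[::-1])              # coarsest coefficients first

def haar_inverse(c):
    """Inverse of haar_forward."""
    c = np.asarray(c, dtype=float)
    approx, pos = c[:1], 1
    while pos < c.size:
        detail = c[pos:pos + approx.size]
        pos += approx.size
        up = np.empty(2 * approx.size)
        up[0::2] = (approx + detail) / np.sqrt(2.0)
        up[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = up
    return approx

def keep_largest(c, k):
    """Keep the k largest-magnitude coefficients (k >= 1), zero the others."""
    out = np.zeros_like(c)
    idx = np.argsort(np.abs(c))[-k:]
    out[idx] = c[idx]
    return out

def relative_error(x, x_hat):
    return np.linalg.norm(x - x_hat) / np.linalg.norm(x)

# Illustrative piecewise-constant signal (an assumption, not experimental data).
x = np.zeros(256)
x[40:100], x[100:180], x[180:] = 1.0, -0.5, 2.0

k = 32
coeffs = haar_forward(x)
err_haar = relative_error(x, haar_inverse(keep_largest(coeffs, k)))
err_samples = relative_error(x, keep_largest(x, k))
print(f"k={k}: Haar error {err_haar:.2e}, sample-domain error {err_samples:.2e}")
```

For this particular signal, the few jumps produce only a handful of non-zero Haar coefficients, so the Haar k-term error is far smaller than the sample-domain one. On other signal classes a different transform may well be preferable, which is the selection question raised above.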
Then, starting from real data, the trainee will develop realistic models reproducing the behaviors expected in the data, for instance related to shifts or noise. Finally, the relative clustering/classification performance will be assessed with respect to different transformation choices, along with their impact on both realistic models and real data. Particular attention could be paid to either transform properties (redundancy, frame bounds, asymptotic properties) or the resulting multiscale statistics of the data.
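
As an illustration of how shifts interact with redundancy, the following sketch, under stated assumptions, simulates circularly shifted (and optionally noisy) copies of a template and compares per-scale energies from a decimated Haar transform with those from an undecimated (redundant, circular) Haar transform. The data model, the box-shaped template and the function names (simulate, decimated_energies, undecimated_energies) are hypothetical; they are not the models or features the trainee is expected to develop.

```python
import numpy as np

def simulate(template, shift, noise_std, rng):
    """Hypothetical data model: circular shift plus additive white Gaussian noise."""
    noise = noise_std * rng.standard_normal(template.size)
    return np.roll(template, shift) + noise

def decimated_energies(x, levels):
    """Per-scale detail energies of the critically sampled Haar transform."""
    a = np.asarray(x, dtype=float)
    energies = []
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        energies.append(np.sum(((even - odd) / np.sqrt(2.0)) ** 2))
        a = (even + odd) / np.sqrt(2.0)
    return np.array(energies)

def undecimated_energies(x, levels):
    """Per-scale detail energies of an undecimated Haar transform (circular)."""
    a = np.asarray(x, dtype=float)
    energies = []
    for j in range(levels):
        shifted = np.roll(a, 2 ** j)        # filter dilated by 2**j, no subsampling
        energies.append(np.sum(((a - shifted) / 2.0) ** 2))
        a = (a + shifted) / 2.0
    return np.array(energies)

rng = np.random.default_rng(0)
template = np.zeros(64)
template[20:28] = 1.0                       # illustrative box-shaped feature

levels = 3
x0 = simulate(template, shift=0, noise_std=0.0, rng=rng)
x1 = simulate(template, shift=1, noise_std=0.0, rng=rng)

dec_gap = np.max(np.abs(decimated_energies(x0, levels) - decimated_energies(x1, levels)))
und_gap = np.max(np.abs(undecimated_energies(x0, levels) - undecimated_energies(x1, levels)))
print(f"max energy change under a one-sample shift: decimated {dec_gap:.3f}, undecimated {und_gap:.3e}")
```

In this noise-free setting, the undecimated energies are unchanged by a circular shift, whereas the decimated ones are not. The price is redundancy: each undecimated scale keeps as many coefficients as the input, which is the usability concern for large datasets. Setting noise_std above zero gives a starting point for studying robustness to noise.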
(c) GdR 720 ISIS - CNRS - 2011-2018.