Announcement


Perceptually Based Frugal Models for Low-Carbon AI

13 December 2022


Category: PhD student


Artificial Intelligence (AI) tools have become ubiquitous in today's society. At the same time, the environmental impact of AI has become non-negligible because of its carbon footprint. A frugal, rather than data-hungry, AI can improve efficiency, thus addressing a significant challenge given the widespread use of machine learning. A less data-intensive algorithm would consume less energy, but the search for frugality goes even further. This PhD will address this issue via the development of perceptually based models. Like the human visual system, these models are specifically sensitive to perceptually pertinent features, such as textures, object contours, and their spatial arrangements.

Visual processing of simple image elements (such as lines and edges) does not happen inside a cognitive vacuum: it differs when those simple elements are embedded within natural scenes that look more like what we see every day, as opposed to the featureless backgrounds that are normally used in the laboratory. We know a good amount about the mechanisms that support vision in a simple setup (i.e. involving a simple stimulus with no natural meaningful content). We know virtually nothing about how those mechanisms may change and/or be augmented/replaced by new mechanisms under conditions that are closer to natural vision (i.e. when the image starts making sense and contains recognizable objects). In this thesis we will study how visual primitives (lines, edges, junctions) interact and how their spatial relations could be used by an AI model to efficiently use the semantic information in the image to recognize objects and scenes.


Contact: petr.dokladal@mines-paristech.fr


General Context and Challenges

Artificial Intelligence (AI) tools have become ubiquitous in today's society. At the same time, the environmental impact of AI has become non-negligible because of its carbon footprint. A frugal, rather than data-hungry, AI can improve efficiency, thus addressing a significant challenge given the widespread use of machine learning. A less data-intensive algorithm would consume less energy, but the search for frugality goes even further.

Project description

We address the above issue via the development of perceptually based models. Like the human visual system, these models are specifically sensitive to perceptually pertinent features, such as textures, object contours, and their spatial arrangements. Research in perception has a long history. The links between perception and image processing started with the detection of perceptually meaningful events. According to a long-standing principle in sensory processing, every large image deviation from “uniform noise” should be perceptible, provided this large deviation corresponds to an a priori fixed list of geometric structures (lines, curves, closed curves, convex sets, spots, local groups). Desolneux et al. [1] explored the connection between this principle and image processing in a probabilistic setting for the detection of perceptual contours in natural images. A link between this probabilistic approach and Mathematical Morphology was proposed by Dokladal [2] to detect cracks in materials.
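To make the a-contrario reasoning of [1] concrete: a candidate structure is declared ε-meaningful when its Number of False Alarms (NFA), the expected number of equally significant structures arising in pure noise, falls below ε. A minimal Python sketch of this decision rule (function names are illustrative; [1] derives the exact number of tests and the binomial tail for aligned segments):

```python
from math import comb

def binomial_tail(l, k, p):
    """P(X >= k) for X ~ Binomial(l, p): the probability that at least
    k of the l points on a candidate segment are aligned by chance."""
    return sum(comb(l, i) * p**i * (1 - p)**(l - i) for i in range(k, l + 1))

def nfa(num_tests, l, k, p):
    """Number of False Alarms: expected count, over all num_tests
    candidate structures, of segments at least this significant in noise."""
    return num_tests * binomial_tail(l, k, p)

def is_meaningful(num_tests, l, k, p, eps=1.0):
    """A structure is eps-meaningful when its NFA falls below eps."""
    return nfa(num_tests, l, k, p) < eps
```

For segment detection in an N×N image, [1] takes the number of tests to be N⁴ (all ordered pixel pairs) and p to be the probability that a point's gradient is, by chance, aligned with the segment direction up to a fixed angular precision.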

AI models, too, can be constrained to be sensitive to perceptually significant primitives, such as lines or edges. This notion runs counter to the mainstream view that constrained models cannot match the performance of unconstrained ones. Yet, at the negligible cost of a small reduction in score, interesting properties can be obtained when a model is tuned or constrained to target some desirable function. For example, the incorporation of biologically inspired modules has been shown to confer robustness on deep networks [3]. Further attempts at constraining AI models to perceptually significant features, developed in an effort to obtain invariance to rotation [4][5], produced interesting results in terms of (1) model size and (2) computational requirements.

Visual processing of simple image elements (such as lines and edges) does not happen inside a cognitive vacuum: it may differ when those simple elements are embedded within natural scenes that look more like what we see every day, as opposed to the featureless backgrounds that are normally used in the laboratory. We know a good amount about the mechanisms that support vision in a simple setup (i.e. involving a simple stimulus with no natural meaningful content). We know virtually nothing about how those mechanisms may change and/or be augmented/replaced by new mechanisms under conditions that are closer to natural vision (i.e. when the image starts making sense and contains recognizable objects). In this thesis we will study how visual primitives (lines, edges, junctions) interact and how their spatial relations could be used by an AI model to efficiently use the semantic information [6] in the image to recognize objects and scenes.

The features sensitive to these primitives will be fitted to data via learning. A promising tool for the efficient encoding of spatial arrangements and relations is the graph convolutional network: spectral networks on graphs were introduced by Bruna et al. [7] and later simplified by Kipf and Welling [8] into the architecture now known as the GCN. Since [8], the modelling of graph topology remained limited to immediate neighbors until Zhu et al. [9] proposed H2GCN to encode higher-order network information from intermediate layers, and Qian et al. [10] showed that the performance of GCNs is related to the alignment among features, graph, and ground truth. Recently, Wang et al. [11] proposed integrating graph motif-structure information into the convolution operation of each layer.
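As a reminder of the mechanism behind [8], one GCN layer applies the propagation rule H' = σ(D̂⁻¹ᐟ² Â D̂⁻¹ᐟ² H W), with Â = A + I. A minimal NumPy sketch of a single layer (illustrative only; a practical implementation would use sparse matrices inside a deep-learning framework):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer as in Kipf & Welling [8]:
    H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])            # adjacency with self-loops
    d = A_hat.sum(axis=1)                     # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # D^-1/2
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)    # linear map + ReLU

# Toy graph of 3 visual primitives (a path), 4 features each, 2 outputs:
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.ones((3, 4))
W = np.ones((4, 2))
out = gcn_layer(A, H, W)   # shape (3, 2)
```

Each output row mixes a node's features with those of its immediate neighbors, which is precisely why deeper variants such as H2GCN [9] are needed to capture higher-order structure.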

The potential benefits of encoding the geometry and topology of perceptually significant primitives extracted from an image into a graph are significant. Indeed, a perceptually based model will not only be more efficient in terms of computational requirements, but will also be data-frugal, faster to train, and more robust to adversarial attacks. Such models will pave the way for a sustainable future with energy-efficient, environmentally friendly AI.

This PhD is a new, emerging collaboration between two PSL institutions: MINES Paris PSL and ENS PSL. The laboratories involved bring complementary expertise in artificial intelligence and visual perception.

Acknowledgment: funding is provided by the TTI.5 Transition Institute.

PhD Advisors :

DOKLADAL Petr, Domain: Artificial Intelligence / Computer Science: CMM / Mathématique et Systèmes / MINES Paris PSL
NERI Peter, Domain: Perception / Cognitive Science: Laboratoire des systèmes perceptifs / Département d’études cognitives / ENS PSL

Doctoral School :

ISMEE Mathématiques et Systèmes, École doctorale 621, Ingénierie des Systèmes, Matériaux, Mécanique, Énergétique

Application conditions: completed M2 programme, AI coding skills (TensorFlow, PyTorch), and an excellent academic record

Application procedure: prospective candidates should send their CV, academic transcript, motivation letter, and two reference letters to: petr.dokladal@minesparis.psl.eu

References :

  1. A. Desolneux, L. Moisan and J.-M. Morel, Edge Detection by Helmholtz Principle, Journal of Mathematical Imaging and Vision 14: 271–284, 2001

  2. P. Dokladal, https://hal-mines-paristech.archives-ouvertes.fr/hal-01478089/document

  3. J. Dapello, T. Marques, M. Schrimpf, F. Geiger, D. Cox and J. DiCarlo, Simulating a Primary Visual Cortex at the Front of CNNs Improves Robustness to Image Perturbations, 2020, DOI: 10.1101/2020.06.16.154542.

  4. R. Rodriguez Salas, P. Dokládal and E. Dokladalova, Rotation Invariant Networks for Image Classification for HPC and Embedded Systems, Electronics, 2021, https://doi.org/10.3390/electronics10020139

  5. R. Rodriguez Salas, P. Dokládal and E. Dokladalova, A minimal model for classification of rotated objects with prediction of the angle of rotation, J. of Visual Communication and Image Representation, 2021, https://doi.org/10.1016/j.jvcir.2021.103054

  6. P. Neri, Semantic control of feature extraction from natural scenes. Journal of Neuroscience, 34, 2374-2388, 2014

  7. Bruna, J., Zaremba, W., Szlam, A. & LeCun, Y. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203 (2013).

  8. Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR) (2017).

  9. Zhu, J. et al. Beyond homophily in graph neural networks: Current limitations and effective designs. Adv. Neural Inf. Process. Syst. 33, 7793–7804 (2020).

  10. Qian, Y., Expert, P., Rieu, T., Panzarasa, P. & Barahona, M. Quantifying the alignment of graph and features in deep learning. IEEE Transactions on Neural Networks and Learning Systems (2021).

  11. B. Wang, L. Cheng, J. Sheng, Z. Hou and Y. Chang, Graph convolutional networks fusing motif-structure information. Scientific Reports, 2022, vol. 12, no 1, p. 1-12. https://doi.org/10.1038/s41598-022-13277-z