[Master 2 Internship] Expert-guided Deep-Attention Saliency Models for Computer Vision

21 December 2021

Category: Internship

Supervision: Christophe Montagne / Dominique Fourer

Laboratory: IBISC (EA 4526), Univ. Evry / Paris-Saclay

Contact: /

To apply, send a CV, a cover letter, and the available M1 and M2 transcripts



computer vision, image segmentation, attention-based model, MRI, autonomous driving, deep learning


Attention is a promising deep learning paradigm for developing efficient and partly interpretable computer vision methods. It is an active research topic, as shown by the growing number of related works in the literature. This work further investigates the attention model under the assumption that a human expert can convey relevant information that improves the accuracy of a trained machine learning model. We therefore aim to develop novel deep neural architectures that include a deep-attention mechanism trained directly on visual information produced by human experts performing visual segmentation tasks in real-world scenarios (e.g. eye-tracking coordinates and/or ocular measurements). The applications considered may include autonomous driving and precision medicine, for which we aim to show how the resulting expert-guided deep-saliency function can improve the efficiency of an existing deep-learning-based prediction method (e.g. MRI segmentation, image recognition, etc.). To reach this goal, the recruited intern will first implement a state-of-the-art attention-based deep learning method applied to a given segmentation task. Second, we will modify this preliminary method to train the attention block separately, using a distinct training dataset. Finally, we will assess the resulting deep learning methods in different scenarios to validate our assumption that a saliency function trained by experts is more efficient.


— Bibliographical study for identifying the best state-of-the-art methods for attention-based deep neural network architectures

— Implementation of one or several state-of-the-art methods for the sake of reproducible research

— Construction of a new expert-based dataset made of eye-tracking and ocular recordings in collaboration with the partners of the project

— Development and evaluation of one or several new attention-based deep neural network architectures considering several perception-based toy problems in autonomous driving and/or medical image segmentation


The starting point of this research is our previous work on deep-learning-based methods for MRI segmentation, in which we compared several neural architectures [1]. In that work, we showed that using a bounding box as a preprocessing step can significantly improve MRI segmentation results. We also showed that a cascade architecture combining 3 distinct deep neural networks can obtain promising results, despite a very high computational cost. Hence, the recruited intern will also investigate and compare promising new attention-based deep neural architectures, such as the U-Net Transformer with self- and cross-attention [4].

Following this idea, which was recently proposed for reinforcement learning in driving scenarios [3], we now propose to develop new attention-based deep neural architectures in which the attention block is trained on a distinct dataset built from measurements collected from experts. The proposed method will involve a deep neural architecture made of two distinct convolutional neural networks (CNNs), as illustrated in Fig. 1: the “Where” CNN computes the prominent regions of an image and uses them to preprocess the input, and the “What” CNN provides a prediction from this preprocessed input.
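The data flow between the two networks can be sketched as follows. This is only a minimal illustration: it assumes that the “Where” model outputs a saliency map in [0, 1] with the same shape as the input, and that the preprocessing is an element-wise product, neither of which is specified in this description. The names `gated_prediction`, `toy_where` and `toy_what` are hypothetical, and plain NumPy functions stand in for the two CNNs.

```python
import numpy as np

def gated_prediction(image, where_fn, what_fn):
    """Run the "Where" model, gate the input with its saliency map,
    then pass the gated input to the "What" model."""
    saliency = np.clip(where_fn(image), 0.0, 1.0)  # assumed range [0, 1]
    if saliency.shape != image.shape:
        raise ValueError("saliency map must match the image shape")
    gated = image * saliency  # assumed gating: element-wise product
    return what_fn(gated), saliency

# Toy stand-ins for the two CNNs (assumptions, not trained models).
def toy_where(image):
    # Mark pixels brighter than the image mean as salient.
    return (image > image.mean()).astype(float)

def toy_what(gated):
    # "Prediction" = mean intensity over the non-zero (attended) pixels.
    return float(gated.sum() / max(np.count_nonzero(gated), 1))

image = np.array([[0.0, 0.1], [0.9, 1.0]])
prediction, saliency = gated_prediction(image, toy_where, toy_what)
```

Because the two models are passed in as separate callables, the “Where” model can in principle be trained on its own dataset and swapped in without changing the “What” model.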

The novelty of the proposed method is that we will transfer information from a distinct training dataset dedicated to the deep-attention “Where” CNN, which is designed to provide a saliency map guided by the experts involved in the construction of this dual saliency dataset. The goal is to obtain a generalized saliency function that targets the regions of interest in an image and thereby enhances the efficiency of a given machine-learning-based prediction model.

As a preliminary step, we expect to conduct a bibliographical study discussing the advantages and trade-offs of the most promising state-of-the-art attention-based deep neural network architectures designed for salient object detection and image segmentation tasks [6, 2]. A second part of this research work is dedicated to the design of computer vision experiments and the collection of eye-tracking data from human experts involved in image recognition tasks in different scenarios (driving or biomedicine). This novel saliency dataset will be used to train the proposed deep-attention mechanism combined with the chosen baseline computer vision method.
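One common way to turn eye-tracking coordinates into a training target for a saliency model is to place a Gaussian around each fixation point. The sketch below assumes that fixations are given as (row, column) pixel coordinates; the function name `fixations_to_saliency` and the `sigma` parameter are hypothetical, and the actual format of the recorded data is not specified here.

```python
import numpy as np

def fixations_to_saliency(fixations, height, width, sigma=2.0):
    """Build a saliency map in [0, 1] from (row, col) fixation points."""
    rows = np.arange(height)[:, None]  # column vector of row indices
    cols = np.arange(width)[None, :]   # row vector of column indices
    saliency = np.zeros((height, width))
    for r, c in fixations:
        # Add an isotropic Gaussian bump centred on the fixation.
        dist2 = (rows - r) ** 2 + (cols - c) ** 2
        saliency += np.exp(-dist2 / (2.0 * sigma ** 2))
    peak = saliency.max()
    # Normalise so that the most attended pixel has value 1.
    return saliency / peak if peak > 0 else saliency

saliency_map = fixations_to_saliency([(2, 3), (7, 7)], height=10, width=10)
```

The width `sigma` controls how far attention spreads around each fixation and would have to be chosen according to the image resolution and the eye-tracker accuracy.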

Finally, we expect to objectively evaluate several proposed methods, with and without the new expert-guided deep-attention model, by considering several application scenarios designed for autonomous driving and/or biomedical image segmentation.
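For the segmentation scenarios, one possible objective measure is the Dice coefficient between a predicted mask and a reference mask. The choice of this metric is an assumption, since no evaluation metric is named here, and the masks below are toy values used only to show the computation.

```python
import numpy as np

def dice_coefficient(pred_mask, true_mask):
    """Dice overlap between two binary masks (1.0 = perfect agreement)."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    total = pred.sum() + true.sum()
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return 2.0 * np.logical_and(pred, true).sum() / total

# Toy masks (illustrative only, not experimental results).
truth = [[0, 1, 1], [0, 1, 1]]
mask_a = [[1, 1, 0], [0, 1, 0]]
mask_b = [[0, 1, 1], [0, 1, 0]]
score_a = dice_coefficient(mask_a, truth)
score_b = dice_coefficient(mask_b, truth)
```

Computing the same score for a method with and without the attention block, on the same test images, gives a direct comparison between the two variants.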

Required profile

— good machine learning and signal processing knowledge

— mathematical understanding of the formal background

— excellent programming skills (Python, MATLAB, C/C++, Keras, TensorFlow, PyTorch, etc.)

— strong motivation, high productivity, and a methodical approach to work

Salary and perspectives

According to background and experience (a minimum of 577.50 euros/month). Possibility of continuing with a funded 3-year PhD contract [1].

[1] .