Meeting


Towards pragmatic learning in a context of limited labelled visual data

Date: 26-11-2021
Venue: MSH Paris Nord (Salle panoramique, 20 avenue George Sand, 93210 La Plaine Saint Denis)

Scientific themes:
  • B - Image and Vision

Please note that, in order to guarantee room access for all registered participants, registration for meetings is free but mandatory.



Registrations

38 GdR ISIS members and 13 non-members are registered for this meeting.

Room capacity: 50 people.

Announcement

Abstract

Deep convolutional neural networks have produced many impressive results, for example on image classification tasks such as object recognition or scene classification. However, for a deep convolutional network to successfully learn to recognize visual categories, thousands of training examples per target category must be collected and manually labelled, and then processed with (generally iterative) optimization algorithms that are extremely expensive in computing resources (hundreds or even thousands of GPU hours). Moreover, if the model is to be extended to further categories, enough new training data must be collected for the new categories and the learning procedure restarted from scratch. These heavy requirements have led researchers to investigate approaches that are less demanding in training data. The idea is to draw inspiration from the human visual system, which effortlessly learns new concepts from only a few examples and recognizes them reliably. Reproducing this behaviour in learning-based computer vision systems is one of the goals of current research, for many applications, in particular real-world vision.

The objective of this day is to give an overview of recent advances in computer vision by machine learning from very little labelled data, on the following topics: detection, recognition, classification, segmentation.

Call for contributions

A call for contributions is open on the topics listed above. Researchers wishing to present their work are invited to submit a one-page abstract to the organizers before 30 October.

Invited speakers

  • Dawood AL CHANTI, GIPSA-lab / Grenoble
  • Hervé LE BORGNE, CEA LIST, Saclay
  • Désiré SIDIBE, IBISC Université Evry Val d'Essonne
  • Tuan-Hung VU, VALEO
  • Zihao WANG, INRIA, Nice

Organizers

  • Anissa MOKRAOUI
  • Mustapha LEBBAH
  • Hanene AZZAG

Programme

9h00 : Welcome and introduction to the day
By Anissa MOKRAOUI, Mustapha LEBBAH, Hanene AZZAG
9h20-10h00 : IFSS-Net: Interactive Few-Shot Siamese Network for Faster Muscle Segmentation and Propagation in Volumetric Ultrasound
Dawood AL CHANTI, GIPSA-lab / Grenoble
10h00-10h40 : Multiscale Attention-Based Prototypical Network for Few-Shot Semantic Segmentation
Désiré SIDIBE, IBISC Université Evry Val d'Essonne
10h40-11h00 : Coffee break
11h00-11h20 : Impact of base dataset design on few-shot image classification.
Othman SBAI, PhD Student, École des Ponts - MVA ENS Paris Saclay
11h20-12h00 : Handling new target classes in semantic segmentation
Tuan-Hung VU, VALEO
12h00-12h20 : Improving Few-Shot Learning through Multi-task Representation Learning Theory
Quentin BOUNIOT, PhD Student, CEA LIST, Saclay
12h20-14h00 : Lunch break
14h00-14h40 : One-shot Learning Landmarks Detection
Zihao WANG, INRIA, Nice
14h40-15h00 : Spatial Contrastive Learning for Few-Shot Classification
Yassine OUALI, PhD, Université Paris-Saclay, CentraleSupélec, MICS
15h00-15h20 : Coffee break
15h20-16h00 : Zero-shot Learning with Deep Neural Networks for Object Recognition
Hervé LE BORGNE, CEA LIST, Saclay
16h00-16h20 : Prototypical Faster R-CNN for few-shot object detection on aerial images
Pierre LE JEUNE, PhD Student, COSE, Université Sorbonne Paris Nord, L2TI/LIPN
16h20-16h30 : Closing of the day

Abstracts of the contributions

IFSS-Net: Interactive Few-Shot Siamese Network for Faster Muscle Segmentation and Propagation in Volumetric Ultrasound
Dawood AL CHANTI, GIPSA-lab / Grenoble

Quantification of muscle volume is a useful biomarker for degenerative neuromuscular disease progression or sports performance. Measuring muscle volume often requires the segmentation of 3D images. While Magnetic Resonance (MR) is the modality of preference for imaging muscles, 3D Ultrasound (US) offers a real-time, inexpensive, and portable alternative. The motivation of our work is to assist the segmentation and volume computation of the lower-limb muscles from 3D freehand ultrasound volumes. We propose a novel deep learning segmentation and propagation method for 3D US data, which requires only a few expert-annotated slices per 3D volume, on average 48 annotations out of 1400 slices, and leverages unannotated sub-volumes using sequential pseudo-labelling. To produce a fast and accurate muscle segmentation, suitable for reliable volume computation, we design a minimal interactive setting. In practice, we design a Siamese network to capture a common feature representation between ultrasound and mask sub-volumes. The reference can either come from an annotated part of the volume or from prior predictions. To guarantee model convergence with limited annotated data, we propose a decremental learning strategy. We validate our approach for the segmentation, label propagation, and volume computation of the lower-limb muscles, namely: the Gastrocnemius Medialis (GM), the Gastrocnemius Lateralis (GL), and the Soleus (SOL). We consider a dataset of 44 subjects. We demonstrate our method's capability to learn from a few annotations under a simulated weakly-supervised regime, keeping only 3.5% of the annotations.


Multiscale Attention-Based Prototypical Network for Few-Shot Semantic Segmentation
Désiré SIDIBE, IBISC Université Evry Val d'Essonne

Few-shot semantic segmentation refers to the pixel-level prediction of new categories on a test set, given only a few labelled examples.
In this talk, we will highlight the challenges and specificities of this task as compared to other few-shot learning tasks such as image classification, and present the main approaches from the literature.
We will also present a method that takes advantage of complementary depth information to extend RGB-centric approaches.
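As an illustration of the prototypical approach commonly used for this task (a minimal sketch in plain Python, with illustrative function names; this is not the speaker's implementation), a class prototype can be built by masked average pooling over support-pixel features, and each query pixel labelled by its nearest prototype:

```python
from math import dist  # Euclidean distance (Python 3.8+)

def masked_average_prototype(features, mask):
    """Average the feature vectors of the pixels belonging to the class.

    features: list of per-pixel feature vectors (lists of floats)
    mask:     list of 0/1 labels, one per pixel
    """
    selected = [f for f, m in zip(features, mask) if m == 1]
    n, d = len(selected), len(features[0])
    return [sum(f[k] for f in selected) / n for k in range(d)]

def segment(query_features, prototypes):
    """Label each query pixel with the class of its nearest prototype."""
    return [min(prototypes, key=lambda c: dist(f, prototypes[c]))
            for f in query_features]

# Toy example: 2-D features, two classes, hand-picked prototypes.
protos = {"bg": [0.0, 0.0], "fg": [1.0, 1.0]}
labels = segment([[0.9, 1.1], [0.1, -0.1]], protos)  # -> ["fg", "bg"]
```

Extending this per-pixel matching to multiple feature scales with attention is, broadly, what the multiscale attention-based variant addresses.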


Impact of base dataset design on few-shot image classification.
Othman SBAI, PhD Student, École des Ponts - MVA ENS Paris Saclay

The quality and generality of deep image features is crucially determined by the data they have been trained on, but little is known about this often overlooked effect. In this work, we systematically study the effect of variations in the training data by evaluating deep features trained on different image sets in a few-shot classification setting. The experimental protocol we define allows us to explore key practical questions. What is the influence of the similarity between base and test classes? Given a fixed annotation budget, what is the optimal trade-off between the number of images per class and the number of classes? Given a fixed dataset, can features be improved by splitting or combining different classes? Should simple or diverse classes be annotated? In a wide range of experiments, we provide clear answers to these questions on the miniImageNet, ImageNet and CUB-200 datasets. We also show how base dataset design can improve few-shot classification performance more drastically than replacing a simple baseline with an advanced state-of-the-art algorithm.


Handling new target classes in semantic segmentation
Tuan-Hung VU, VALEO

Semantic segmentation models are limited in their ability to scale to large numbers of object classes. In this work, we introduce the new task of zero-shot semantic segmentation: learning pixel-wise classifiers for never-seen object categories with zero training examples. We present a novel architecture, ZS3Net, combining a deep visual segmentation model with an approach to generate visual representations from semantic word embeddings. ZS3Net addresses pixel classification tasks where both seen and unseen categories are faced at test time -- so-called generalized zero-shot classification. We further define and address a novel domain adaptation (DA) problem in semantic scene segmentation, where the target domain not only exhibits a data distribution shift w.r.t. the source domain, but also includes novel classes that do not exist in the latter. Aiming at explicit test-time prediction for these new classes, we propose a framework, BudaNet, that leverages domain adaptation and zero-shot learning techniques to enable 'boundless' adaptation in the target domain. This is joint work with Maxime Bucher, Matthieu Cord and Patrick Pérez.


Improving Few-Shot Learning through Multi-task Representation Learning Theory
Quentin BOUNIOT, PhD Student, CEA LIST, Saclay

We consider the framework of multi-task representation (MTR) learning where the goal is to use source tasks to learn a representation that reduces the sample complexity of solving a target task. We start by reviewing recent advances in MTR theory and show that they can provide novel insights for popular meta-learning algorithms when analyzed within this framework. In particular, we highlight a fundamental difference between gradient-based and metric-based algorithms and put forward a theoretical analysis to explain it. Finally, we use the derived insights to improve the generalization capacity of meta-learning methods via a new spectral-based regularization term and confirm its efficiency through experimental studies on classic few-shot classification benchmarks. To the best of our knowledge, this is the first contribution that puts the most recent learning bounds of MTR theory into practice for the task of few-shot classification.
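The talk's exact regularizer is not reproduced here; as a hedged illustration of the general idea of spectral-based regularization, the plain-Python sketch below (all names are ours) estimates the largest singular value of a weight matrix by power iteration and adds it as a penalty to a task loss:

```python
from math import sqrt

def spectral_norm(W, n_iter=50):
    """Estimate the largest singular value of W (a list of rows)
    by power iteration: alternate u = W v and v = W^T u."""
    rows, cols = len(W), len(W[0])
    v = [1.0] * cols
    for _ in range(n_iter):
        u = [sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        v = [sum(W[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]  # keep the iterate normalized
    u = [sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return sqrt(sum(x * x for x in u))  # ||W v|| for the top singular vector v

def regularized_loss(task_loss, W, lam=0.01):
    """Task loss plus a spectral penalty on the shared representation weights."""
    return task_loss + lam * spectral_norm(W)
```

For a diagonal matrix such as [[2, 0], [0, 1]] the estimate converges to the largest diagonal entry, 2.0.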


One-shot Learning Landmarks Detection
Zihao WANG, INRIA, Nice

Landmark detection in medical images is a mainstay of many clinical algorithmic applications. Learning-based landmark detection is now a highly successful methodology for many types of object detection. However, learning-based approaches usually need a large annotated dataset to train the learning models. In this presentation, I will introduce an automatic one-shot learning-based landmark detection approach for identifying landmarks in 3D volume images. A convolutional neural network-based iterative object localization method, combined with a registration framework, is applied for automatic target organ localization and landmark matching. The evaluation results show that the proposed method is robust in convergence, effective in accuracy, and reliable for clinical usage.


Spatial Contrastive Learning for Few-Shot Classification
Yassine OUALI, PhD, Université Paris-Saclay, CentraleSupélec, MICS

In this work, we explore contrastive learning for few-shot classification, in which we propose to use it as an additional auxiliary training objective acting as a data-dependent regularizer to promote more general and transferable features. In particular, we present a novel attention-based spatial contrastive objective to learn locally discriminative and class-agnostic features. As a result, our approach overcomes some of the limitations of the cross-entropy loss, such as its excessive discrimination towards seen classes, which reduces the transferability of features to unseen classes. With extensive experiments, we show that the proposed method outperforms state-of-the-art approaches, confirming the importance of learning good and transferable embeddings for few-shot learning.
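The spatial objective itself is attention-based and operates on local feature maps; as a simplified sketch of the contrastive principle it builds on (plain Python, illustrative names, not the authors' code), an InfoNCE-style loss pulls an anchor embedding towards its positive view and pushes it away from negatives:

```python
from math import exp, log, sqrt

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: low when the positive is much more similar to the
    anchor than any negative, high otherwise."""
    pos = exp(cosine(anchor, positive) / temperature)
    neg = sum(exp(cosine(anchor, n) / temperature) for n in negatives)
    return -log(pos / (pos + neg))
```

Used as an auxiliary term next to cross-entropy, such a loss acts as the data-dependent regularizer described above.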


Zero-shot Learning with Deep Neural Networks for Object Recognition
Hervé LE BORGNE, CEA LIST, Saclay

Zero-shot learning (ZSL) deals with the ability to recognize objects without any visual training sample. To counterbalance this lack of visual data, each class to recognize is associated with a semantic prototype that reflects the essential features of the object. The general approach is to learn a mapping from visual data to semantic prototypes, then use it at inference to classify visual samples from the class prototypes only. Different settings of this general configuration can be considered depending on the use case of interest, in particular whether one only wants to classify objects that have not been employed to learn the mapping or whether one can use unlabelled visual examples to learn the mapping.
We present an overview of the ZSL domain, its principle, how it has evolved over the past 10 years towards a less biased and more realistic evaluation, as well as the main approaches proposed in the field. Then, we present quite recent work addressing various aspects of ZSL, namely the realism of "generalized zero-shot learning", different methods to build the class prototypes at large scale, and a method to benefit from unlabelled samples from the unseen classes.
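To make the inference step concrete, here is a minimal illustrative sketch (plain Python, with toy prototype values we made up for the example): once a visual sample has been mapped into the semantic space, it is assigned to the class whose prototype is most similar, without ever having seen a visual example of that class:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def zsl_classify(mapped_visual, prototypes):
    """Nearest-prototype classification in the semantic space.

    mapped_visual: visual features already projected into the semantic space
    prototypes:    dict {class name: semantic prototype vector}
    """
    return max(prototypes, key=lambda c: cosine(mapped_visual, prototypes[c]))

# Toy semantic prototypes (e.g. attribute or word-embedding vectors).
protos = {"zebra": [1.0, 0.9, 0.0], "elephant": [0.0, 0.2, 1.0]}
zsl_classify([0.9, 1.0, 0.1], protos)  # -> "zebra"
```

The mapping itself (visual features to semantic space, or the reverse) is what the various ZSL methods surveyed in the talk learn differently.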


Prototypical Faster R-CNN for few-shot object detection on aerial images
Pierre LE JEUNE, PhD Student, COSE, Université Sorbonne Paris Nord, L2TI/LIPN

Few-shot object detection is a challenging task in computer vision: it requires detecting novel classes from only a few annotated examples. In this talk, we introduce a new method based on Faster R-CNN and representation learning: Prototypical Faster R-CNN. The main idea is to combine the adaptability of prototypical networks with the high performance of Faster R-CNN on detection tasks. Each region of interest is mapped into an embedding space, and these regions are classified by comparing the distance between their representation and the class prototypes. This approach is limited by the quality of the prototypes, and intra-class variance is hard to overcome this way. An attention mechanism between the query image and the class prototypes is a possible solution: it can filter out irrelevant information (e.g. background-related information) contained in the RoIs or in the class prototypes in order to improve the matching.
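The classification step described above can be sketched as follows (plain Python with toy 2-D embeddings of our own invention; the actual method operates on Faster R-CNN RoI features):

```python
from math import dist  # Euclidean distance (Python 3.8+)

def class_prototypes(support_embeddings):
    """Mean embedding per class, computed from the few annotated support examples.

    support_embeddings: dict {class name: list of embedding vectors}
    """
    protos = {}
    for label, embs in support_embeddings.items():
        d = len(embs[0])
        protos[label] = [sum(e[k] for e in embs) / len(embs) for k in range(d)]
    return protos

def classify_roi(roi_embedding, protos):
    """Assign a region of interest to the class of its nearest prototype."""
    return min(protos, key=lambda c: dist(roi_embedding, protos[c]))

# Toy example: two novel classes, a handful of support embeddings each.
support = {"plane": [[1.0, 0.0], [0.8, 0.2]], "car": [[0.0, 1.0]]}
protos = class_prototypes(support)      # {"plane": [0.9, 0.1], "car": [0.0, 1.0]}
classify_roi([0.85, 0.05], protos)      # -> "plane"
```

Because the prototypes are simple class means, any attention-based reweighting of the support features (as discussed above) changes the prototypes, and hence the matching, without retraining the detector.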