
Deep learning with weak or few labels in medical image analysis

Date: 1 February 2022
Location: CNRS Délégation Ile-de-France, Villejuif (+ video conference)

Scientific themes:
  • B - Image and Vision
  • T - Learning for signal and image analysis

Please note that, to guarantee that all registrants can access the meeting rooms, registration for meetings is free but mandatory.

Register for the meeting.


114 GdR ISIS members and 71 non-members are registered for this meeting.

Room capacity: 200 people.


31/01/22 NEWS: video-conference links

Subject: Zoom meeting invitation - GDR ISIS - Morning session - 9h15-13h15
Time: 1 Feb. 2022, 09:15 AM Paris
Join the Zoom meeting
Meeting ID: 995 3068 5690
Passcode: nGsL9B

Subject: Zoom meeting invitation - GDR ISIS - Afternoon session - 13h45-17h30
Time: 1 Feb. 2022, 01:45 PM Paris
Join the Zoom meeting
Meeting ID: 973 7458 3797
Passcode: pum39g

25/01/22 NEWS:

The program of our workshop has been finalized, and we invite you to discover it below.
We are pleased to welcome a dozen speakers in person at the Villejuif site.
Some places are still available, so do not hesitate to come (especially if you are in the Paris area). Please fill in this survey if you plan to attend.

GDR ISIS workshops are an important opportunity for exchange and networking between young and experienced scientists in our community. We look forward to welcoming you again, in compliance with health guidelines!

We will do our best to provide a video link for those who cannot make the trip.

In recent years, artificial intelligence, and deep learning in particular, has attracted considerable attention as a means to explore and structure multidimensional and multimodal medical imaging data. Applications span a wide variety of tasks, from low- to high-level image processing and analysis: segmentation, image synthesis, diagnosis and prognosis models (which respectively characterize pathological patterns in the data and predict the course and outcome of diseases), and therapy monitoring. The amount of high-quality annotated data is critical for training deep learning networks; in the medical field, however, annotations are known to be costly to acquire. Many approaches have therefore been developed to cope with missing, scarce or noisy labels, based on training strategies, novel architectures, data generation, etc.

This one-day workshop aims to bring together researchers in deep machine learning, computer vision and/or medical image analysis, as well as companies and AI-based startups in the medical imaging field, who are interested in how to mitigate the need for large amounts of annotated data in deep learning for various medical image analysis tasks: image segmentation, registration, detection, image super-resolution, image synthesis, etc. We welcome contributions addressing, but not limited to, the following topics:

  • weakly supervised learning, self-supervised learning, semi-supervised learning, one-shot learning
  • transfer learning, domain adaptation, regularized training
  • data augmentation, synthetic data generation
  • attention-based architectures, including transformers

Invited speakers:

  • Hervé Delingette - Research Director (DR), INRIA Sophia Antipolis, Asclepios team
  • Hoel Kervadec - Research Fellow, Erasmus MC Rotterdam

Call for participation

The day will include short presentations (20-30 min including questions). Interested participants should send the organizers a half-page abstract that includes the authors' names and affiliations. Students are particularly encouraged to participate. The application deadline is 19-01-2022.

Organizers:

  • Carole Lartizien (CREATIS Lyon), carole.lartizien@creatis.insa-lyon.fr
  • Caroline Petitjean (LITIS Rouen), caroline.petitjean@univ-rouen.fr
  • Nicolas Thome (CNAM Paris), nicolas.thome@lecnam.net
  • Mireille Garreau (LTSI Rennes), mireille.garreau@univ-rennes1.fr


Venue: CNRS Délégation Ile-de-France Villejuif - 7 Rue Guy Môquet, 94800 Villejuif

Program:

9h45 Introduction

Morning 10h00-12h45

Invited talk (45 min):

Hoel Kervadec - Erasmus MC Rotterdam - Beyond pixel-wise supervision: semantic segmentation with few shape descriptors

Talks (45 min):

1. Rosana El Jurdi - A surprisingly effective perimeter-based loss for medical image segmentation

2. Gaspard Dussert - Learning to segment prostate cancer by aggressiveness from scribbles in bi-parametric MRI

Coffee break 11h30-11h45

Talks (45 min):

3. Huaqian Wu - End-to-end neuron instance segmentation in preclinical data based on weakly supervised efficient UNet and morphological post-processing

4. Amina Ben Hamida - Deep learning for sparsely annotated colon cancer histopathological images

5. Sk Imran Hossain - Early diagnosis of Lyme disease by recognizing Erythema Migrans skin lesions from images utilizing deep learning techniques

Lunch break 12h45-14h00

Afternoon 14h00-17h00

Invited talk (45 min):

Hervé Delingette - INRIA Sophia Antipolis
Some Strategies to cope with the cost of annotations in Medical Image Analysis

Talks (45 min):

6. Vinkle Srivastav - Unsupervised domain adaptation for clinician pose estimation and instance segmentation in the operating room (OR)

7. Arnaud Boutillon - Multi-task, multi-domain deep segmentation with shared representations and contrastive regularization for sparse pediatric datasets

Break 15h30-15h45

Talks (60 min):

8. Chinedu Nwoye - Rendezvous: attention mechanisms for the recognition of surgical action triplets in endoscopic videos (keywords: attention mechanism, weak supervision for object localisation, endoscopic video)

9. Julian Tachella - Equivariant Imaging: Learning to solve imaging inverse problems without ground-truth (keywords: inverse problems, missing/sparse data)

10. Janan Arslan - Breaking the cycle of data imbalance: the role of optimized loss functions

16h45 Conclusion

17h End of the day

Abstracts of the contributions

Title: Beyond pixel-wise supervision: semantic segmentation with few shape descriptors

Hoel Kervadec - Erasmus MC Rotterdam

Abstract:

In the context of image semantic segmentation, neural networks are most often supervised with variants of standard losses such as cross-entropy or Dice. While effective, as shown by the considerable progress made over the past few years, this approach remains at its core a pixel-wise classification that discards spatial and geometrical information altogether. One could say that these losses 'micro-manage' each and every pixel instead of supervising the 'big picture': where is the predicted object, and does it have the desired shape? There is an extensive literature in pre-deep-learning computer vision on describing and characterizing objects, and a few descriptors can be sufficient to reconstruct complex shapes. Some of them are already known to be usable as regularizers when training deep neural networks, for instance to improve smoothness or remove spurious pixels. However, to the best of our knowledge, it has never been shown whether shape descriptors are powerful enough to supervise a neural network entirely on their own, without resorting to any pixel-wise supervision.

Beyond their theoretical interest, there are deeper motivations for posing segmentation problems as the reconstruction of shape descriptors. First, annotations that approximate low-order shape moments could be much less cumbersome to obtain than their full-mask counterparts. Second, anatomical priors could be readily encoded into invariant shape descriptions, which might alleviate the annotation burden. Finally, and most importantly, we hypothesize that, for a given task, certain shape descriptions might be invariant across image acquisition protocols/modalities and subject populations, which might open interesting research avenues for generalization in medical image segmentation. This talk will first present recent work on constrained deep neural networks that enables the use of general shape descriptors as supervision. We then introduce and reformulate a few shape descriptors in the context of image segmentation and evaluate their potential as stand-alone losses on two different, challenging tasks. Very surprisingly, as few as 4 descriptors per class can approach the performance of a segmentation mask with 65k individual discrete labels. We also found that shape descriptors can be a valid way to encode anatomical priors about the task, leveraging expert knowledge without additional annotations.
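To make descriptor-only supervision concrete, here is a minimal NumPy sketch, not the speaker's implementation. It assumes a single foreground probability map and four descriptors (soft size, the two centroid coordinates and the mean distance to the centroid), compared with a weighted squared error; the descriptor set, the function names and the weights are illustrative assumptions. In practice the same formulas would be written in an autodiff framework so that the loss can be backpropagated.

```python
import numpy as np


def shape_descriptors(probs: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Soft shape descriptors of a (H, W) foreground probability map.

    Returns [size, centroid_row, centroid_col, mean_distance_to_centroid].
    This particular descriptor choice is an illustrative assumption.
    """
    h, w = probs.shape
    rows, cols = np.mgrid[0:h, 0:w]
    size = probs.sum()                            # zeroth-order moment
    cy = (probs * rows).sum() / (size + eps)      # normalised first-order moments
    cx = (probs * cols).sum() / (size + eps)
    dist = np.sqrt((rows - cy) ** 2 + (cols - cx) ** 2)
    spread = (probs * dist).sum() / (size + eps)  # average radial extent
    return np.array([size, cy, cx, spread])


def descriptor_loss(probs, target, weights=(1e-4, 1.0, 1.0, 1.0)) -> float:
    """Weighted squared error between predicted and target descriptors.

    No pixel-wise term is used: only the four numbers in `target` supervise the map.
    The small weight on size is a hypothetical choice to balance its larger scale.
    """
    pred = shape_descriptors(probs)
    return float(np.sum(np.asarray(weights) * (pred - target) ** 2))
```

The target could be computed from a reference mask with `shape_descriptors(mask)` or, in the weak setting the talk motivates, supplied as approximate values by an annotator.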

Title: Some Strategies to cope with the cost of annotations in Medical Image Analysis

Hervé Delingette - INRIA Sophia Antipolis

Abstract:

Image annotations, such as image labels or organ delineations, are required to train supervised learning algorithms for various tasks in medical image analysis, but also to evaluate their performance. Producing high-quality annotations is very time-consuming, especially for volumetric images. Furthermore, inter-rater variability in those annotations has to be taken into account to reflect the complexity of the tasks. In this lecture, I will present some data- and model-related strategies to cope with the cost of annotations. A first set of approaches is data-centric and aims to keep only high-quality annotations and to precisely measure the agreement or disagreement between raters. A second set of methods, focused on machine learning models, tries to minimize the amount of strong annotations required, for instance through semi-supervised or mixed-supervised techniques.
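One simple way to quantify agreement between raters is the mean pairwise Dice coefficient between their masks, together with a per-pixel vote map that shows where they disagree. The NumPy sketch below is an illustrative assumption, not the method presented in the lecture; more elaborate tools (for instance STAPLE-style consensus estimation) exist for the same purpose.

```python
import itertools

import numpy as np


def dice(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> float:
    """Dice overlap between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    inter = np.logical_and(a, b).sum()
    return float(2.0 * inter / (a.sum() + b.sum() + eps))


def rater_agreement(masks):
    """Mean and minimum pairwise Dice over all rater pairs, plus a vote map.

    masks: list of binary arrays of identical shape, one per rater.
    The vote map holds the fraction of raters labelling each pixel as foreground;
    values strictly between 0 and 1 mark disagreement.
    """
    scores = [dice(a, b) for a, b in itertools.combinations(masks, 2)]
    votes = np.mean([m.astype(float) for m in masks], axis=0)
    return float(np.mean(scores)), float(np.min(scores)), votes
```

Reporting the minimum alongside the mean helps flag a single outlying rater, which is the kind of low-quality annotation a data-centric strategy would try to detect.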


1) A Surprisingly Effective Perimeter-based Loss for Medical Image Segmentation

Rosana El Jurdi (1)(2), Caroline Petitjean (1), Paul Honeine (1), Veronika Cheplygina (4)(5), Fahed Abdallah (2)(3)
(1) Normandie Univ, INSA Rouen, UNIROUEN, UNIHAVRE, LITIS, Rouen, France (2) Université Libanaise, Hadath, Beirut, Lebanon (3) ICD, M2S, Université de technologie de Troyes, Troyes, France (4) Computer Science Department, IT University of Copenhagen, Denmark (5) Medical Image Analysis group, Eindhoven University of Technology, Eindhoven, The Netherlands

Deep convolutional networks have recently made many breakthroughs in medical image segmentation. Still, some anatomical artefacts may be observed in the segmentation results, such as holes or inaccuracies near object boundaries. To address these issues, loss functions that incorporate constraints, such as spatial information or prior knowledge, have been introduced. Contour-based losses are one example of such prior losses: they exploit distance maps to perform point-by-point optimization between ground-truth and predicted contours. However, such losses may be computationally expensive or susceptible to trivial local solutions and vanishing gradients. Moreover, they depend on distance maps, which tend to underestimate contour-to-contour distances. We propose a novel loss constraint that optimizes the perimeter length of the segmented object relative to the ground-truth segmentation. The novelty lies in computing the perimeter from a soft approximation of the contour of the probability map, using specialized non-trainable layers in the network. We then optimize the mean squared error between the predicted and ground-truth perimeter lengths. This soft optimization of contour boundaries allows the network to account for border irregularities within organs while remaining efficient. Our experiments on three public datasets (spleen, hippocampus and cardiac structures) show that the proposed method outperforms state-of-the-art boundary losses for both single- and multi-organ segmentation.
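The perimeter-based loss can be sketched in a few lines. This NumPy version is only an illustration: it assumes the soft contour is a morphological gradient (3x3 max-pooling minus 3x3 min-pooling of the probability map), which is one possible non-trainable approximation and not necessarily the exact layer used by the authors. The perimeter is the sum of that contour map, and the loss is the mean squared difference between predicted and ground-truth perimeters over a batch.

```python
import numpy as np


def _pool3x3(x, reduce):
    """3x3 pooling with stride 1 and edge padding (keeps the input shape)."""
    h, w = x.shape
    padded = np.pad(x, 1, mode="edge")
    windows = [padded[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return reduce(np.stack(windows), axis=0)


def soft_contour(prob):
    """Soft contour as a morphological gradient: max-pooling minus min-pooling."""
    return _pool3x3(prob, np.max) - _pool3x3(prob, np.min)


def perimeter(prob) -> float:
    """Perimeter proxy: total mass of the soft contour map."""
    return float(soft_contour(prob).sum())


def perimeter_loss(preds, gts) -> float:
    """Mean squared error between predicted and ground-truth perimeters over a batch."""
    diffs = [perimeter(p) - perimeter(g) for p, g in zip(preds, gts)]
    return float(np.mean(np.square(diffs)))
```

With this gradient, the contour band is two pixels thick (inner plus outer boundary), so the value is a proxy proportional to boundary length; only the comparison between prediction and ground truth matters for the loss. In a training framework the two pooling operations would be standard max-pooling layers, which keeps the loss differentiable.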

2) Learning to segment prostate cancer by aggressiveness from scribbles in bi-parametric MRI

Audrey Duran, Gaspard Dussert, and Carole Lartizien
Univ Lyon, CNRS, Inserm, INSA Lyon, UCBL, CREATIS, UMR 5220, U1206, F-69621, Villeurbanne, France

In this work, we propose a weakly supervised U-Net-based network that jointly segments prostate cancer lesions and grades them by Gleason score (GS), using scribble annotations drawn in the prostate and lesions in bi-parametric MRI. This model extends the size constraint loss proposed by Kervadec et al. [1] to multiclass detection and segmentation tasks. Performance is assessed on a private dataset (219 patients), where the full ground truth is available, as well as on the ProstateX-2 challenge database, where only biopsy results at different locations serve as reference. For automatic GS group grading on our private dataset, we report a lesion-wise Cohen's kappa score of 0.29±0.07 for the weakly supervised model trained with only 6% of annotated voxels. This score is very close to the kappa score of 0.32±0.05 achieved by the fully supervised U-Net model. We also report a kappa score of 0.28±0.04 on the ProstateX-2 dataset with our weakly supervised U-Net trained on a combination of ProstateX-2 and our dataset, which, to our knowledge, is the highest kappa reported on this challenge dataset for a segmentation task.
[1] Kervadec, H., Dolz, J., Tang, M., Granger, E., Boykov, Y., Ben Ayed, I.: Constrained-CNN losses for weakly supervised segmentation. Medical Image Analysis 54, 88-99 (May 2019)
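The weak-supervision recipe can be illustrated by combining a partial cross-entropy, computed only on scribbled pixels, with a size penalty in the spirit of Kervadec et al. [1]: zero when the predicted soft size of a class lies within bounds [lower, upper], quadratic outside. This NumPy sketch is an illustration under stated assumptions: the per-class bounds, the `ignore` value marking unannotated pixels and the weight `lam` are hypothetical, and the authors' multiclass extension may differ.

```python
import numpy as np


def partial_cross_entropy(probs, scribbles, ignore=-1, eps=1e-8) -> float:
    """Cross-entropy computed only on scribbled pixels.

    probs: (C, H, W) softmax output; scribbles: (H, W) integer labels,
    with `ignore` (a hypothetical convention) on unannotated pixels.
    """
    annotated = scribbles != ignore
    if not annotated.any():
        return 0.0
    labels = scribbles[annotated]            # (N,) class index of each scribbled pixel
    p = probs[:, annotated]                  # (C, N) probabilities at those pixels
    picked = p[labels, np.arange(labels.size)]
    return float(-np.mean(np.log(picked + eps)))


def size_penalty(probs, bounds) -> float:
    """Quadratic penalty when a class's soft size leaves its [lower, upper] bounds."""
    sizes = probs.reshape(probs.shape[0], -1).sum(axis=1)
    total = 0.0
    for size, (lower, upper) in zip(sizes, bounds):
        if size < lower:
            total += (size - lower) ** 2
        elif size > upper:
            total += (size - upper) ** 2
    return float(total)


def weak_loss(probs, scribbles, bounds, lam=1e-2) -> float:
    """Hypothetical combination: scribble supervision plus weighted size constraint."""
    return partial_cross_entropy(probs, scribbles) + lam * size_penalty(probs, bounds)
```

The scribbles fix the class of a few pixels, while the size bounds prevent the network from trivially predicting the scribbled class everywhere or nowhere.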

3) End-to-end Neuron Instance Segmentation in Preclinical Data based on Weakly Supervised Efficient UNet and Morphological Post-processing

Huaqian Wu (1), Nicolas Souedet (1), Caroline Jan (1), Cédric Clouchoux (2), Thierry Delzescaux (1)
(1) CEA-CNRS-UMR 9199, MIRCen, Fontenay-aux-Roses, France, (2) WITSEE, Neoxia, Paris, France

Recent studies have demonstrated the superiority of deep learning in medical image analysis, especially in cell instance segmentation, a fundamental step for much biological research. However, good neural network performance requires training on large, unbiased datasets and annotations, which are extremely labor-intensive and expertise-demanding to produce. Here, we propose an end-to-end weakly supervised framework to automatically detect and segment NeuN-stained neuronal cells in histological images. Using only point annotations and a binary segmentation produced by a Random Forest, our mask synthesis pipeline generates good-quality pixel-level instance masks. The synthetic masks are then used as ground truth to train our neural network, an encoder-decoder U-Net-like architecture with the state-of-the-art EfficientNet as its backbone. Validation results show that our model outperforms other recent methods. In addition, we investigated multiple post-processing schemes and proposed an original mathematical-morphology-based strategy that converts the probability map into segmented instances using ultimate erosion and dynamic dilation. This approach is easy to configure and outperforms other classical post-processing techniques.
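The post-processing idea can be sketched as follows, assuming 4-connectivity, a 0.5 threshold on the probability map and our own reading of the two operations: ultimate erosion keeps, as seeds, each connected component that vanishes at the next erosion, and "dynamic dilation" is approximated here by growing the labelled seeds one pixel ring at a time inside the foreground mask. These interpretations and all function names are assumptions, not the authors' implementation; the pure Python loops favour readability over speed.

```python
from collections import deque

import numpy as np

OFFSETS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity (assumption)


def erode(mask):
    """Binary erosion with a 3x3 cross; outside the image counts as background."""
    p = np.pad(mask, 1, constant_values=False)
    return p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]


def label(mask):
    """Connected-component labelling by breadth-first search; returns (labels, count)."""
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for r, c in zip(*np.nonzero(mask)):
        if labels[r, c]:
            continue
        count += 1
        labels[r, c] = count
        queue = deque([(r, c)])
        while queue:
            y, x = queue.popleft()
            for dy, dx in OFFSETS:
                yy, xx = y + dy, x + dx
                if (0 <= yy < mask.shape[0] and 0 <= xx < mask.shape[1]
                        and mask[yy, xx] and not labels[yy, xx]):
                    labels[yy, xx] = count
                    queue.append((yy, xx))
    return labels, count


def ultimate_erosion(mask):
    """Keep each connected component that disappears at the next erosion step."""
    seeds = np.zeros(mask.shape, dtype=bool)
    current = mask.astype(bool)
    while current.any():
        eroded = erode(current)
        labels, count = label(current)
        survivors = {int(v) for v in np.unique(labels[eroded])}
        for k in range(1, count + 1):
            if k not in survivors:
                seeds |= labels == k
        current = eroded
    return seeds


def grow_instances(mask, seeds):
    """Grow labelled seeds one pixel ring per step, never leaving the foreground mask.

    Pixels reached by two instances at the same step go to the higher label.
    """
    labels, _ = label(seeds)
    while True:
        p = np.pad(labels, 1)
        neighbour = np.maximum.reduce([p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:]])
        grow = mask & (labels == 0) & (neighbour > 0)
        if not grow.any():
            return labels
        labels = np.where(grow, neighbour, labels)


def instances_from_probability(prob, threshold=0.5):
    mask = prob > threshold
    return grow_instances(mask, ultimate_erosion(mask))
```

For two cells joined by a thin bridge, thresholding alone yields one connected blob, whereas the seeds from ultimate erosion separate them into two instances before growth.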

4) Deep Learning for Sparsely Annotated Colon Cancer Histopathological Images

Amina Ben Hamida (1)(2), Maxime Devanne (2), Jonathan Weber (2), Caroline Truntzer (3), Valentin Derangère (3), François Ghiringhelli (3), Germain Forestier (2), and Cédric Wemmert (1)
(1) ICube, University of Strasbourg, France (2) IRIMAS, University of Haute-Alsace, France, (3) Platform of Transform in Biological Oncology, Dijon, France

Nowadays, a wide range of medical imaging techniques is used by pathologists and researchers. The use of whole slide images (WSI) in particular has grown significantly in the last few years. These images are digital copies of classical tissue glass slides. Segmenting such data is a key step for many related tasks, namely disease diagnosis, surgical planning, and nodule and lump detection. However, the characteristics of histopathological WSIs, namely their gigapixel size, high resolution and the shortage of richly labeled samples, have hindered the efficiency of available machine learning approaches. Given the effectiveness of deep learning (DL) in large-scale applications, we propose using different deep models for colon cancer histopathological image segmentation in a weakly supervised scenario. First, we introduce a novel multi-step training strategy to cope with the sparsely annotated data. Then, we propose using the Unet and SegNet models for accurate colon cancer histopathological image segmentation. Finally, we introduce novel enhanced Attention-Unet models with different schemes for the positions of the skip connections and spatial attention gates in the network. Spatial attention gates assist the training process and help the model avoid learning irrelevant features; alternating the presence of such modules adds robustness and yields better segmentation results.
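Spatial attention gates in Attention-Unet-style models follow the additive attention gate of Attention U-Net (Oktay et al., 2018). The NumPy sketch below reproduces that gate with 1x1 convolutions written as channel-wise matrix products. It assumes the gating signal `g` has already been resampled to the spatial size of the skip features `x`; the weight shapes are illustrative, and the authors' specific gate placements are not modelled here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def attention_gate(x, g, w_x, w_g, psi, b=0.0, b_psi=0.0):
    """Additive attention gate.

    x:   skip-connection features, shape (Cx, H, W)
    g:   gating features from the coarser decoder level, shape (Cg, H, W)
    w_x: (Ci, Cx) and w_g: (Ci, Cg) act as 1x1 convolutions
    psi: (Ci,) projects the intermediate map to one attention logit per pixel
    Returns the gated features and the (H, W) attention coefficients.
    """
    inter = np.einsum("ic,chw->ihw", w_x, x) + np.einsum("ic,chw->ihw", w_g, g)
    inter = np.maximum(0.0, inter + np.asarray(b).reshape(-1, 1, 1))  # ReLU
    alpha = sigmoid(np.einsum("i,ihw->hw", psi, inter) + b_psi)       # values in (0, 1)
    return x * alpha[None, :, :], alpha
```

The coefficients `alpha` scale each pixel of the skip features before they are concatenated in the decoder, which is how irrelevant regions are suppressed.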

5) Early diagnosis of Lyme disease by recognizing Erythema Migrans skin lesion from images utilizing deep learning techniques

Sk Imran Hossain (1), Engelbert Mephu Nguifo (1), Jocelyn de Goër de Herve (2)
(1) Université Clermont Auvergne, CNRS, ENSMSE, LIMOS, F-63000 Clermont-Ferrand, France, (2) Université Clermont Auvergne, INRAE, VetAgro Sup, UMR EPIA, 63122 Saint-Genès-Champanelle, France

Lyme disease is one of the most common infectious vector-borne diseases in the world. We extensively studied the effectiveness of convolutional neural networks for identifying Lyme disease from images. Our research plan includes multimodal learning incorporating expert opinion elicitation, automated skin hair mask generation, and improved neural architecture search.

6) Unsupervised domain adaptation for clinician pose estimation and instance segmentation in the OR

Vinkle Srivastav (1), Afshin Gangi (2), Nicolas Padoy (1,3)
(1) ICube, University of Strasbourg, CNRS, France, (2) Radiology Department, University Hospital of Strasbourg, France, (3) IHU Strasbourg, France

The fine-grained localization of clinicians in the operating room (OR) is a key component in designing the new generation of OR support systems. Computer vision models for pixel-based person segmentation and body-keypoint detection are needed to better understand clinical activities and the spatial layout of the OR. This is challenging not only because OR images are very different from traditional vision datasets, but also because data and annotations are hard to collect and generate in the OR due to privacy concerns. To address these concerns, we first study how joint person pose estimation and instance segmentation can be performed on low-resolution images, with downsampling factors ranging from 1x to 12x. Second, to address the domain shift and the lack of annotations, we propose a novel unsupervised domain adaptation method, called AdaptOR, which adapts a model from an in-the-wild labeled source domain to a statistically different unlabeled target domain. We propose to exploit explicit geometric constraints on different augmentations of the unlabeled target domain image to generate accurate pseudo labels, and to use these pseudo labels to train the model on high- and low-resolution OR images in a self-training framework. Furthermore, we propose disentangled feature normalization to handle the statistically different source and target domain data. Extensive experimental results, with detailed ablation studies on the two OR datasets MVOR+ and TUM-OR-test, show the effectiveness of our approach against strong baselines, especially on the low-resolution, privacy-preserving OR images. Finally, we show the generality of our method as a semi-supervised learning (SSL) method on the large-scale COCO dataset, where, using as little as 1% of labeled supervision, we achieve results comparable to a model trained with 100% labeled supervision.
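The idea of using geometric constraints between augmentations to filter pseudo labels can be illustrated with a single transformation. This NumPy sketch assumes a horizontal flip as the only augmentation, a hypothetical `model` callable returning one heatmap per keypoint, and two acceptance rules (fused confidence above a threshold and peak disagreement between views below a distance); AdaptOR's actual set of transformations, thresholds and handling of left/right joint swaps are not reproduced here.

```python
import numpy as np


def _peak(heatmap):
    """(row, col) of the maximum of a 2D heatmap."""
    return np.unravel_index(np.argmax(heatmap), heatmap.shape)


def flip_consistent_pseudo_labels(model, image, conf_thresh=0.5, max_dist=2.0):
    """Keypoint pseudo labels kept only when two geometric views agree.

    model: hypothetical callable mapping an (H, W) image to (K, H, W) heatmaps.
    Returns one (row, col) tuple per keypoint, or None when the prediction is rejected.
    Left/right joint swapping under flips is ignored for simplicity.
    """
    h_orig = model(image)
    h_flip = model(image[:, ::-1])[:, :, ::-1]  # predict on the flipped view, then undo the flip
    labels = []
    for k in range(h_orig.shape[0]):
        (y0, x0), (y1, x1) = _peak(h_orig[k]), _peak(h_flip[k])
        disagreement = np.hypot(y0 - y1, x0 - x1)
        fused = 0.5 * (h_orig[k] + h_flip[k])
        y, x = _peak(fused)
        if fused[y, x] >= conf_thresh and disagreement <= max_dist:
            labels.append((int(y), int(x)))
        else:
            labels.append(None)
    return labels
```

Mapping every prediction back to the original frame before comparing is what makes the constraint explicit: a model that is not consistent under the transformation produces peaks that disagree, and those keypoints are simply left unlabeled for self-training.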

7) Multi-task, multi-domain deep segmentation with shared representations and contrastive regularization for sparse pediatric datasets

Arnaud Boutillon (1)(2), Pierre-Henri Conze (1)(2), Christelle Pons (2)(3)(4), Valérie Burdin (1)(2), Bhushan Borotikar (2)(3)(5)
(1) IMT Atlantique, Brest, France (2) LaTIM UMR 1101, Inserm, Brest, France (3) Centre Hospitalier Régional et Universitaire (CHRU) de Brest, Brest, France (4) Fondation ILDYS, Brest, France (5) SCMIA, Symbiosis International University, Pune, India

Segmentation of the pediatric musculoskeletal system is an essential pre-processing step for guiding clinical decisions, as the generated 3D models of muscles and bones help clinicians evaluate pathology progression and optimally plan therapeutic interventions. However, the accuracy and generalization performance of automatic segmentation models trained on individual domains are limited by the restricted amount of annotated pediatric data. To address this issue, we developed and optimized a segmentation model on multiple datasets, arising from different parts of the anatomy, in a multi-task and multi-domain learning framework. This approach helps overcome the inherent scarcity of pediatric data while benefiting from a more robust shared representation. The proposed segmentation network comprises shared convolutional filters, domain-specific batch normalization parameters that compute the respective dataset statistics, and a domain-specific segmentation layer. Furthermore, a supervised contrastive regularization is integrated to further improve generalization capabilities by promoting intra-domain similarity and enforcing inter-domain margins in latent space. We evaluated our contributions on two sparse, unpaired (from different patient cohorts) and heterogeneous pediatric musculoskeletal datasets of the ankle and shoulder joints, and obtained promising results for delineating pediatric bones in magnetic resonance (MR) images.
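A supervised contrastive regularizer of this kind can be written with the loss of Khosla et al. (2020), using the domain (dataset) index as the label, so that embeddings from the same domain are pulled together and embeddings from different domains pushed apart. This NumPy sketch assumes one pooled embedding per image and a temperature of 0.1; where the loss is attached in the network and how it is weighted against the segmentation loss are not specified in the abstract.

```python
import numpy as np


def supervised_contrastive_loss(features, labels, temperature=0.1) -> float:
    """Supervised contrastive loss (Khosla et al., 2020) with domain indices as labels.

    features: (N, D) embeddings, one per sample; labels: (N,) domain index per sample.
    Same-domain pairs act as positives, all other samples as negatives.
    """
    labels = np.asarray(labels)
    z = features / np.linalg.norm(features, axis=1, keepdims=True)  # unit-norm embeddings
    logits = z @ z.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)             # numerical stability
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    denom = np.log((np.exp(logits) * not_self).sum(axis=1, keepdims=True))
    log_prob = logits - denom                                       # log-softmax over other samples
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    per_anchor = -(log_prob * positives).sum(axis=1) / np.maximum(n_pos, 1)
    return float(per_anchor[n_pos > 0].mean())
```

Embeddings that already cluster by domain give a low loss, while a labelling that contradicts the geometry of the latent space gives a high one, which is the pressure the regularization applies during training.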

8) Rendezvous: attention mechanisms for the recognition of surgical action triplets in endoscopic videos

Chinedu Innocent Nwoye (1), Tong Yu (1), Cristians Gonzalez (2)(3), Barbara Seeliger (2)(3), Pietro Mascagni (1)(4), Didier Mutter (2)(3), Jacques Marescaux (5), Nicolas Padoy (1)(2)

(1) ICube, University of Strasbourg, CNRS, France, (2) IHU Strasbourg, France, (3) University Hospital of Strasbourg, France, (4) Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy, (5) IRCAD France

With the advent of minimally invasive surgery (MIS), large amounts of data are easily acquired during surgery. Analyzing these workflow data with deep learning methods helps extract knowledge that is useful for real-time decision support systems in the operating room (OR). Among all existing frameworks for surgical workflow analysis in endoscopic videos, action triplet recognition stands out as the only one aiming to provide truly fine-grained and comprehensive information on surgical activities. This information, presented as (instrument, verb, target) combinations, is highly challenging to identify accurately. Triplet components can be difficult to recognize individually, and this task requires not only recognizing all three triplet components simultaneously, but also correctly establishing the data association between them. To achieve this, we introduce a new model, Rendezvous (RDV), which recognizes triplets directly from surgical videos by leveraging deep learning attention mechanisms at two different levels. We first introduce a new form of spatial attention, called the Class Activation Guided Attention Mechanism (CAGAM), to capture individual action triplet components in a scene. This technique focuses on recognizing verbs and targets using activations resulting from instruments. Meanwhile, we learn instrument localization by weak supervision using binary presence labels only. To solve the association problem, our RDV model adds a new form of semantic attention inspired by Transformer networks, called Multi-Head of Mixed Attention (MHMA). This technique uses several cross- and self-attention modules to effectively capture the relationships between instruments, verbs and targets. We also introduce CholecT50, a dataset of 50 endoscopic videos in which every frame has been annotated with labels from 100 triplet classes.
Our proposed RDV model significantly improves the triplet prediction mAP, by over 9%, compared to state-of-the-art methods on this dataset.
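Cross- and self-attention of the kind combined in MHMA rest on standard scaled dot-product attention. The NumPy sketch below shows a single attention head in which queries come from one set of features (for instance verb or target features) and keys/values from another (for instance instrument features); self-attention is the special case where both are the same array. The feature roles, projection matrices and shapes are illustrative assumptions; how MHMA actually mixes several such heads is not detailed in the abstract.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention.

    queries: (Nq, D) features asking the question; context: (Nk, D) features attended to.
    w_q, w_k: (D, Dk); w_v: (D, Dv). Passing the same array as queries and context
    gives self-attention.
    """
    q, k, v = queries @ w_q, context @ w_k, context @ w_v
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)  # (Nq, Nk), rows sum to 1
    return weights @ v, weights
```

Each output row is a weighted average of the context values, so a verb or target query can pull in information from whichever instrument features it attends to most.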

9) Equivariant Imaging: Learning to solve imaging inverse problems without ground-truth

Julian Tachella (1), Dongdong Chen (2) and Mike Davies (2)
(1) Physics laboratory, CNRS & ENS de Lyon, Lyon, France (2) School of Engineering, University of Edinburgh, Edinburgh, UK

In recent years, deep neural networks have obtained state-of-the-art performance in multiple medical imaging inverse problems, ranging from computed tomography to magnetic resonance imaging. Networks are generally trained with pairs of images and associated measurements. However, in various imaging problems, we usually only have access to compressed measurements of the underlying images, which hinders this learning-based approach. Learning from measurement data alone is impossible in general, as the compressed observations do not contain information outside the range of the forward sensing operator. A new learning framework, called Equivariant Imaging [1, 2], overcomes this limitation by exploiting the invariance to transformations (translations, rotations, etc.) present in natural images. This learning strategy performs as well as fully supervised methods and can handle noisy data. The potential of this unsupervised method has been demonstrated on various inverse problems, including sparse-view X-ray computed tomography and accelerated magnetic resonance imaging.
[1] Chen, D., Tachella, J., Davies, M. (2021). Equivariant Imaging: Learning Beyond the Range Space. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
[2] Chen, D.*, Tachella, J.*, Davies, M. (2021). Robust Equivariant Imaging: a fully unsupervised framework for learning to image from noisy and partial measurements. arXiv preprint. (*equal contribution)
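The equivariant imaging principle of [1] can be condensed into a training loss with two terms: measurement consistency, ||A f(y) - y||^2, and equivariance, ||f(A T_g x1) - T_g x1||^2 with x1 = f(y), where A is the forward operator, f the reconstruction network and T_g a transformation such as a shift. The NumPy sketch below only evaluates this loss, assuming 1D signals, circular shifts as the transformations and a generic callable `f`; it illustrates the principle and is not the authors' code.

```python
import numpy as np


def equivariant_imaging_loss(f, forward, y, shifts, alpha=1.0) -> float:
    """Equivariant imaging loss for 1D signals with circular shifts as transformations.

    f:       reconstruction function (e.g. a network), measurements -> signal
    forward: forward operator A, signal -> measurements
    y:       observed measurements
    shifts:  integer shifts standing in for the group transformations T_g
    """
    x1 = f(y)
    measurement_consistency = np.sum((forward(x1) - y) ** 2)  # ||A f(y) - y||^2
    equivariance = 0.0
    for s in shifts:
        x2 = np.roll(x1, s)                     # T_g x1
        x3 = f(forward(x2))                     # reconstruct from the "virtual" measurements
        equivariance += np.sum((x3 - x2) ** 2)  # ||f(A T_g x1) - T_g x1||^2
    return float(measurement_consistency + alpha * equivariance / len(shifts))
```

With a masking operator that keeps every other sample, a zero-filling reconstruction satisfies measurement consistency exactly but is penalized by the equivariance term, because shifting moves signal into unobserved positions. That extra term is what provides a learning signal in the nullspace of A without any ground truth.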

10) Breaking the cycle of data imbalance: the role of optimized loss functions

Janan Arslan (1)(2)
(1) Paris Brain Institute (Institut du Cerveau - ICM), UMR 7225, 75013, Paris, France (2) Institut National de la Santé et de la Recherche Médicale, U 1127, 75013, Paris, France.

The use of artificial intelligence (AI) in healthcare has several benefits. When applied to diagnostic imaging, AI can discern image features and colors at greater bandwidth and with greater resolution than humans thanks to digital image processing, often discriminating features not visible to human vision. Outsourcing repetitive and time-consuming components of diagnostics to AI-based machines or software frees clinicians to focus on the more holistic side of medicine, including but not limited to clinical management, integrative medicine and emotional support. AI can also process megabytes of data quickly and cost-effectively. However, there are circumstances in which megabytes of high-quality data may not be readily available. For example, in the case of rare diseases, we may only have a handful of cases at our fingertips. In other circumstances, the labels and annotations required to run AI-based methods may be scarce. Whole slide images are an excellent illustration of this: the images are often very large, making it a challenge for pathologists to annotate all regions of interest, particularly small and not easily visible structures such as blood vessels. These extreme cases can lead to severe data imbalances that often slow down the development of high-performing AI models. Several pathways to overcoming such obstacles are actively being explored, including image augmentation and synthesis. Other potential solutions lie in mathematical statistics, particularly in the optimization of loss functions. Loss functions can make or break the performance of an AI-based model. For example, in semantic segmentation, one loss function may generate clean and highly accurate segmentation outputs, while another may generate blurry outputs with no discernible borders, even when the models are trained under the same conditions with the same hyperparameters.
In this presentation, the application of several loss functions is examined in detail. The theory behind these loss functions is explored, followed by a discussion of how they can improve situations in which data imbalance is a reality. The presentation will act as a quick guide to loss functions and conclude with future directions on how mathematical statistics can be explored to optimize current AI learning processes.
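As an example of loss functions commonly used against class imbalance, the NumPy sketch below implements the binary focal loss (Lin et al., 2017), which down-weights easy, well-classified pixels, and the soft Dice loss, which measures overlap and is therefore less dominated by the majority background class. These standard formulations are chosen for illustration; the talk's specific selection of loss functions is not stated in the abstract, and the `alpha`/`gamma` values are the usual defaults rather than recommendations.

```python
import numpy as np


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7) -> float:
    """Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t), averaged over pixels."""
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)            # probability assigned to the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


def soft_dice_loss(p, y, eps=1e-7) -> float:
    """1 - soft Dice coefficient between foreground probabilities and binary labels."""
    intersection = np.sum(p * y)
    return float(1.0 - (2.0 * intersection + eps) / (np.sum(p) + np.sum(y) + eps))
```

With `gamma=0` and `alpha=0.5` the focal loss reduces to half the binary cross-entropy; increasing `gamma` shrinks the contribution of pixels that are already well classified. On an image with 2% foreground, predicting near-zero probabilities everywhere keeps cross-entropy low but drives the soft Dice loss close to 1, which illustrates why overlap-based losses are often preferred under severe imbalance.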