Federated learning for multi-center medical image classification
8 November 2022
Category: Intern
The internship will take place at the Laboratory of Medical Information Processing (LaTIM, INSERM UMR 1101) in Brest. It will be conducted within the framework of the LabCOM ADMIRE, a research unit created by Evolucare Technologies and LaTIM (https://anr.fr/Projet-ANR-19-LCV2-0005).
The collaboration between LaTIM and Evolucare Technologies resulted in an automated algorithm that screens for ocular anomalies such as diabetic retinopathy, glaucoma and age-related macular degeneration based on fundus photographs. The algorithm, whose performance matches that of a retinal expert, is currently commercialized by OphtAI (www.ophtai.com), which was created by Evolucare Technologies (Villers-Bretonneux, France) and ADCIS. It has been deployed in several clinical centers around the world through the Evolucare Technologies cloud. The success of this solution is due in part to the large amount of annotated data collected from the OPHDIAT screening network in Île-de-France, namely 760,000 images from 100,000 diabetic patients. The goal of LaTIM and Evolucare Technologies is to expand screening to all pathologies affecting the eye, or visible through the eye (cardiovascular pathologies, neurodegenerative diseases, etc.). To this end, images from populations other than diabetic patients are required, so that each of these conditions is represented by a sufficient number of examples. This is achievable if the data are collected from several clinical centers.
In such a setting, traditional learning, where an automated algorithm is developed from local data (data from a single source), is not suitable. This is mainly due to data privacy rules between clinical centers: sharing data to build robust artificial intelligence models is challenging. To tackle this problem, federated learning has attracted great interest over the last few years. In the LabCOM ADMIRE, it has been applied to the automatic screening of diabetic retinopathy using fundus photographs and has shown performance comparable to that of an AI trained on centralized data. In this work, we aim to evaluate how well the proposed federated learning algorithms generalize to other applications.
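As an illustration of how a model can be trained across centers without pooling their images, the sketch below shows the aggregation step of federated averaging (FedAvg), one common federated learning strategy. It is a minimal example under stated assumptions, not the algorithm used in the LabCOM ADMIRE: the function name `federated_average`, the parameter layout (a dictionary of flat lists of floats) and the toy values are all hypothetical. Each center trains locally and sends only its model parameters and its number of training samples; the server combines them with a sample-weighted mean.

```python
from typing import Dict, List

# Hypothetical parameter layout: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def federated_average(client_params: List[Params],
                      client_sizes: List[int]) -> Params:
    """Sample-weighted mean of client parameters (FedAvg aggregation step).

    Only parameters and sample counts are exchanged; no image leaves a center.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    averaged: Params = {}
    for name, reference in client_params[0].items():
        averaged[name] = [
            sum(params[name][i] * size
                for params, size in zip(client_params, client_sizes)) / total
            for i in range(len(reference))
        ]
    return averaged


# Toy example (made-up values): two centers with 1 and 3 local samples.
center_a = {"w": [0.0, 4.0]}
center_b = {"w": [4.0, 0.0]}
global_params = federated_average([center_a, center_b], [1, 3])
```

In a full training loop, the server would send `global_params` back to the centers, each center would run a few local training epochs starting from it, and the aggregation would be repeated over several rounds.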
Federated learning allows a deep learning model to be trained on heterogeneous, multi-source data without centralizing the data. The aim of this internship is to explore existing heterogeneous, multi-source medical datasets. These datasets include but are not limited to: the chest X-ray datasets [3], the ISIC2019 dataset (https://challenge.isic-archive.com/landing/2019/), which contains dermoscopy images [4], the Camelyon16 dataset (a histopathology dataset of breast biopsy slides) [5] and the COVID-19 Radiography Database (https://www.kaggle.com/datasets/tawsifurrahman/covid19-radiography-database) [6, 7]. In addition, a benchmark of several federated learning strategies on all these datasets will be studied [2]. The selected candidate's role will include:
- Literature review of federated learning and of multi-center medical datasets
- Developing and evaluating federated learning strategies
- Participating in progress meetings
- Programming skills: Python, PyTorch.
- Knowledge of federated learning is a plus.
- Duration: 5-6 months
- Salary: 600 € per month
Send your resume, cover letter and academic transcripts to Sarah Matta (email@example.com), Gwenolé Quellec (firstname.lastname@example.org) and Mathieu Lamard (email@example.com).
[1] Yi Liu et al. “A systematic literature review on federated learning: From a model quality perspective”. In: arXiv preprint arXiv:2012.01973 (2020).
[2] Jean Ogier du Terrail et al. “FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings”. In: arXiv preprint arXiv:2210.04620 (2022).
[3] Joseph Paul Cohen et al. “TorchXRayVision: A library of chest X-ray datasets and models”. In: arXiv preprint arXiv:2111.00595 (2021).
[4] Philipp Tschandl, Cliff Rosendahl, and Harald Kittler. “The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions”. In: Scientific Data 5.1 (2018), pp. 1–9.
[5] Geert Litjens et al. “1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset”. In: GigaScience 7.6 (2018), giy065.
[6] Muhammad EH Chowdhury et al. “Can AI help in screening viral and COVID-19 pneumonia?” In: IEEE Access 8 (2020), pp. 132665–132676.
[7] Tawsifur Rahman et al. “Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images”. In: Computers in Biology and Medicine 132 (2021), p. 104319.