Cifre PhD position IDEMIA+ENSEA: Federated learning with noisy clients

12 April 2023


Category: PhD student


Two PhD positions (Cifre) in the exciting field of federated learning (FL) are open in a newly formed joint IDEMIA and ENSEA research team working on machine learning and computer vision. We are seeking highly motivated candidates to develop robust FL algorithms that tackle the challenging issues of data heterogeneity and noisy labels. The successful candidates will work towards making FL a more practical and efficient solution for real-world applications, with a particular focus on face recognition and related areas. As a PhD candidate in our group, you will have the opportunity to be involved in cutting-edge research and contribute to the development of novel algorithms with a significant societal impact. You will benefit from the expertise of our team as well as access to state-of-the-art resources and facilities.

 

Topic

With the increasing volume of data and the need for privacy preservation, indiscriminately transmitting and aggregating data is no longer viable: bandwidth costs are high and the risk of privacy breaches is real. Federated Learning (FL) [1,3] has emerged as an alternative to the traditional centralized learning paradigm that keeps data private. It has been successfully applied to real-world tasks such as health care [2] and smart cities [3].
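The core server-side step of the canonical FL algorithm, FedAvg [1], is simply a data-weighted average of the locally trained models. A minimal sketch (the function name and flat parameter vectors are illustrative simplifications, not the project's implementation):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """FedAvg server aggregation (McMahan et al. [1]).

    client_weights: list of flattened parameter vectors, one per client.
    client_sizes:   number of local training samples per client; clients
                    with more data contribute more to the global model.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()            # normalized aggregation weights
    stacked = np.stack(client_weights)      # shape (n_clients, n_params)
    return (coeffs[:, None] * stacked).sum(axis=0)

# Toy round: two clients with unequal data volumes (3 vs. 1 samples).
w_global = fedavg([np.array([1.0, 0.0]), np.array([0.0, 1.0])], [3, 1])
# → array([0.75, 0.25])
```

Noisy or heterogeneous clients break the implicit assumption here that every client's update is equally trustworthy per sample, which is precisely what the robust aggregation stage of this project targets.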

One of the remaining challenges in FL is the data heterogeneity in terms of data quality across different clients. In real-world scenarios, personalized requirements for each client may lead to independent model design, and client data is often unbalanced and weakly labeled. In practice, only a small portion of client data samples are labeled, and the label quality cannot be guaranteed due to difficulties in annotation, lack of user expertise, or user mistakes. As a result, the local client data frequently contains unavoidable and varying levels of noise.

Devising robust learning schemes in the presence of noisy labels is a vibrant research area, particularly in centralized settings [4,5]. Many centralized algorithms have been proposed to filter noisy samples or leverage regularization techniques to mitigate the impact of label noise. However, such techniques may not be feasible in the context of FL due to the discrepancy of local client data. Hence, developing FL algorithms that can handle data heterogeneity and noisy labels is an ongoing research challenge [6,7].

Our project aims to develop FL algorithms that address the issue of noisy labels at both the local and global levels, across all stages of the FL training process - i.e., FL initialization, local client model training, and server model aggregation. By combining novel techniques for noise estimation and correction, robust loss functions, meta knowledge alignment, and robust aggregation, we aim to achieve state-of-the-art performance in FL settings.

In the FL initialization phase, we intend to propose a novel approach that estimates and corrects noise levels on a per-client basis. We consider adopting a teacher-student mechanism to transform the problem of learning with noisy labels into a semi-supervised learning task [8], where we treat reliable samples as labeled data and unreliable samples as unlabeled data. To achieve this, we first train a teacher on a clean proxy dataset and then use the teacher to predict labels for the dataset in question. We then employ a clustering method based on metrics such as intra-cluster and inter-cluster similarity distributions to identify the most reliable samples. We train a student on the teacher's cleaned data and repeat this process until a sufficient number of reliable samples or a desired confidence score is reached.
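The reliable/unreliable split at the heart of this step can be sketched as follows. This is a deliberately simplified stand-in: it splits on teacher confidence and label agreement rather than the intra-/inter-cluster similarity clustering described above, and the function name and threshold are hypothetical:

```python
import numpy as np

def split_reliable(teacher_probs, noisy_labels, threshold=0.8):
    """Partition client samples into reliable / unreliable index sets.

    teacher_probs: (n, n_classes) softmax outputs of a teacher trained on
                   a clean proxy dataset.
    noisy_labels:  (n,) local labels of unknown quality.
    A sample is deemed reliable when the teacher confidently agrees with
    its local label; reliable samples are kept as labeled data, the rest
    are treated as unlabeled, as in DivideMix-style pipelines [8].
    """
    pred = teacher_probs.argmax(axis=1)     # teacher's predicted class
    conf = teacher_probs.max(axis=1)        # teacher's confidence
    reliable = (pred == noisy_labels) & (conf >= threshold)
    return np.where(reliable)[0], np.where(~reliable)[0]

probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
labels = np.array([0, 0, 0])
kept, dropped = split_reliable(probs, labels)
# → kept = [0], dropped = [1, 2]
```

The student would then be trained on `kept` (with their labels) plus `dropped` (without labels) under a semi-supervised objective, and the cycle repeated with the student as the new teacher.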

During the training phase of the local models, we aim to develop robust loss functions, such as curriculum loss (CL) [9] or active passive loss (APL) [10], which have been shown to be effective for handling noisy labels. As each client may have a different level of noise in its local data, we also plan to exploit knowledge of per-client noise levels to train a unified model that is more robust to label noise. This knowledge can be used to align the confidence score of label quality across different clients while ensuring privacy protection [11,12]. Additionally, we plan to use the local intrinsic dimension (LID) or similar measures to estimate the dimensionality of the model's prediction subspaces on each client and thereby identify mislabeled data, as in [13]. Once the noise alignment is established, we can further improve noise correction at the client level. These latter ideas can also be applied during the aggregation phase.
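As one concrete instance of such a robust loss, APL [10] combines an "active" normalized loss (e.g., normalized cross entropy, NCE) with a "passive" loss (e.g., reverse cross entropy, RCE, with log 0 clamped to a constant A). A minimal sketch on softmax outputs (hyperparameter values are illustrative, not tuned):

```python
import numpy as np

def apl_loss(probs, labels, alpha=1.0, beta=1.0, A=-4.0):
    """Active Passive Loss [10]: alpha * NCE + beta * RCE.

    probs:  (n, K) softmax outputs; labels: (n,) integer targets.
    NCE normalizes the cross entropy of the target class by the sum of
    cross entropies over all classes, bounding it in (0, 1]; RCE swaps
    prediction and label, with log(0) for non-target classes replaced
    by the constant A, giving RCE = -A * (1 - p_y) for one-hot labels.
    """
    eps = 1e-12
    n = probs.shape[0]
    logp = np.log(probs + eps)
    p_y = probs[np.arange(n), labels]
    nce = -logp[np.arange(n), labels] / (-logp.sum(axis=1))  # active term
    rce = -A * (1.0 - p_y)                                   # passive term
    return (alpha * nce + beta * rce).mean()
```

Because both terms are bounded, gradients from confidently mislabeled samples are damped compared with plain cross entropy, which is what makes the combination attractive for noisy clients.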

 

Requirements

  • Strong background in computer science, mathematics, or a related field.
  • Candidates with experience in machine learning, computer vision, and/or signal processing are especially encouraged to apply.
  • Proficient programming skills (Python, TensorFlow, PyTorch)

 

Further information

  • If candidates wish, an internship or a fixed-term contract (CDD) can be considered while waiting for the PhD position to start in September/October 2023.
  • The position will be based at the Cergy (ENSEA) and/or Courbevoie (IDEMIA) sites.
  • Competitive salary
  • Advisors: Ngoc-Son Vu, Damien Monet, Stephane Gentric, Aymeric Histace
  • How to apply: please send your CV, motivation letter, two reference letters, and grades (in English or French) to Ngoc-Son Vu (son.vu@ensea.fr), and/or via https://careers.idemia.com/job-invite/76530/

 

IDEMIA is a global leader in identity and security, enabling people to assert their identity safely and simply. Our world-class products serve finance, telecom, retail, government, and more. We use cutting-edge technology to deliver top-quality services to agencies and tech companies, impacting citizens and nations worldwide.

ETIS is a joint research department of ENSEA, CY Cergy Paris University, and CNRS. Represented by ENSEA, a top French electrical engineering and computing science graduate school, ETIS employs over 150 researchers and contributes to numerous EU and French-funded projects in AI and machine learning.

 

References

[1] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, et al. “Communication-efficient learning of deep networks from decentralized data”, ICML 2017

[2] Quande Liu, Cheng Chen, Jing Qin, Qi Dou, and Pheng-Ann Heng. “Feddg: Federated domain generalization on medical image segmentation via episodic learning in continuous frequency space”, CVPR 2021.

[3] https://github.com/innovation-cat/Awesome-Federated-Machine-Learning

[4] H. Song et al., “Learning from Noisy Labels with Deep Neural Networks: A Survey”, IEEE Trans. on NN and Learning Systems 2022

[5] https://github.com/songhwanjun/Awesome-Noisy-Labels

[6] Xiuwen Fang, Mang Ye, “Robust Federated Learning with Noisy and Heterogeneous Clients”, CVPR 2022

[7] H. Yang, P. Qiu, and J. Liu, “Taming Fat-Tailed (“Heavier-Tailed” with Potentially Infinite Variance) Noise in Federated Learning”, NeurIPS 2022

[8] Junnan Li, Richard Socher, Steven C.H. Hoi, “DivideMix: Learning with Noisy Labels as Semi-supervised Learning”, ICLR 2020

[9] Y. Lyu, I. W. Tsang, “Curriculum loss: Robust learning and generalization against label corruption,” in Proc. ICLR, 2020

[10] X. Ma, H. Huang, Y. Wang, S. Romano, S. Erfani, J. Bailey, “Normalized loss functions for deep learning with noisy labels,” in Proc. ICML, 2020, pp. 6543–6553

[11] Q. Meng, F. Zhou, H. Ren, T. Feng, G. Liu, Y. Lin, “Improving Federated Learning Face Recognition via Privacy-Agnostic Clusters”, ICLR 2022

[12] T. Dong, B. Zhao, L. Lyu, “Privacy for Free: How does Dataset Condensation Help Privacy?”, ICML 2022.

[13] J. Xu et al. “FedCorr: Multi-Stage Federated Learning for Label Noise Correction”, CVPR 2022

 

Selected references from the group

[14] J. Pourcel, N.-S. Vu, R.M. French, “Online Task-free Continual Learning with Dynamic Sparse Distributed Memory”, ECCV 2022

[15] L. Jezequel, N.-S. Vu, J. Beaudet, A. Histace, “Anomaly Detection via Multi-Scale Contrasted Memory”. preprint 2023

[16] J.-R. Conti, N. Noiry, S. Clémençon, V. Despiegel, S. Gentric, “Mitigating Gender Bias in Face Recognition using the von Mises-Fisher Mixture Model”, ICML 2022

[17] L. Jezequel, N.-S. Vu, J. Beaudet, A. Histace, “Hyperbolic Adversarial Learnable Tasks for Anomaly Detection via Multi-Scale Contrasted Memory”. preprint 2023

[18] R. Marriott, S. Romdhani, L. Chen, “A 3D GAN for improved large-pose facial recognition”, CVPR 2021

[19] J.-R. Conti, S. Clémençon, “Assessing Performance and Fairness Metrics in Face Recognition - Bootstrap Methods”, TSRML workshop, NeurIPS 2022

[20] N. Larue, N.-S. Vu, V. Struc, P. Peer, V. Christophides, “SeeABLE: Soft Discrepancies and Bounded Contrastive Learning for Exposing Deepfakes”, preprint 2023

[21] L. Jezequel, N.-S. Vu, J. Beaudet, A. Histace, “Efficient anomaly detection using self-supervised multi-cue tasks”. IEEE Trans. Image Processing 2023