M2/Engineering School Internship (IRISA Vannes): Active learning and object detection in multimodal aerial images
14 November 2023
Category: Intern
This internship is motivated by issues raised in studies that rely on data collected by airborne imaging. Automating the processing of such data with object detection methods and supervised learning requires annotated databases. Annotation is therefore a task of great interest in both machine learning (ML) and computer vision (CV). Performing it manually is tedious and costly in time and human resources. Furthermore, for multimodal images (i.e. images acquired by several sensors), annotation must be performed for each modality.
Active Learning (AL) is related to semi-supervised machine learning: at each iteration of the training step, the learning algorithm can query the user for the labels of new data. It is motivated by situations in which unlabeled data is easy to collect but labels are costly to obtain manually (time, money, tedious work). It stems from the idea that we should only acquire labels that actually improve the model's ability to make accurate predictions. Instances that are more useful than others according to some performance measure have to be identified in order to build an optimal training dataset: with well-chosen instances, fewer labeled examples are needed to reach performance similar to that obtained by labeling and using all available data. This selection process has been investigated under the name of selective sampling. An instance is considered important when it is both highly informative and highly uncertain for the trained model, so the selection process must trade off informativeness (the ability to reduce the uncertainty of a statistical model) against representativeness (the ability to represent the whole input data space).
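To make the selection step concrete, here is a minimal sketch of one common informativeness criterion, entropy-based uncertainty sampling. It is an illustration under stated assumptions, not the method of this internship: the function names `entropy_scores` and `select_queries`, the toy probabilities, and the labeling budget are all hypothetical, and the sketch ignores representativeness entirely.

```python
import numpy as np

def entropy_scores(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy of each row of class probabilities (higher = more uncertain)."""
    eps = 1e-12  # small constant to avoid log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)

def select_queries(probs: np.ndarray, budget: int) -> np.ndarray:
    """Return the indices of the `budget` most uncertain unlabeled instances."""
    scores = entropy_scores(probs)
    # Sort by decreasing entropy and keep the first `budget` indices.
    return np.argsort(-scores, kind="stable")[:budget]

# Toy pool (assumed values): 4 unlabeled instances, predicted probabilities over 3 classes.
pool_probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # almost uniform: most uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
])
queried = select_queries(pool_probs, budget=2)
print(queried)
```

In a full AL loop, the queried instances would be sent to the annotator, added to the labeled set, and the model retrained before the next round of scoring.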
In remote sensing, AL has therefore become an important approach to collect informative data for object detection and supervised classification tasks, and to assist the annotation process.
The effectiveness of object detection models depends strongly on the quantity of annotated data available to them. To address this, AL seeks a strategy for selecting the most relevant data for an annotator to label, as described by Choi et al. This typically relies on a scoring mechanism based on the model's uncertainty about the data. Computing these uncertainties usually requires a multi-model (ensemble) approach, which is resource-intensive. Hence, the overall objective of AL is to define a ranking function that faithfully reflects each sample's contribution to the learning process.
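As an illustration of why multi-model scoring is costly, the sketch below computes a generic ensemble disagreement score (the mutual information between the prediction and the ensemble member). This is an assumed, textbook-style formulation and not the method of Choi et al.; the function name `ensemble_disagreement` and the toy predictions are hypothetical. Note that it needs one forward pass per ensemble member for every candidate sample.

```python
import numpy as np

def _entropy(p: np.ndarray) -> np.ndarray:
    """Shannon entropy along the last axis (class axis)."""
    return -np.sum(p * np.log(p + 1e-12), axis=-1)

def ensemble_disagreement(member_probs: np.ndarray) -> np.ndarray:
    """Mutual information between prediction and ensemble member.

    member_probs has shape (n_models, n_samples, n_classes).
    Returns one score per sample; higher means the members disagree more.
    """
    mean_probs = member_probs.mean(axis=0)
    total = _entropy(mean_probs)                    # entropy of the averaged prediction
    expected = _entropy(member_probs).mean(axis=0)  # average entropy of the individual members
    return total - expected

# Toy ensemble (assumed values): 2 models, 2 samples, 2 classes.
member_probs = np.array([
    [[0.9, 0.1], [0.9, 0.1]],  # model A
    [[0.9, 0.1], [0.1, 0.9]],  # model B: agrees on sample 0, disagrees on sample 1
])
scores = ensemble_disagreement(member_probs)
print(scores)
```

Sample 0 receives a score of about zero because both members agree, while sample 1 is ranked first for annotation.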
Brust et al. introduce an approach to object detection with deep learning that incorporates AL strategies to explore unlabeled data. The authors propose and compare several metrics that are suitable for most object detectors and that take class imbalance into account. The first step of this project is to evaluate the performance of a multimodal object detector with respect to these metrics by applying them to a single modality (RGB, for example). This evaluation will be carried out under different settings, including various sizes of the initial dataset and different algorithm parameters. The aim is then to extend the AL strategy to multimodal images: for a given object, the modalities do not all contribute equally to the classification and localization tasks, and one can be more informative than another. Finally, the metrics proposed by Brust et al. focus on classification uncertainty and overlook localization. To obtain the localization uncertainty, one option is a strategy like that of the Gaussian YOLO approach, which provides both classification and localization uncertainties that can then be used with the metrics of Brust et al.
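One possible way to combine the two kinds of uncertainty is sketched below. It assumes a margin-based per-detection classification score (gap between the two most probable classes) aggregated over the detections of an image, and a detector that predicts a variance for each of the four box coordinates, in the spirit of Gaussian YOLO. The linear weighting with `alpha`, the function names, and the toy values are assumptions made for illustration; they are not taken from Brust et al. or from the Gaussian YOLO paper.

```python
import numpy as np

def margin_uncertainty(class_probs: np.ndarray) -> np.ndarray:
    """Per-detection score in [0, 1]: close to 1 when the two top classes are nearly tied."""
    top2 = np.sort(class_probs, axis=1)[:, -2:]  # columns: second-best, best
    margin = top2[:, 1] - top2[:, 0]
    return (1.0 - margin) ** 2

def image_score(class_probs, box_variances, alpha=0.5, aggregate=np.sum):
    """Image-level score mixing classification and localization uncertainty.

    class_probs: (n_detections, n_classes) class probabilities.
    box_variances: (n_detections, 4) predicted variances of the box coordinates.
    alpha: hypothetical weight of the localization term (assumption).
    aggregate: how detections are pooled into one score (e.g. np.sum, np.mean, np.max).
    """
    if len(class_probs) == 0:
        return 0.0  # no detection: nothing to score
    cls_unc = margin_uncertainty(class_probs)
    loc_unc = box_variances.mean(axis=1)  # average variance over the 4 coordinates
    per_detection = (1.0 - alpha) * cls_unc + alpha * loc_unc
    return float(aggregate(per_detection))

# Toy image (assumed values): one ambiguous detection and one confident detection.
probs = np.array([[0.5, 0.5], [1.0, 0.0]])
variances = np.array([[0.2, 0.2, 0.2, 0.2], [0.0, 0.0, 0.0, 0.0]])
score_sum = image_score(probs, variances)
score_mean = image_score(probs, variances, aggregate=np.mean)
print(score_sum, score_mean)
```

For multimodal images, the same score could be computed per modality and compared, which is one way to examine which modality is more informative for a given object.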