PhD thesis on Continual Learning for Computer Vision in Open World - CEA List
10 July 2023
Category: PhD student
Continual Learning for Scene Analysis in Open World
Objective of the thesis
The objective of the thesis is to propose continual learning methods under the open-world hypothesis, giving scene understanding models (such as semantic segmentation and object detection models) the ability to evolve when they encounter new semantic classes. This evolution must take place under the constraint that past data are unavailable (the incremental learning hypothesis).
The ability to discover unseen classes and recognize them as novel will be necessary to make the model effective and trustworthy. The ability to distinguish between different unseen classes, and between different instances of unseen classes, will also make learning more efficient. In addition, we will study how the few-shot learning paradigm can be applied efficiently to learn the new classes with fewer annotations [13, 14], thereby minimizing human effort.
The PhD candidate will start by analyzing the state of the art in open-world incremental learning and identifying the limitations of existing methods. Novel methods operating under severe memory, annotation and computational constraints will then be proposed, developed, evaluated on appropriate benchmarks and published.
Presentation of the host laboratory
Based on the Paris-Saclay campus, CEA-LIST is one of the four technological research institutes of CEA TECH, the technological research division of CEA. Dedicated to intelligent digital systems, it contributes to the competitiveness of companies through research and knowledge transfer. The expertise of its 800 research engineers and technicians helps more than 200 companies in France and abroad every year, on subjects organized into 4 programs and 9 technological platforms. 21 start-ups have been created since 2003.
The Computer Vision and Machine Learning for Scene Understanding laboratory addresses computer vision topics with particular emphasis on four axes:
- Recognition (detection or segmentation of objects and persons)
- Behavior analysis (action and gesture recognition, anomalous behavior of individuals or crowds)
- Smart annotation (large-scale annotation of 2D and 3D data using semi-supervised methods)
- Perception and decision-making (Markovian decision processes, navigation)
The PhD candidate will join a team of 30 people (researchers, PhD students, interns) and will be able to interact with peers working on related subjects and methods.
Visual scene understanding is necessary in many applications (autonomous driving, smart city, smart home, manufacturing & robotics, defense & security, agriculture, sports & health…). Deep learning models are particularly effective for computer vision tasks such as semantic segmentation and object detection. However, a trained model is no longer effective or useful when it encounters new semantic classes (“things” or “stuff”) that were not present during training. Retraining the model from scratch on additional labeled images of these new classes together with the old images would be inefficient or even impossible (the old dataset may be too large or no longer available).
Incrementally adapting an existing object detection model to detect new, unseen classes under severe memory and computational constraints is a critical capability in real-world applications such as robotics, self-driving vehicles or video surveillance. However, while human beings can easily learn to recognize new objects continuously without forgetting old knowledge, deep learning models can suffer from ‘catastrophic forgetting’: adding new classes without access to the old training dataset can severely degrade performance on the original set of classes.
To overcome this issue, many continual/incremental learning approaches have been proposed. Some methods store a subset of the old dataset in a memory buffer and replay it when retraining the model on the new classes, or extend the model architecture by adding detection heads. Others focus on regularizing training to minimize the discrepancy between the responses of the old and the updated model. All these methods must balance model plasticity (learning the new classes) against rigidity (preserving the old ones), and their results remain limited compared with models trained jointly on the whole dataset.
While most studies focus on image classification, fewer methods address incremental learning for tasks such as object detection or semantic segmentation. In general, models learn little from the new classes and mainly try not to degrade performance on the old ones, whereas it would be desirable to learn the new classes better and also to improve the old classes, which can appear in the new data as well. Moreover, adapting incremental learning methods developed for image classification to these tasks is not straightforward, because of the additional challenge of ‘background shift’: in semantic segmentation, for example, pixels of old (and future) classes are annotated as background at each new step, so the meaning of the background class changes from one step to the next, which makes incremental learning more difficult.
To address these limitations, recent methods adopt the open-set hypothesis in order to detect novelty, i.e. inputs that fall outside the known classes. Some methods go further and differentiate unknown classes of objects, under the open-world hypothesis [6, 7, 8, 9, 10]. They identify instances of unknown objects as unknown and subsequently learn to recognize them as training data progressively arrive, without retraining from scratch (cf. Fig. 1). Similarly, methods for open-world semantic segmentation have been proposed [11, 12].
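To make the regularization family concrete, here is a minimal sketch, assuming a classifier that outputs one logit per class and a frozen copy of the old model. It follows the knowledge-distillation idea popularized by Learning without Forgetting (Li and Hoiem): the updated model is trained with cross-entropy on the labels of the current step, plus a penalty on the divergence between its temperature-softened predictions over the old classes and those of the old model. The function names, the weight `lam` and the temperature value are illustrative assumptions, not the methods this thesis will develop; a replay-based method would instead mix stored old samples into each training batch.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; a temperature > 1 softens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Standard cross-entropy on the ground-truth label of the current step."""
    return -math.log(softmax(logits)[target])


def distillation_loss(old_logits: Sequence[float],
                      new_logits_on_old: Sequence[float],
                      temperature: float = 2.0) -> float:
    """KL(p_old || p_new) between softened predictions over the old classes only."""
    p_old = softmax(old_logits, temperature)
    p_new = softmax(new_logits_on_old, temperature)
    return sum(po * math.log(po / pn) for po, pn in zip(p_old, p_new))


def incremental_loss(new_logits: Sequence[float], target: int,
                     old_logits: Sequence[float],
                     lam: float = 1.0, temperature: float = 2.0) -> float:
    """Hypothetical combined objective: learn new classes, stay close to the old model.

    new_logits covers old + new classes (old classes first); old_logits comes
    from the frozen old model and covers only the old classes.
    """
    num_old = len(old_logits)
    ce = cross_entropy(new_logits, target)
    kd = distillation_loss(old_logits, list(new_logits[:num_old]), temperature)
    return ce + lam * kd


old = [2.0, 0.5, -1.0]       # frozen old model: 3 old classes
new = [1.0, 1.5, -0.5, 3.0]  # updated model: 3 old classes + 1 new class
print(incremental_loss(new, target=3, old_logits=old, lam=1.0))
```

The weight `lam` directly controls the plasticity/rigidity trade-off: a large value keeps the old responses almost unchanged, while a small value lets the model fit the new classes at the risk of forgetting.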
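The background shift can be illustrated on a single row of pixels. The sketch below, assuming integer class labels with 0 as background, first shows how the annotation available at an incremental step collapses every class not learned at that step into background. It then shows one published remedy, written in the spirit of the unbiased cross-entropy of MiB (Cermelli et al., CVPR 2020): for a pixel labeled background, the probabilities of the background and of all old classes are summed before taking the log, so the model is not penalized for still predicting an old class there. All function names are hypothetical.

```python
import math
from typing import Iterable, List, Sequence

BACKGROUND = 0  # assumption: label 0 denotes the background class


def step_annotation(full_labels: Sequence[int], current_classes: Iterable[int]) -> List[int]:
    """Labels visible at one incremental step.

    Pixels of classes not learned at this step (old or future classes)
    are annotated as background: this is the source of the background shift.
    """
    visible = set(current_classes)
    return [lab if lab in visible else BACKGROUND for lab in full_labels]


def standard_pixel_ce(probs: Sequence[float], label: int) -> float:
    """Plain per-pixel cross-entropy."""
    return -math.log(probs[label])


def unbiased_pixel_ce(probs: Sequence[float], label: int, old_classes: Iterable[int]) -> float:
    """Per-pixel cross-entropy in the spirit of MiB's unbiased loss.

    A background-labeled pixel may actually belong to an old class, so its
    background probability is replaced by background + old-class probabilities.
    """
    if label == BACKGROUND:
        p = probs[BACKGROUND] + sum(probs[c] for c in old_classes)
    else:
        p = probs[label]
    return -math.log(p)


full = [0, 1, 2, 3, 3]  # true labels: background, old classes 1 and 2, new class 3
print(step_annotation(full, {3}))  # [0, 0, 0, 3, 3]

probs = [0.1, 0.8, 0.05, 0.05]  # model still predicts old class 1 on a pixel labeled background
print(standard_pixel_ce(probs, 0) > unbiased_pixel_ce(probs, 0, [1, 2]))  # True
```

With the plain loss, the model would be pushed to turn old-class pixels into background, which accelerates forgetting; the unbiased variant removes that pressure.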
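A very simple way to flag novelty, used here only to illustrate the open-set hypothesis, is to threshold the maximum softmax probability: if no known class is predicted confidently enough, the input is labeled unknown. This baseline is an assumption made for the sketch and is much cruder than the open-world detectors cited above, which also separate different unknown objects; the threshold value and the `UNKNOWN` label are hypothetical.

```python
import math
from typing import List, Sequence

UNKNOWN = "unknown"  # hypothetical label for rejected (novel) inputs


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def open_set_predict(logits: Sequence[float], class_names: Sequence[str],
                     threshold: float = 0.5) -> str:
    """Return the most likely known class, or UNKNOWN if its probability is below threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return UNKNOWN
    return class_names[best]


known = ["car", "person", "bicycle"]
print(open_set_predict([5.0, 0.0, 0.0], known))  # car
print(open_set_predict([1.0, 1.0, 1.0], known))  # unknown (each class gets 1/3)
```

In an open-world pipeline, inputs flagged as unknown would be collected, annotated (possibly with only a few examples), and then learned incrementally as new classes.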