Announcement

Hybrid learning for the detection of objects in remote sensing images

25 March 2022


Category: PhD student


The thesis will address two issues:

  • How to build a specific object detector when the object category is defined by textual/knowledge-based descriptions and visual samples?
  • How to efficiently exploit a geo-spatial context description (by text or by image samples) and introduce it as a prior into an object detection algorithm?

The thesis will take place at ONERA.

For more information, contact: stephane.herbin@onera.fr

https://w3.onera.fr/formationparlarecherche/sites/w3.onera.fr.formationparlarecherche/files/tis-dtis-2022-15.pdf

 

Satellite images are a convenient way to acquire information over large areas. A typical use case is estimating activity by observing the presence and number of characteristic objects in a given scene: cruise ships in a port can indicate tourist activity; the number of sheds on the outskirts of a city gives an indication of the level of economic development; functional buildings (intact or damaged) can reveal the level of political organization after a natural disaster. Automated tools for detecting these objects are necessary, for example, to regularly monitor the development of such activity.

Supervised learning of deep networks is the modern technique used to design object detectors on images. It relies on a large set of annotated data that samples the input distribution of images. However, because of the specificity of the objects sought in the images, there is often no training database available to apply a purely supervised learning approach, and new means to exploit other sources of knowledge become necessary.

The objective of hybrid artificial intelligence, or hybrid learning, is to propose formal ways to introduce good priors, expressed as a knowledge representation or derived from an analytical model, into a data-driven approach. In this thesis, it is proposed to use natural language as a flexible way to encode such knowledge in order to specialize object detectors.

Approaches associating natural language and computer vision [1] have been developed recently thanks to the possibilities offered by deep neural network architectures, which make it easier to combine multi-modal representation spaces. Problems such as image captioning, visual question answering ("VQA"), dialogue or zero-shot learning have now become standard research subjects, and have given rise to some applications in the field of remote sensing.


One proposed research direction is to complement language-based object descriptions with additional visual examples. This objective can be seen as associating so-called "zero-shot" (i.e. certain object classes are defined only in a textual way) [2] and "few-shot" (i.e. the object class is defined only from a few examples) [3][4] or incremental [5] approaches for object detection. This task is now sometimes referred to as "any-shot" in the literature [6]. Conditional image generation from free-form descriptions is also an interesting direction to alleviate specific object data shortages [7].
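As an illustration only, the combination of a textual description with a few visual samples can be sketched as building one class prototype in a shared embedding space. This is a minimal sketch, not the method the thesis will develop: the function names, the `text_weight` parameter and the toy three-dimensional embeddings are all hypothetical, and in practice the embeddings would come from pretrained text and image encoders.

```python
import math


def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def any_shot_prototype(text_embedding, visual_embeddings, text_weight=0.5):
    """Blend a text embedding with the mean of a few visual embeddings.

    With no visual samples this reduces to the zero-shot (text-only) case.
    text_weight is a hypothetical mixing parameter, not taken from the source.
    """
    text = normalize(text_embedding)
    if not visual_embeddings:
        return text
    count = len(visual_embeddings)
    mean_visual = [sum(v[i] for v in visual_embeddings) / count
                   for i in range(len(text))]
    visual = normalize(mean_visual)
    mixed = [text_weight * t + (1 - text_weight) * v
             for t, v in zip(text, visual)]
    return normalize(mixed)


def classify_region(region_embedding, prototypes):
    """Return the class whose prototype is most similar to the region."""
    return max(prototypes,
               key=lambda name: cosine(region_embedding, prototypes[name]))


# Toy embeddings (assumed values, for illustration only).
prototypes = {
    # Text description plus two visual samples ("few-shot" side).
    "cruise ship": any_shot_prototype([1.0, 0.0, 0.0],
                                      [[0.8, 0.2, 0.0], [0.9, 0.1, 0.1]]),
    # Text description only ("zero-shot" side).
    "shed": any_shot_prototype([0.0, 1.0, 0.0], []),
}
label = classify_region([0.7, 0.1, 0.0], prototypes)
```

The same prototype function covers both regimes, which is the intuition behind the "any-shot" setting: the number of visual samples per class can be zero, a few, or grow over time.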

One specificity of the field of satellite data analysis is the apparent small size of certain objects of interest or the ambiguity of their shape. This often makes objects only identifiable by their context, by their configuration or in relation to other entities: vehicle on the road, boats docked in a port, church or town hall in front of a square, school near a courtyard, camp containing a series of standardized buildings, etc. The question then arises of introducing into a detector, and in a sufficiently flexible manner, knowledge or descriptions making it possible to describe such a context [8].
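One simple way to picture context acting as a prior is to rescore a detector's confidence with Bayes' rule. The sketch below is an assumption-laden toy, not a proposal from the announcement: it treats the detector score as a posterior obtained under a uniform prior, and the context prior values for the "road" and "water" surroundings are invented for illustration.

```python
def posterior_with_context(detector_score, context_prior):
    """Combine a detector confidence with a context prior (binary case).

    Assumes detector_score is the object probability under a uniform (0.5)
    prior, so it can be reweighted by context_prior with Bayes' rule.
    """
    object_term = detector_score * context_prior
    background_term = (1 - detector_score) * (1 - context_prior)
    total = object_term + background_term
    return object_term / total if total > 0 else 0.0


# Hypothetical priors for the class "vehicle" given the surrounding context.
vehicle_prior = {"road": 0.9, "water": 0.1}

# The same ambiguous detection (score 0.6) in two different contexts.
on_road = posterior_with_context(0.6, vehicle_prior["road"])
on_water = posterior_with_context(0.6, vehicle_prior["water"])
```

With these assumed numbers, the ambiguous detection becomes more credible on a road and less credible over water, while a neutral prior of 0.5 leaves the detector score unchanged.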

The thesis will therefore address two issues:

  • How to build a specific object detector when the object category is defined by textual/knowledge-based descriptions and visual samples?
  • How to efficiently exploit a geo-spatial context description (by text or by image samples) and introduce it as a prior into an object detection algorithm?

The thesis will take place at ONERA. The work will be associated with several research projects about remote sensing data interpretation and will benefit from interactions with other researchers and PhD students of the team.

References

  1. https://arxiv.org/abs/1907.09358
  2. https://github.com/KennithLi/Awesome-Zero-Shot-Object-Detection
  3. https://arxiv.org/abs/1904.05046
  4. https://arxiv.org/abs/2006.07826
  5. http://proceedings.mlr.press/v119/wang20j.html
  6. https://arxiv.org/abs/2003.07003
  7. https://arxiv.org/abs/1911.07933
  8. https://arxiv.org/abs/2010.01305

A complete description of the subject, with more references:

https://w3.onera.fr/formationparlarecherche/sites/w3.onera.fr.formationparlarecherche/files/tis-dtis-2022-15.pdf