M2 internship: OpenStreetMap and Sentinel-2 data for the automatic production of environmental indices for demographic studies, Université Paris Cité/INED

1 December 2022

Category: Intern

In a globalized context increasingly affected by climate change, demographic studies would benefit from taking environmental data into account and from being carried out at the transnational level. However, this is not always possible in sub-Saharan Africa, as matching harmonized demographic and environmental data are seldom available. The large amount of data regularly acquired since 2015 (in 2019 alone, Sentinel satellites from the European Space Agency produced 7.54 PiB of open-access data) provides an opportunity to produce relevant standardized indicators at the global scale. In particular, Sentinel-2 satellites produce multi-spectral images at a high temporal frequency (a revisit time of at most 5 days) and at medium resolutions (10 to 60 meters depending on the spectral band). These images are freely available and enable land-cover/land-use mapping. However, Sentinel-2 images alone may not be sufficient for all remote sensing tasks. For instance, fine-grained land-cover mapping requires higher-resolution data, or other sources that can complement Sentinel-2 images.

Several indicators and legends have been developed to help understand geographical realities in a consistent (i.e. not location-dependent) manner. Among them, Local Climate Zones (LCZs) were initiated by WUDAPT (World Urban Database and Access Portal Tools) to systematically label urban areas. LCZs consist of 17 urban and rural classes built on surface properties defined without any cultural considerations. They make it possible to describe both cities and rural areas according to their vegetation type, densities, heights and materials. Houses built with light materials may not be detected by deep learning models trained on Sentinel-2 data because of the limited resolution of these images. Such missed detections alter LCZ classification results, especially for urban and semi-urban areas.

Other data sources are therefore required to increase mapping accuracy. For instance, very-high-resolution orthophotos are often used for such tasks. However, as they are expensive to acquire, they are not available globally. Furthermore, they do not allow for the frequent updating of land-cover maps. Collaborative data from OpenStreetMap (OSM) could add useful information. This internship will focus on the combination of OSM data and Sentinel-2 images for generating LCZ maps of sub-Saharan countries.

The LCZ classification is meant to provide an open-access way of mapping the world that researchers can later use for a wide range of studies. LCZ data have been used in studies of energy usage, climate and geoscience modeling. A significant amount of work has been dedicated in recent years to the automatic generation of such data from sensors such as Landsat 8 or Sentinel-2. In a research competition organized by the IEEE IADF, several methods were tried out to map LCZs from Landsat, Sentinel-2 and OpenStreetMap data. Recent studies focus on the use of deep learning models to tackle the task of automatically mapping LCZs. However, none of them focuses on sub-Saharan Africa, where data are scarce. For instance, the So2Sat dataset is made of labels for 42 cities, of which only two are located in sub-Saharan Africa. This is problematic, as spatial variations within different sub-Saharan countries can be large, and because the spatial generalization of machine-learning-based methods is a challenge. To overcome this issue, recent studies rely on supervised and contrastive learning to extract useful global features, using the So2Sat dataset, and more specific local features, using unlabeled data. This technique produces LCZ maps that provide enough information to contextualize localized demographic information.

OSM is a collaborative platform providing a geo-referenced database of the world. Contributors and users can create and use up-to-date maps for geographic applications. It has been widely used for mapping tasks and for LCZ classification, with a positive impact when data quality is sufficient. Whereas OSM is well developed and regularly updated in urban areas in Western countries, it is not necessarily up to date in sub-Saharan countries because of the very fast pace of change in land use and settlement related to population growth and development. For instance, the following errors are common in OSM data for rural areas:
- annotations are geometrically misaligned or poorly georeferenced,
- buildings exist but are not annotated,
- buildings are annotated but do not exist (misannotation or building destroyed).
The work to be conducted during the proposed M2 internship will lead to the following three contributions:
- Contribution A: Development of a model to classify LCZs using high-quality OSM data.
This rule-based model will help to better understand the LCZ classification scheme. Furthermore, it will provide a baseline for the multi-modal methods to be developed during the internship.
- Contribution B: Multi-modal models for LCZ classification
Using an already trained deep-learning-based method to classify LCZs, we will study different fusion mechanisms (including late fusion and rule-based fusion) to integrate the information from the rule-based model. Furthermore, we will develop an end-to-end deep-learning-based model taking rasterized OSM and Sentinel-2 data as input. These methods will be compared and evaluated in Ouagadougou, Burkina Faso and Antananarivo, Madagascar.
- Contribution C: Link with demographic studies and writing of the master thesis
The results obtained will be linked to demographic data in the two previously mentioned regions to better understand the underlying geo-spatial components in population studies. These results will be compared with a baseline developed during the PhD of Basile Rousse.
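As an illustration of the late fusion mentioned in Contribution B, here is a minimal sketch, not the method that will be used in the internship. It assumes that each model (the image-based one and the OSM rule-based one) outputs one probability per class for a given map cell, and it combines them with a weighted average. The function names, the weight `w_image` and the toy three-class example are hypothetical; the real LCZ scheme has 17 classes.

```python
from typing import List, Sequence


def late_fusion(p_image: Sequence[float],
                p_osm: Sequence[float],
                w_image: float = 0.5) -> List[float]:
    """Weighted average of two per-class probability vectors (hypothetical helper)."""
    if len(p_image) != len(p_osm):
        raise ValueError("both models must score the same classes")
    if not 0.0 <= w_image <= 1.0:
        raise ValueError("w_image must lie in [0, 1]")
    # Combine the two scores class by class.
    fused = [w_image * a + (1.0 - w_image) * b for a, b in zip(p_image, p_osm)]
    total = sum(fused)
    if total == 0:
        raise ValueError("fused scores sum to zero")
    # Renormalize so the fused scores again sum to 1.
    return [v / total for v in fused]


def predict_class(probabilities: Sequence[float]) -> int:
    """Index of the most probable class."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


# Toy example with 3 classes instead of the 17 LCZ classes.
p_from_sentinel2 = [0.6, 0.3, 0.1]  # assumed output of the image model
p_from_osm_rules = [0.2, 0.7, 0.1]  # assumed output of the rule-based model
fused = late_fusion(p_from_sentinel2, p_from_osm_rules, w_image=0.5)
label = predict_class(fused)
```

In this toy case the two models disagree, and the equal-weight average favors the class supported by the OSM-based scores. Lowering `w_image` where OSM quality is known to be good, or raising it where OSM is sparse, is one simple way to express the data-quality concerns listed above.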
We are looking for a student in Master 2, in the final year of an MSc, or in an engineering school, specializing in computer science. The ideal candidate would have knowledge of image processing, computer vision, machine learning, geo-information sciences and Python programming, and an interest in handling large amounts of data, remote sensing and demography. Experience in statistical data analysis would be a plus.