M2 internship: OpenStreetMap and Sentinel-2 data for the automatic production of environmental indices for demographic studies, Université Paris Cité/INED
1 December 2022
Category: Intern
In a globalized context increasingly affected by climate change, demographic studies would benefit from taking environmental data into account and from being carried out at the transnational level. However, this is not always possible in sub-Saharan Africa, where matching harmonized demographic and environmental data are seldom available. The large amount of satellite data acquired regularly since 2015 (in 2019 alone, Sentinel satellites from the European Space Agency produced 7.54 PiB of open-access data) provides an opportunity to produce relevant standardized indicators at the global scale. In particular, Sentinel-2 satellites produce multi-spectral images at a high temporal frequency (a revisit interval of at most 5 days) and at medium resolutions (10 to 60 meters depending on the spectral band). These images are freely available and enable land-cover/land-use mapping. However, Sentinel-2 images alone may not be sufficient for all remote sensing tasks. For instance, fine-grained land-cover mapping requires higher-resolution data, or other sources that can complement Sentinel-2 images.
Several indicators and legends have been developed to help understand geographical realities in a consistent (i.e. not location-dependent) manner. Among them, Local Climate Zones (LCZs) were initiated by WUDAPT (World Urban Database and Access Portal Tools) to label urban areas systematically. LCZs consist of 17 urban and rural classes built on surface properties defined without any cultural considerations. They make it possible to describe both cities and rural areas according to their vegetation type, densities, heights and materials.
Houses built with light materials may not be detected by deep learning models trained on Sentinel-2 data because of the limited resolution of the images. These missed detections alter LCZ classification results, especially in urban and semi-urban areas. Other data sources are therefore required to increase mapping accuracy. For instance, very-high-resolution orthophotos are often used for such tasks. However, as they are expensive to acquire, they are not available globally. Furthermore, they do not allow for the frequent updating of land-cover maps. Collaborative data from OpenStreetMap (OSM) could add useful information. This internship will focus on combining OSM data and Sentinel-2 images to generate LCZ maps of sub-Saharan countries.
The LCZ classification is meant to provide an open-access way of mapping the world that researchers can later use for a wide range of studies. LCZ data have been used to study energy usage and in climate or geoscience modeling. A considerable amount of work has been dedicated in recent years to the automatic generation of such data from sensors such as Landsat 8 or Sentinel-2. In a research competition organized by the IEEE IADF, several methods were tried out to map LCZs from Landsat, Sentinel-2 and OpenStreetMap data. Recent studies focus on the use of deep learning models to map LCZs automatically. However, none of them focuses on sub-Saharan Africa, where data are scarce. For instance, the So2Sat dataset is made of labels for 42 cities, of which only two are located in sub-Saharan Africa. This is problematic, as spatial variations within different sub-Saharan countries can be large, and the spatial generalization of machine-learning-based methods is a challenge. To overcome this issue, recent studies rely on supervised and contrastive learning to extract useful global features, using the So2Sat dataset, and more specific local features, using unlabeled data. This technique produces LCZ maps that provide enough information to contextualize localized demographic information.