21 November 2018

Learning with incomplete data and model: Using Data Assimilation to train Machine Learning models

Category: Internship

Machine learning, and more specifically deep learning, has been undergoing a sustained revolution over the past decade, creating new methods and applications for predicting and understanding a multitude of phenomena with complex spatio-temporal dependencies. The advent of ResNet architectures (He et al. 2015) has helped make Machine Learning a mainstay of prediction problems that traditionally require differential equations, and such approaches can, in some cases, compete with or even outperform analytical numerical models (Zhang et al. 2017).

The "learning" part of machine and deep learning, however, relies on a relevant training dataset containing samples of spatio-temporally dependent structures. In many fields, the causal variables are not directly observed, and learning techniques therefore cannot be readily deployed.

The field of Data Assimilation is closely related to Machine Learning. Its objective is to combine the information provided by observations of a system with that provided by a numerical model simulating the dynamics of this system, in order to optimally estimate the state of the system.
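As a minimal illustration of what "combining observations and a model" means, the sketch below implements the classic BLUE (Best Linear Unbiased Estimate) analysis step, which is also the minimiser of the quadratic cost that variational methods such as 3D-Var minimise in the linear-Gaussian case. The function name `blue_analysis` and the toy numbers are hypothetical; this is not the method proposed for the internship, only the basic building block behind it.

```python
import numpy as np

def blue_analysis(x_b, B, y, H, R):
    """BLUE analysis: merge a model forecast x_b (error covariance B) with
    observations y = H x + noise (error covariance R)."""
    S = H @ B @ H.T + R                  # innovation covariance
    K = B @ H.T @ np.linalg.inv(S)       # gain: how much to trust the observations
    x_a = x_b + K @ (y - H @ x_b)        # analysis: best estimate of the state
    A = (np.eye(x_b.size) - K @ H) @ B   # analysis error covariance
    return x_a, A

# Toy two-variable state in which only the first component is observed
x_b = np.array([1.0, 2.0])   # model forecast
B = np.eye(2)                # forecast error covariance (no cross-correlation)
H = np.array([[1.0, 0.0]])   # observation operator
R = np.array([[1.0]])        # observation error covariance
y = np.array([3.0])          # observation

x_a, A = blue_analysis(x_b, B, y, H, R)
print(x_a)  # → [2. 2.]
```

With equal forecast and observation variances, the observed component lands halfway between forecast and observation, while the unobserved component is unchanged because `B` carries no cross-covariance that could propagate the information.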

Figure 1: Sea surface temperature images over six consecutive days, derived from satellite measurements and exhibiting highly dynamic structures. Source: Bereziat et al. 2018


The objective of this internship is to determine whether the long-standing expertise of Data Assimilation in calibrating numerical models under missing-observation constraints can be used to train a neural network under similar constraints.

To this end, we will perform a twin experiment based on a model representing a geophysical system. The time evolution of the state variable of the system is governed by the following equation:

$$\frac{\partial \mathbf{X}(t)}{\partial t} = \mathbb{M}\big(\mathbf{X}(t)\big) + \mathbb{M}_u\big(\mathbf{X}(t)\big)$$

where:

$\mathbf{X}(t)$ is the state variable at time $t$,

$\mathbb{M}$ is a known physical model derived from the governing equations,

$\mathbb{M}_u$ is the part of the model that is unknown a priori and that we want to emulate using a machine-learning approach.
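To make the twin-experiment setting concrete, here is a small sketch under purely hypothetical choices: a 1D periodic field instead of a 2D ocean state, a known part (written `known_model`, playing the role of M) taken to be linear advection at constant speed, and an unknown part (`unknown_model`, playing the role of M_u) taken to be a non-linear self-advection term. The "truth" is generated with both parts, a run with the known part alone shows what is missing, and sparse noisy observations are drawn from the truth. Grid size, speed, noise level and observation pattern are illustrative assumptions, not values from the announcement.

```python
import numpy as np

# Hypothetical 1D periodic grid (the real experiment would use a 2D field)
N = 64
dx = 2 * np.pi / N
x = np.arange(N) * dx

def ddx(u):
    """Centred finite-difference derivative on a periodic grid."""
    return (np.roll(u, -1) - np.roll(u, 1)) / (2 * dx)

def known_model(u, c=1.0):
    """Hypothetical known part M: linear advection at constant speed c."""
    return -c * ddx(u)

def unknown_model(u):
    """Hypothetical unknown part M_u: non-linear self-advection (Burgers-type term)."""
    return -u * ddx(u)

def full_model(u):
    """dX/dt = M(X) + M_u(X), used only to generate the synthetic truth."""
    return known_model(u) + unknown_model(u)

def rk4_step(f, u, dt):
    """One fourth-order Runge-Kutta step of du/dt = f(u)."""
    k1 = f(u)
    k2 = f(u + 0.5 * dt * k1)
    k3 = f(u + 0.5 * dt * k2)
    k4 = f(u + dt * k3)
    return u + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(f, u0, dt, n_steps):
    """Return the trajectory as an array of shape (n_steps + 1, N)."""
    traj = [u0]
    for _ in range(n_steps):
        traj.append(rk4_step(f, traj[-1], dt))
    return np.array(traj)

dt, n_steps = 0.01, 100
u0 = 0.5 * np.sin(x)
truth = integrate(full_model, u0, dt, n_steps)        # "nature run"
known_only = integrate(known_model, u0, dt, n_steps)  # model without M_u

# Sparse, noisy observations of the truth: every 10th step, every 4th grid point
rng = np.random.default_rng(0)
obs_t = np.arange(0, n_steps + 1, 10)
obs_x = np.arange(0, N, 4)
obs = truth[np.ix_(obs_t, obs_x)] + 0.01 * rng.standard_normal((obs_t.size, obs_x.size))
```

In a twin experiment only `obs` and `known_model` would be handed to the assimilation and learning steps; `truth` is kept aside to measure how well the missing part has been recovered.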


For example, we could apply this approach to a Lagrangian advection model of the 2D velocity field estimated from sea surface temperature images (see Fig. 1), in which the known physical model corresponds to the linear part of the equation and the unknown part to the non-linear part (assumed to be unknown for the experiment).
We also assume that some observations of the system are available. The proposed approach is to alternate between a 4D-Var optimization step on the observations and a CNN learning step, and to iterate this pair of steps.
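A heavily simplified sketch of this alternation, under assumptions that do not come from the announcement: a scalar state, a known linear part dx/dt = -x, a "true" unknown part 0.5 x² used only to generate synthetic data, a weak-constraint 4D-Var over the whole trajectory solved by Gauss-Newton, and a two-term polynomial regression (`features`, hypothetical) standing in for the CNN. All names and numerical values are illustrative.

```python
import numpy as np

# All values below are hypothetical choices for a toy scalar twin experiment
dt, K = 0.05, 100                    # time step and number of steps
obs_idx = np.arange(0, K + 1, 2)     # observe every 2nd time step
sigma_o, sigma_q = 1e-2, 2e-2        # observation and model-error std (assumed)

def f_known(x):                      # known linear part
    return -x

def f_true_unknown(x):               # "unknown" part, used only to build the truth
    return 0.5 * x**2

def features(x):                     # stand-in for the CNN: g(x) = theta . [x, x^2]
    return np.stack([x, x**2], axis=-1)

def d_features(x):                   # derivative of each feature w.r.t. x
    return np.stack([np.ones_like(x), 2 * x], axis=-1)

# Synthetic truth and noisy observations
rng = np.random.default_rng(0)
x_true = np.empty(K + 1)
x_true[0] = 1.5
for k in range(K):
    x_true[k + 1] = x_true[k] + dt * (f_known(x_true[k]) + f_true_unknown(x_true[k]))
y = x_true[obs_idx] + sigma_o * rng.standard_normal(obs_idx.size)

def assimilate(theta, x_init, n_iter=10):
    """Weak-constraint 4D-Var on the full trajectory, solved by Gauss-Newton."""
    x = x_init.copy()
    rows = np.arange(K)
    for _ in range(n_iter):
        F = f_known(x[:-1]) + features(x[:-1]) @ theta     # current model tendency
        dF = -1.0 + d_features(x[:-1]) @ theta             # its derivative w.r.t. x
        r_model = (x[1:] - x[:-1] - dt * F) / sigma_q      # model-error residuals
        r_obs = (x[obs_idx] - y) / sigma_o                 # observation misfits
        J_model = np.zeros((K, K + 1))
        J_model[rows, rows + 1] = 1.0 / sigma_q
        J_model[rows, rows] = (-1.0 - dt * dF) / sigma_q
        J_obs = np.zeros((obs_idx.size, K + 1))
        J_obs[np.arange(obs_idx.size), obs_idx] = 1.0 / sigma_o
        r = np.concatenate([r_model, r_obs])
        J = np.vstack([J_model, J_obs])
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)      # Gauss-Newton increment
        x = x + step
    return x

def learn(x_a):
    """Fit the unknown part to the model error implied by the analysed trajectory."""
    target = (x_a[1:] - x_a[:-1]) / dt - f_known(x_a[:-1])
    theta, *_ = np.linalg.lstsq(features(x_a[:-1]), target, rcond=None)
    return theta

theta = np.zeros(2)                              # start with no learned correction
x_a = np.interp(np.arange(K + 1), obs_idx, y)    # first guess: interpolated obs
for _ in range(5):                               # outer loop: assimilate, then learn
    x_a = assimilate(theta, x_a)
    theta = learn(x_a)

print("learned coefficients of [x, x^2]:", theta)
```

Design note: with a strong-constraint 4D-Var the analysed trajectory would satisfy the current model exactly, so the learning step would only recover the current parameters; allowing model error in the cost is one way (assumed here) to expose what the known model misses.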

The intern will have the opportunity to become familiar with these two scientific fields, to code in Python, and to learn the foundations of preparing a scientific article. The internship's subject breaks new ground in the field and could potentially lead to further research. A PhD (subject to funding) may be considered following the internship.

The internship will be remunerated and hosted at LIP6, under the supervision of Dominique Béréziat, with co-supervision by Julien Brajard (LOCEAN) and Anastase Charantonis (ENSIIE).

Keywords: Deep Learning, Data Assimilation, Shallow Water, Learning with incomplete data.


Requirements: coding in Python (or enough coding proficiency to transition to Python within about a week), and basic knowledge of optimisation problems (e.g. regression) and statistics (e.g. the Gaussian distribution).


[He et al. 2015] Deep Residual Learning for Image Recognition. He, K. et al. In arXiv, 2015.

[Zhang et al. 2017] Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction. Zhang, J. et al. In AAAI, 2017.

[Bereziat et al. 2018] Motion and acceleration from image assimilation with evolution models. Béréziat, D. and Herlin, I. In DSP, 2018.



(c) GdR 720 ISIS - CNRS - 2011-2018.