
Spatio-temporal data augmentation models for motion pattern learning using deep learning: applications to facial analysis in the wild

10 December 2021

Category: Intern

Design and evaluate augmentation architectures that address a subset of facial analysis challenges (wide range of motion intensities, head movements) while maintaining temporal stability and avoiding facial artifacts.


Facial expression analysis is a well-studied field when dealing with segmented and constrained data captured in lab conditions. However, many challenges must still be addressed to build in-the-wild solutions that account for varying motion intensities (ranging from micro- to macro-expressions), strong head movements during expressions, the spotting of the subsequence containing the expression, partially occluded faces, varied and noisy backgrounds and illumination conditions, etc. In recent years, learned features based on deep learning architectures have been proposed to deal with these challenges. Deep learning is characterized by neural architectures that depend on a huge number of parameters. The convergence of these neural networks and the estimation of optimal parameters require large amounts of training data, especially when dealing with spatio-temporal data, which is particularly well suited to facial expression recognition [1]. The quantity, but also the quality, of the data and its capacity to reflect the addressed challenges are key elements for training the networks properly. Augmenting the data artificially in an intelligent and controlled way is an interesting solution [2]. The augmentation techniques identified in the literature mainly focus on image augmentation: they consist of scaling, rotation, and flipping operations, or they make use of more complex adversarial training [3]. These techniques can be applied at the frame level, but sequence-level augmentation is needed to better control the augmentation process and to ensure the absence of temporal artifacts that might bias the learning process. The generation of dynamic frontal facial expressions has already been addressed in the literature [4]. The goal of this internship is to go beyond the current state of the art and design new space-time augmentation methods for unconstrained facial analysis (involving head movements, occlusions, etc.).
We think that deep learning tools such as deepfakes, generative adversarial networks, and face registration may contribute to generating new artificial data for the temporal domain. However, care must be taken to assess whether the generated data meets the quality requirements of facial expression analysis: stability over time, absence of facial artifacts, etc. More specifically, the candidate is expected to design and evaluate augmentation architectures that address a subset of challenges (wide range of motion intensities, head movements) while maintaining temporal stability and avoiding facial artifacts.
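
To illustrate the difference between frame-level and sequence-level augmentation mentioned above, here is a minimal Python sketch using NumPy. It is only an illustration under simple assumptions, not the method targeted by the internship: the function names `augment_clip` and `augment_frames_independently`, the `max_shift` parameter, and the choice of a flip plus a small translation are all hypothetical. The sequence-level version draws the transform parameters once per clip, so no frame-to-frame jitter is introduced; the frame-level version redraws them for every frame.

```python
import numpy as np


def augment_clip(clip, rng, max_shift=4):
    """Sequence-level augmentation: one flip/shift drawn for the whole clip.

    clip: array of shape (T, H, W) or (T, H, W, C).
    """
    flip = rng.random() < 0.5
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = clip[:, :, ::-1] if flip else clip  # horizontal flip on the width axis
    # np.roll wraps pixels around the border; a real pipeline would
    # more likely pad or crop. It is used here only to keep the sketch short.
    return np.roll(out, shift=(int(dy), int(dx)), axis=(1, 2))


def augment_frames_independently(clip, rng, max_shift=4):
    """Frame-level baseline: parameters are redrawn for each frame."""
    frames = [augment_clip(frame[None], rng, max_shift)[0] for frame in clip]
    return np.stack(frames)


# Usage: a static clip (the same frame repeated) should stay static.
rng = np.random.default_rng(0)
frame = rng.random((32, 32))
static_clip = np.repeat(frame[None], 16, axis=0)

seq_aug = augment_clip(static_clip, rng)
frame_aug = augment_frames_independently(static_clip, rng)

# Mean absolute difference between consecutive frames (0 means no jitter).
print("sequence-level jitter:", np.abs(np.diff(seq_aug, axis=0)).mean())
print("frame-level jitter:", np.abs(np.diff(frame_aug, axis=0)).mean())
```

On a static clip, the sequence-level version leaves consecutive frames identical, whereas the frame-level version generally introduces artificial motion between frames. This is the kind of temporal artifact that could bias a model learning motion patterns.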

Our research group

The FOX research group is part of the CRIStAL laboratory (University of Lille, CNRS), located in Lille, France. We focus on video analysis for human behavior understanding. Specifically, we develop spatio-temporal models of motion for tasks such as abnormal event detection [5], emotion recognition [6], and face alignment [7].

Applicant profile

This internship is intended for a student at Bac+5 level (Master 2, engineering school, etc.) specializing in computer science, data science, statistics or a related discipline. Knowledge or experience in one or more of the following areas will be appreciated:
- machine learning,
- image processing / computer vision,
- deep neural networks.

Required skills

- good programming skills,
- good command of English,
- good writing skills,
- scientific curiosity.



[1] J. N. Bassili – Emotion recognition: The role of facial movement and the relative importance of upper and lower areas of the face – Journal of Personality and Social Psychology, vol. 37, no. 11, 1979

[2] Z. Feng, J. Kittler and X. Wu – Mining Hard Augmented Samples for Robust Facial Landmark Localization With CNNs – IEEE Signal Processing Letters, vol. 26, no. 3, 2019, pp. 450-454

[3] C. Shorten, T.M. Khoshgoftaar – A survey on Image Data Augmentation for Deep Learning – Journal of Big Data, vol. 6, no. 60, 2019

[4] N. Otberdout, M. Daoudi, A. Kacem, L. Ballihi and S. Berretti – Dynamic Facial Expression Generation on Hilbert Hypersphere with Conditional Wasserstein Generative Adversarial Nets – IEEE Transactions on Pattern Analysis & Machine Intelligence, 2020 (in press)

[5] R. Belmonte, P. Tirilly, I. M. Bilasco, C. Djeraba, N. Ihaddadene – Video-based Face Alignment with Local Motion Modeling – IEEE Winter Conference on Applications of Computer Vision, 2019

[6] D. Poux, B. Allaert, N. Ihaddadene, I. M. Bilasco, C. Djeraba, M. Bennamoun – Dynamic Facial Expression Recognition under Partial Occlusion with Optical Flow – IEEE Transactions on Image Processing, 2021 (in press)

[7] R. Belmonte, B. Allaert, P. Tirilly, I. M. Bilasco, C. Djeraba, N. Sebe – Impact of Facial Landmark Localization on Facial Expression Recognition – IEEE Transactions on Affective Computing, 2021 (in press)