Deepfake Video Generation and Detection
Scientific themes:
- D - Telecommunications: compression, protection, transmission
Please note that, to guarantee all registered participants access to the meeting rooms, registration for meetings is free but mandatory.
41 members of GdR ISIS and 21 non-members are registered for this meeting.
Room capacity: 100 people.
The word deepfake combines "deep learning" and "fake". Roughly, it refers to a video, an image or an audio recording in which the protagonist's identity or speech has been modified to mimic that of a source video, or which has been generated entirely from scratch. Recent advances in deep learning, including variational auto-encoders and generative adversarial networks (GANs), have enabled the generation of high-quality fake videos and audio. This raises many security issues for society, especially when celebrities and politicians are targeted and fake videos of them are widely shared on social networks. Many deepfake detection solutions, relying on both handcrafted and deep learning techniques, have been proposed in the literature. For instance, some detection methods rely on visual artifacts, on physiological inconsistencies, or on feature spaces in which fake and real videos are well separated. The race between deepfake generation and detection is still on, with each side pushing the other toward more efficient solutions.
The aim of this workshop is to bring together industry researchers and academics working on deepfake generation and detection from different communities, including machine learning, computer vision, image processing, multimedia security and biometrics.
Topics of interest include, but are not limited to:
- Generation of DeepFakes and face manipulation.
- Generation of synthetic faces including GAN based solutions.
- Detection of DeepFakes with hand-crafted solutions and deep learning-based solutions.
- Methodology for constructing Deepfake datasets and detection benchmarks.
- Resilience of Deepfake detection solutions against adversarial attacks.
Organizers:
- Wassim Hamidouche (INSA Rennes, IETR), firstname.lastname@example.org
- Amine Kacete (b<>com Institute, Rennes), email@example.com
- Abdenour Hadid (Université de Valenciennes), firstname.lastname@example.org
The workshop is co-organized by GDR ISIS and GDR Sécurité. Registration is mandatory with both GDR ISIS and GDR Sécurité, at the following link: https://gdr-secu-jn2022.sciencesconf.org/
We invite Ph.D. students and researchers to present their work at this workshop. A title and a short abstract of the contribution should be sent to the organizers before May 27, 2022.
8:45 - 9:00
9:00 - 9:45
[invited talk] Adversarial Machine Learning
Fabio Roli, Professor, University of Genova, Italy, Director of PRALab, University of Cagliari, Italy
9:45 - 10:30
[invited talk] Malicious Facial Image Processing and Counter-measures: A review
Jean-Luc Dugelay, Professor, EURECOM, Sophia Antipolis, France
10:30 - 11:00
11:00 - 11:30
[invited talk] Generation and Detection of Deepfakes
Antitza Dantcheva, Researcher, INRIA Sophia Antipolis, France
11:30 - 12:00
Improving Deepfake Detection by Mixing Top Solutions of the DFDC
Anis Trabelsi, Ph. D. student, EURECOM, Sophia Antipolis, France
12:00 - 12:30
Hierarchical Learning and Dummy Triplet Loss for Efficient Deepfake Detection
Nicolas Beuve, Ph. D. student, IETR, INSA Rennes, France
14:00 - 14:45
[invited talk] Detection of alterations in photographic images and of generated images: two specific approaches
Grégoire Mercier, CTO, eXo maKina
Abstracts of the contributions
[invited] Fabio Roli, Adversarial Machine Learning
Machine-learning algorithms are widely used in cybersecurity applications, including spam and malware detection and biometric recognition. In these applications, the learning algorithm must face intelligent and adaptive attackers who can carefully manipulate data to purposely subvert the learning process. As machine-learning algorithms were not originally designed under such premises, they have been shown to be vulnerable to well-crafted attacks, including test-time evasion attacks (also known as adversarial examples) and training-time poisoning attacks. This talk aims to introduce the fundamentals of adversarial machine learning and some techniques to assess the vulnerability of machine-learning algorithms to adversarial attacks. We report application examples including object recognition in images, biometric identity recognition, and spam and malware detection.
[invited] Jean-Luc Dugelay, Malicious Facial Image Processing and Counter-measures: A review
Recent advances in visual sensor technologies and computer vision make it possible to build increasingly efficient face recognition systems and, more generally, facial image processing tools. These new capabilities raise ethical and societal concerns. In parallel, such advances, in deep learning for example, contribute to the proliferation of malicious digital manipulations. The main objectives of such attacks are to access resources illegally, to harm individuals or to render certain technologies ineffective. In this presentation, we give an overview of existing attacks and related techniques in several application domains: biometrics (e.g. identity spoofing), media (e.g. fake news) and video surveillance (e.g. de-anonymization). Some recent works investigate potential counter-measures for detecting such attacks.
[invited] Antitza Dantcheva, Generation and Detection of Deepfakes
Generative models have made remarkable progress in generating realistic, high-quality images. While video generation is the natural sequel, it entails a number of challenges w.r.t. complexity and computation, associated with the simultaneous modeling of appearance and motion. I will talk about our work on the design of generative models that allow for realistic generation of face images and videos. We have placed emphasis on disentangling motion from appearance and have learned motion representations directly from RGB, without structural representations such as facial landmarks or 3D meshes. In our latest work, we have aimed at constructing motion as a linear displacement of codes in the latent space. Based on this, our model LIA (Latent Image Animator) is able to animate images via navigation in the latent space.
While highly intriguing, video generation has thrust upon us the imminent danger of deepfakes, which offer unprecedented levels of realism in manipulated videos. Deepfakes pose a security threat to us all: to date, they are able to mislead face recognition systems as well as humans. Therefore, we design generation and detection methods in parallel. In the second part of my talk, I will discuss our work on deepfake detection, where in our latest work we explore attention mechanisms in 3D CNNs.
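As a rough illustration of what "motion as a linear displacement of codes in the latent space" can mean, the sketch below moves a source latent code along a weighted sum of direction vectors. It is not the LIA implementation: the function name `navigate`, the 4-dimensional latent space, the two directions and the magnitudes are all hypothetical values chosen for the example, and a real model would learn the directions and decode the resulting code into an image.

```python
def navigate(z_source, directions, magnitudes):
    """Return z_source + sum_i magnitudes[i] * directions[i]."""
    z_target = list(z_source)
    for magnitude, direction in zip(magnitudes, directions):
        for k, component in enumerate(direction):
            z_target[k] += magnitude * component
    return z_target


# Hypothetical 4-dimensional latent space with two motion directions.
# In a trained model these directions would be learned, not hand-written.
directions = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]

z_source = [0.2, -0.1, 0.7, 0.3]  # latent code of the source image (made up)
z_target = navigate(z_source, directions, magnitudes=[0.5, -1.0])
```

Only the first two coordinates of `z_source` change here, since the two toy directions span only those axes; the remaining coordinates, which would carry appearance in this simplified picture, are left untouched.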
Anis Trabelsi, Improving Deepfake Detection by Mixing Top Solutions of the DFDC
The falsification of faces in videos has been a growing phenomenon in recent years. One of the most popular ways to tamper with a face in a video is known as "deepfake".
Today, many tools allow anyone to create a deepfake to discredit an individual or usurp an identity. Fortunately, the detection of deepfakes is a topic of growing interest for the scientific community. As a result, many efforts have been made to develop mechanisms that automatically identify deepfake videos.
In addition, several public deepfake datasets have been built to help researchers develop more effective detection methods. The most recent and most complete of these datasets is the one built by Facebook as part of the international DeepFake Detection Challenge (DFDC). Thousands of different frameworks, mainly based on deep learning, were proposed during this challenge. The best proposed solution achieves an accuracy of 82% on the DFDC dataset. However, the accuracy of this method is only 65% on unseen videos from the Internet. In this work we analyse the five best methods of the DFDC and their complementarity. In addition, we experimented with different ensembling strategies (boosting, bagging and stacking) among these solutions. We show that a large improvement (+41% on log loss and +2.26% on accuracy) can be achieved when the models to be combined and the merging method are carefully chosen.
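To make the log loss metric and the idea of complementarity concrete, here is a minimal sketch that averages the fake-probabilities of two detectors and compares log losses. It uses plain weighted averaging as a stand-in, not the boosting, bagging or stacking strategies studied in the talk, and the labels and probabilities are toy values invented for the example, not DFDC results.

```python
import math


def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy; labels are 0 (real) or 1 (fake)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def average_ensemble(per_model_probs, weights=None):
    """Weighted average of per-video probabilities across models."""
    n_models = len(per_model_probs)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_videos = len(per_model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, per_model_probs))
        for i in range(n_videos)
    ]


# Toy data (made up): each model is confident on the videos where
# the other one hesitates, i.e. the two models are complementary.
labels = [1, 0, 1, 0]
model_a = [0.95, 0.05, 0.40, 0.60]
model_b = [0.40, 0.60, 0.95, 0.05]

ensemble = average_ensemble([model_a, model_b])
loss_a = log_loss(labels, model_a)
loss_b = log_loss(labels, model_b)
loss_ensemble = log_loss(labels, ensemble)
```

On this toy data `loss_ensemble` is lower than both `loss_a` and `loss_b`. More generally, because the log loss is convex in the predicted probability, an averaged prediction never scores worse than the same weighted average of the individual losses, which is one reason why the choice of models matters more than their number.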
Nicolas Beuve, Hierarchical Learning and Dummy Triplet Loss for Efficient Deepfake Detection
Recent progress in deep learning-based image generation has made it easier to create convincing fake videos called deepfakes. While the benefits of such technology are undeniable, it can also serve as a realistic vehicle for fake news and mass disinformation. In this context, different detectors have been proposed, many of which use a CNN as a backbone model and binary cross-entropy as a loss function. Some more recent approaches apply a triplet loss with semi-hard triplets. In this paper, we investigate the use of a triplet loss with fixed positive and negative vectors as a replacement for semi-hard triplets. This loss, called dummy triplet loss (DmyT), follows the concept of the triplet loss but requires less computation, as the triplets are fixed. It also does not rely on a linear classifier for prediction. We have assessed the performance of the proposed loss with four backbone networks, including two of the most popular CNNs in the deepfake realm, Xception and EfficientNet, alongside two vision transformer networks, ViT and CaiT. Our loss function shows competitive results on the FaceForensics++ dataset compared to the triplet loss with semi-hard triplets, while being less computationally intensive. The source code of DmyT is available at https://github.com/beuve/DmyT.
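The idea of a triplet loss with fixed positive and negative vectors can be sketched as follows. This is an illustration under assumptions, not the code from the DmyT repository: the 2-dimensional dummy vectors in `DUMMIES`, the Euclidean distance, the margin of 1.0 and the nearest-dummy prediction rule are all hypothetical choices made for the example.

```python
import math

# Hypothetical fixed "dummy" vectors, one per class (0 = real, 1 = fake).
DUMMIES = {0: [1.0, 0.0], 1: [0.0, 1.0]}


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dummy_triplet_loss(embedding, label, margin=1.0):
    """Hinge on d(embedding, own-class dummy) - d(embedding, other dummy).

    The positive and negative are fixed vectors, so no triplet mining
    over the batch is needed.
    """
    positive = DUMMIES[label]
    negative = DUMMIES[1 - label]
    gap = euclidean(embedding, positive) - euclidean(embedding, negative)
    return max(0.0, gap + margin)


def predict(embedding):
    """Predict the class whose dummy is closest (no linear classifier)."""
    return min(DUMMIES, key=lambda c: euclidean(embedding, DUMMIES[c]))
```

With these toy dummies, an embedding sitting exactly on its own class dummy incurs zero loss, while one sitting on the other class's dummy is penalized by the distance between the dummies plus the margin.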
[invited] Grégoire Mercier, Detection of alterations in photographic images and of generated images: two specific approaches
Since photographic images are, in the vast majority of cases, encoded in JPEG format, their falsification leaves traces in the quantization history of the coefficients produced by the discrete cosine transform (DCT). Most falsification detection algorithms therefore rely on the analysis of DCT coefficients, either through global approaches (their histograms, their dynamic ranges, ...) or through local or pointwise approaches (multiple-quantization bias, local texture).
These approaches, which depend on the JPEG algorithm, reveal most signatures of photographic image falsification very explicitly, but they become ineffective when the images are generated, notably by deep networks. Indeed, no quantization and no cosine transform conditions the signal a priori, and there is no local inconsistency in the signal to detect. One must therefore look at the global traces left by image-generating architectures. Here we focus on adversarial networks (GANs, and more specifically StyleGAN) used for face generation, and present avenues for a discriminative characterization of the generated signals. An experiment carried out on the Flickr HD face database, based on these discriminative measures, yields a detection performance of 95% to 99.6% in separating native images from StyleGAN-generated ones.
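The multiple-quantization bias mentioned in this abstract can be illustrated with a toy histogram. The sketch below is not the speaker's method: it skips the DCT itself and works on synthetic coefficient values (the integers 0 to 30), and the quantization steps 5 and 3 are arbitrary choices. It only shows why a coefficient quantized twice with different steps leaves periodically empty histogram bins that a single quantization does not.

```python
from collections import Counter


def quantize(value, step):
    """Index of the nearest multiple of `step` (round half away from zero)."""
    if value >= 0:
        return int(value / step + 0.5)
    return -int(-value / step + 0.5)


def single_quantized_bins(coefficients, step):
    """Histogram of bin indices after one quantization."""
    return Counter(quantize(c, step) for c in coefficients)


def double_quantized_bins(coefficients, first_step, second_step):
    """Histogram after quantizing, dequantizing, then quantizing again."""
    bins = Counter()
    for c in coefficients:
        dequantized = quantize(c, first_step) * first_step
        bins[quantize(dequantized, second_step)] += 1
    return bins


# Synthetic coefficient values standing in for one DCT frequency.
coefficients = range(0, 31)

single = single_quantized_bins(coefficients, 3)
double = double_quantized_bins(coefficients, 5, 3)

# Bins that a single quantization fills but the double one leaves empty.
empty_bins = sorted(b for b in single if b not in double)
```

With these values, bins 1, 4, 6 and 9 are populated after a single quantization with step 3 but stay empty after quantizing with step 5 and then step 3. Gaps of this kind are one trace of recompression in a coefficient histogram; a generated image that never went through such a quantization chain carries no comparable trace, which is the limitation the abstract points out.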