
Learning Based Coding for Digital Image and Video Information

Date: 15-06-2021
Location: Remote (online)

Scientific themes:
  • D - Telecommunications: compression, protection, transmission

Please note that, to guarantee all registered participants access to the meeting rooms, registration for meetings is free but mandatory.



53 members of GdR ISIS and 71 non-members are registered for this meeting.

Room capacity: 300 people.



In recent years, with the widespread adoption of image and video acquisition devices, visual data has grown at an unprecedented rate, driven in part by higher-dimensional formats such as UHD, 8K, 360-degree video, point clouds and light fields. Efficient compression is therefore essential to store or transmit this volume of data. Despite the considerable performance of state-of-the-art image/video coding standards, existing signal-processing-oriented compression methods, such as the hybrid coding approach, have shown their limitations, and it is becoming increasingly difficult for them to meet growing data demands. Approaches based on deep neural networks (DNNs) therefore have great potential to address this challenge and can provide very promising results.

Topics of interest in this workshop will include:

  • End-to-end image/video coding framework based on DNN.
  • Graph Learning Networks for image and video coding.
  • Learn to quantize.
  • Novel encoder/decoder DNN architectures for image/video compression.
  • Semantic-fidelity oriented image/video compression.
  • Super-resolution for image/video compression.
  • Adversarial learning.
  • Perceptual loss function.
  • Video compression via frame interpolation (image synthesis).
  • Quality assessment of DNN-based image/video compression.
Guest speakers

Prof. João Ascenso (Professor at Instituto Superior Técnico, Portugal)

Dr. Johannes Ballé, Senior Researcher Google USA


Wassim Hamidouche - IETR, INSA Rennes (whamidou@insa-rennes.fr)

Thomas Maugey - INRIA (thomas.maugey@inria.fr)

Aline Roumy - INRIA (aline.roumy@inria.fr)


13H30 - 13H45 : Meeting welcome by the organizers

13H45 - 14H30 : Learning-based Image Coding: a New Piece of the Multimedia Puzzle, Invited talk by João Ascenso, Instituto Superior Técnico, Portugal

14H30 - 14H50 : Designing a Learning-based Video Coder, Théo Ladune, Orange, IETR, Rennes, France

14H50 - 15H10 : Ultra-low bitrate video conferencing using deep image animation, Goluck Konuko, LTCI, Telecom Paris, Institut polytechnique de Paris

15H10 - 15H30 : CNN-based Quality Enhancement for VVC, Fatemeh Nasiri, Aviwest, b<>com Rennes

15H30 - 15H45 : Coffee break

15H45 - 16H10 : Learning for Video Compression with Recurrent Auto-Encoder and Recurrent Probability Model, Ren Yang and Radu Timofte, Department of Information Technology and Electrical Engineering, ETH Zurich

16H10 - 16H30 : On-the-sphere learning for end-to-end compression of omnidirectional images, Navid Mahmoudian Bidgoli, Inria, Rennes, France

16H30 - 17H10 : Learned perceptual representations, Invited talk by Johannes Ballé, Senior researcher at Google USA

17H10 - 17H30 : CompressAI: end-to-end compression in PyTorch, Jean Begaint, InterDigital, USA

Abstracts of contributions

Learning-based Image Coding: a New Piece of the Multimedia Puzzle

João Ascenso, Instituto Superior Técnico, Portugal

Learning-based image coding solutions can already achieve substantially better compression efficiency than existing conventional solutions, namely by exploiting advanced machine learning tools, such as deep neural networks. Besides their high compression efficiency, learning-based image coding solutions can offer an intrinsically feature-rich representation for image processing and computer vision tasks, without requiring full decoding (thus skipping image reconstruction). This contrasts with classical image codecs, which, when used in image processing and computer vision pipelines, typically require full decoding of the compressed bitstream to obtain a pixel-based representation.

In this talk, the JPEG AI project will be presented, namely the drive towards a learning-based image coding standard offering a single-stream, compact compressed-domain representation. The standard targets both human visualization, with a significant compression efficiency improvement over commonly used image coding standards at equivalent subjective quality, and effective performance for image processing and computer vision tasks, with the goal of supporting a royalty-free baseline. In this context, special emphasis will be given to the JPEG AI triple-purpose evaluation, which includes: i) image reconstruction, ii) extraction of semantic information, and iii) enhancement or modification of the image, for example for increased resolution, contrast, etc.

Designing a Learning-based Video Coder

Théo Ladune, Orange Labs, IETR Lab, INSA Rennes

In recent years, learning-based approaches for image coding have reached performance competitive with the best traditional methods (HEVC & VVC). Yet learned video coding remains a more challenging task due to the additional temporal dimension. This talk presents two contributions: conditional coding and coding mode selection. Together, they make the design and training of the system easier while significantly increasing performance. The resulting coding scheme is shown to achieve performance on par with the best HEVC implementation under the CLIC21 video coding test conditions.

Ultra-low bitrate video conferencing using deep image animation

Goluck Konuko, Giuseppe Valenzise, Stéphane Lathuilière

LTCI, Telecom Paris, Institut polytechnique de Paris

Université Paris-Saclay, CNRS, CentraleSupelec, Laboratoire des signaux et systèmes

In this work we propose a novel deep learning approach for ultra-low bitrate video compression for video conferencing applications. To address the shortcomings of current video compression paradigms when the available bandwidth is extremely limited, we adopt a model-based approach that employs deep neural networks to encode motion information as keypoint displacement and reconstruct the video signal at the decoder side. The overall system is trained in an end-to-end fashion minimizing a reconstruction error on the encoder output. Objective and subjective quality evaluation experiments demonstrate that the proposed approach provides an average bitrate reduction for the same visual quality of more than 80% compared to HEVC.

CNN-based Quality Enhancement for VVC

Fatemeh Nasiri, Aviwest, b<>com

Artifact removal and filtering methods are essential parts of video coding. On the one hand, new codecs and compression standards come with advanced in-loop filters; on the other hand, displays are equipped with high-capacity processing units for post-processing of decoded videos. This work proposes a Convolutional Neural Network (CNN)-based post-processing algorithm for intra and inter frames of VVC-coded streams. Depending on the frame type, the method benefits from the normative prediction signal by feeding it to the CNN as an additional input, along with the reconstructed signal and a QP map. Moreover, an optional Model Selection (MS) strategy is adopted to pick the best trained model among the available ones at the encoder side and signal it to the decoder side. This MS strategy is applicable at both frame level and block level. Experiments under the Random Access configuration of the VVC Test Model (VTM-10.0) show that the proposed prediction-aware algorithm brings an additional BD-BR gain of -1.3% compared to the method without the prediction information. Furthermore, the proposed MS scheme brings a further -0.5% BD-BR gain on top of the prediction-aware method.
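
As a rough illustration of the model selection idea described above, the following minimal Python sketch picks, at the encoder side, the candidate that best restores a decoded block, and applies the signalled choice at the decoder side. It is a toy under stated assumptions, not the method from the talk: the function names (`mse`, `select_model`, `enhance`) are hypothetical, the "models" are simple fixed functions standing in for trained CNNs, and selection uses mean squared error only, ignoring the cost of signalling the index.

```python
def mse(a, b):
    """Mean squared error between two equally sized lists of samples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def select_model(original, decoded, models):
    """Encoder side: return the index of the candidate that best restores `decoded`.

    `models` is a list of callables mapping a decoded block to an enhanced
    block (hypothetical stand-ins for trained CNNs).
    """
    errors = [mse(original, model(decoded)) for model in models]
    return min(range(len(models)), key=errors.__getitem__)


def enhance(decoded, models, index):
    """Decoder side: apply the model whose index was signalled by the encoder."""
    return models[index](decoded)


# Hypothetical candidates: identity (no enhancement) and two fixed offsets.
models = [
    lambda block: list(block),
    lambda block: [s + 1 for s in block],
    lambda block: [s - 1 for s in block],
]

original = [10, 12, 14, 16]
decoded = [9, 11, 13, 15]  # toy reconstruction with a systematic -1 error

index = select_model(original, decoded, models)  # encoder decision, to be signalled
restored = enhance(decoded, models, index)       # decoder applies the same model
```

The same selection could be run once per frame or once per block; only the granularity of the signalled index changes.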

Learning for Video Compression with Recurrent Auto-Encoder and Recurrent Probability Model

Ren Yang and Radu Timofte, Department of Information Technology and Electrical Engineering, ETH Zurich

The past few years have witnessed increasing interest in applying deep learning to video compression. However, existing approaches compress a video frame using only a small number of reference frames, which limits their ability to fully exploit the temporal correlation among video frames. To overcome this shortcoming, this work proposes a Recurrent Learned Video Compression (RLVC) approach with a Recurrent Auto-Encoder (RAE) and a Recurrent Probability Model (RPM). Specifically, the RAE employs recurrent cells in both the encoder and decoder. As such, the temporal information in a large range of frames can be used for generating latent representations and reconstructing compressed outputs. Furthermore, the proposed RPM network recurrently estimates the Probability Mass Function (PMF) of the latent representation, conditioned on the distribution of previous latent representations. Due to the correlation among consecutive frames, the conditional cross entropy can be lower than the independent cross entropy, thus reducing the bit-rate. The experiments show that our approach achieves state-of-the-art learned video compression performance in terms of both PSNR and MS-SSIM. Moreover, our approach outperforms the default Low-Delay P (LDP) setting of x265 on PSNR, and also performs better on MS-SSIM than the SSIM-tuned x265 and the slowest setting of x265. The code is available at https://github.com/RenYang-home/RLVC.git.
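
The claim that conditioning on previous symbols can lower the bit cost can be illustrated with a small, self-contained Python sketch. It is not the RPM network: it uses empirical Shannon entropy on a toy binary sequence (an assumption made purely for illustration) rather than a learned cross entropy, and the helper names `entropy` and `conditional_entropy` are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of the empirical distribution given by `counts`."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def conditional_entropy(pairs):
    """H(current | previous) in bits, estimated from (previous, current) pairs."""
    joint = Counter(pairs)
    previous = Counter(p for p, _ in pairs)
    # Chain rule: H(Y | X) = H(X, Y) - H(X)
    return entropy(joint) - entropy(previous)


# Toy "latent" sequence with temporal correlation (long runs of equal symbols).
symbols = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
pairs = list(zip(symbols[:-1], symbols[1:]))
current = [c for _, c in pairs]

h_independent = entropy(Counter(current))  # bits/symbol, ignoring the past
h_conditional = conditional_entropy(pairs)  # bits/symbol, given the previous symbol

print(f"independent: {h_independent:.3f} bits, conditional: {h_conditional:.3f} bits")
```

Because the symbols tend to repeat, knowing the previous symbol makes the next one more predictable, so the conditional estimate is lower than the independent one.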

On-the-sphere learning for end-to-end compression of omnidirectional images

Navid Mahmoudian Bidgoli, INRIA Rennes

Since omnidirectional images are functions defined on the sphere, it is natural to learn the compression algorithm directly on the sphere. Despite the success of convolutional neural networks for 2D images, they are not well suited to omnidirectional images, even if these signals are projected onto 2D images. This is mainly because projecting omnidirectional images onto a 2D plane leads to a signal whose spatial and statistical distributions differ from those of 2D images. Currently available on-the-sphere solutions have achieved good performance for tasks that are invariant to the orientation of input patterns, such as classification. However, they are not effective for tasks in which local orientation matters, such as compression. In this paper, we propose on-the-sphere learning solutions that take the local orientation into account. More precisely, we propose tools that extend any powerful architecture defined on the 2D grid to the sphere. To that end, we define on the sphere every module that exists for 2D image compression. This yields better rate-distortion performance, since neither the visual data nor its domain of definition is distorted. We conduct extensive experiments to evaluate the compression of spherical images using deep learning architectures.

Learned perceptual representations

Dr. Johannes Ballé, Senior researcher at Google USA

With the introduction of end-to-end training for image and video compression, it is now possible to much more directly optimize the rate and distortion of such methods. While rate is comparably easy to quantify, distortion is not: it comes down, in the end, to subjective perceptual assessments of humans. In this talk, I give a very brief overview of how perceptual quality has been modeled in the past, and discuss some ideas (and recent results) on the role representation learning could play in designing better perceptual measures of distortion.
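
For readers unfamiliar with the rate-distortion trade-off mentioned above, a commonly used form of the end-to-end training objective is sketched below. It is given for orientation only and is not taken from the talk. All symbols are assumptions introduced here: $x$ is the input image, $\hat{y}$ its quantized latent representation, $p_{\hat{y}}$ the learned entropy model, $\hat{x}$ the reconstruction, $d$ a distortion measure, and $\lambda$ the trade-off weight.

```latex
\[
\mathcal{L}(\theta) \;=\;
\underbrace{\mathbb{E}_{x}\!\left[-\log_2 p_{\hat{y}}(\hat{y})\right]}_{\text{rate } R}
\;+\; \lambda\,
\underbrace{\mathbb{E}_{x}\!\left[d(x, \hat{x})\right]}_{\text{distortion } D}
\]
```

The rate term is a well-defined bit count, whereas the choice of $d$ (for example mean squared error versus a perceptual measure) is where the difficulty of quantifying distortion arises.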

CompressAI: end-to-end compression in PyTorch

Jean Begaint, InterDigital USA

CompressAI is a PyTorch library that provides custom operations, layers, and utilities to develop and evaluate end-to-end image and video compression pipelines. CompressAI also includes state-of-the-art pre-trained image compression models and evaluation tools to compare learned methods with traditional codecs. By implementing common building blocks for neural compression, CompressAI can be used to research novel codecs for end-to-end video compression and video coding for machines.