Announcement
End-to-end lightweight neural models for video compression
11 May 2023
Category: PhD student
Supervisors
- Xiaoran Jiang, Associate professor (enseignant-chercheur), INSA Rennes, IETR, Vaader Team
- Daniel Ménard, Professor, INSA Rennes, IETR, Vaader Team
Context
Nowadays, video content represents more than 80% of Internet traffic [1], and this percentage is still increasing. Consequently, it is essential to develop efficient video compression systems that deliver better video quality within a limited bandwidth budget. Moreover, most video-related computer vision tasks, such as object detection or object tracking, are sensitive to the quality of the compressed videos.
After several decades of development, hybrid prediction/transform coding methods have achieved great success. However, refinement strategies based on local image and video correlations yield diminishing returns in coding efficiency, while encoder complexity increases rapidly. For example, the new VVC (Versatile Video Coding) standard [4] is 10 to 30 times more complex than the previous generation, H.265 (HEVC, High Efficiency Video Coding) [2,3].
Deep convolutional neural networks, which have driven the resurgence of neural networks in recent years and achieved great success in both image/video understanding and processing, also provide a novel and promising solution for image and video compression. These learning-based methods differ significantly from traditional codecs; in particular, neural codecs have the advantage of being globally optimizable in an end-to-end fashion. Very recently, several end-to-end neural video codecs [5,6,7,8,9,10] have been shown to be competitive with traditional video coding standards, such as HEVC, in terms of rate-distortion trade-off or perceptual quality, revealing interesting perspectives for next-generation video frameworks/standards.
However, given the complexity of current neural networks, they have not yet been widely deployed in industry, especially in an era where mobile devices are more and more widely used. Most of them involve heavy computations on both the encoder and the decoder side, which in some cases requires powerful hardware, i.e., a GPU, on the end-user side. The memory required to store all the network weights (generally tens of millions of parameters) is also a major issue. Consequently, exploring and searching for compact neural network structures for efficient video and image compression is considered one of the hottest future trends in the compression community.
Objectives
The objective of this thesis is to design and develop novel end-to-end lightweight deep models for efficient video compression. These neural models will offer low power consumption, low latency, low hardware deployment cost, better parallel inference and better confidentiality, while achieving high visual quality.
Furthermore, as a large part of video communication today takes place on mobile devices, we aim to demonstrate that neural video codecs (or at least neural video decoding) can run in real time on power-constrained devices such as mobile phones. A mobile-friendly neural video codec would open interesting perspectives for video coding for machines.
Methodology
Since the domain of efficient neural video codecs is relatively new and largely unexplored, the challenge remains open and multiple research directions can be pursued. Recent advances in deep learning can be used to carry out this work: for example, deep model compression techniques (quantization, low-rank factorization), learning techniques (distillation, sparse learning, quantization-sensitive scaling, transfer learning) and efficient architectures, to name just a few. Overfitting a neural network, such as an implicit neural model, to a video or to segments of a video is another means to reduce model complexity while improving rate-distortion performance. We can also leverage efficient learning-based frame interpolation or extrapolation methods to reduce bit-rate.
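To illustrate the flavor of the model-compression techniques mentioned above, the following is a minimal sketch of symmetric 8-bit post-training quantization of a weight tensor. It is only illustrative: the weights are random stand-ins for a trained layer, and real codec work would apply such schemes (often per-channel, with calibration or quantization-aware training) inside a full network.

```python
import numpy as np

def quantize_int8(w):
    """Map float weights to int8 with a single per-tensor scale."""
    scale = np.max(np.abs(w)) / 127.0  # largest magnitude maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

# Random stand-in for a trained 64x64 weight matrix (hypothetical values)
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(64, 64)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is 4x smaller than float32, at a small reconstruction error
mse = float(np.mean((w - w_hat) ** 2))
print(q.dtype, w.nbytes // q.nbytes, f"{mse:.2e}")
```

The same trade-off (fewer bits per weight versus reconstruction error) drives the rate-distortion-complexity analysis this thesis will carry out on complete neural codecs.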
This thesis will be interesting for you if
- You are passionate about image and video.
- You want to work in a promising domain for which demand from industrial and academic entities will be significant in the coming years.
- You want to become an expert in deep learning, image/video processing and computer vision.
- You like an academic research environment with industrial collaborations.
Preferred skills
- Solid background in image/video compression, image processing, computer vision, deep learning, machine learning.
- Familiarity with the Python/C++ programming languages. Experience with deep learning frameworks such as PyTorch and TensorFlow.
- Fluency in spoken technical English. A good level of written English is a must. A good level of French is a plus.
- Previous experience with scientific publications (conference or journal papers) is a plus.
Education level
Master's degree in computer science, telecommunications, signal processing, image processing, or any related domain.
Applications
Send resume and application letter to xiaoran.jiang@insa-rennes.fr and Daniel.menard@insa-rennes.fr
References
[1] Where Does the Majority of Internet Traffic Come From? https://www.ncta.com/whats-new/report-where-does-the-majority-of-internet-traffic-come.
[2] Thomas Wiegand, Gary J. Sullivan, Gisle Bjontegaard, and Ajay Luthra. Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology, 13(7):560–576, 2003.
[3] Gary J. Sullivan, Jens-Rainer Ohm, Woo-Jin Han, and Thomas Wiegand. Overview of the high efficiency video coding (HEVC) standard. IEEE Transactions on Circuits and Systems for Video Technology, 22(12):1649–1668, 2012.
[4] "H.266: Versatile video coding". www.itu.int. Archived from the original on 21 June 2021.
[5] Guo Lu, Wanli Ouyang, Dong Xu, Xiaoyun Zhang, Chunlei Cai, Zhiyong Gao, “DVC: An End-to-end Deep Video Compression Framework”, In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019.
[6] Ren Yang, Fabian Mentzer, Luc Van Gool and Radu Timofte, "Learning for Video Compression with Recurrent Auto-Encoder and Recurrent Probability Model", IEEE Journal of Selected Topics in Signal Processing (J-STSP), 2021.
[7] Ren Yang, Radu Timofte and Luc Van Gool, "Perceptual Learned Video Compression with Recurrent Conditional GAN", in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2022.
[8] Eirikur Agustsson, David Minnen, Nick Johnston, Johannes Ballé, Sung Jin Hwang and George Toderici, "Scale-Space Flow for End-to-End Optimized Video Compression", in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[9] Zhihao Hu, Guo Lu, Jinyang Guo, Shan Liu, Wei Jiang, and Dong Xu. "Coarse-To-Fine Deep Video Coding With Hyperprior-Guided Mode Prediction", in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[10] Oren Rippel, Alexander G. Anderson, Kedar Tatwawadi, Sanjay Nair, Craig Lytle and Lubomir Bourdev. "ELF-VC: Efficient Learned Flexible-Rate Video Coding", in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.