Deep architecture to learn discriminative parts
Keywords: computer vision, deep learning, image classification, part-based models
Environment: QARMA (machine learning) Team at LIS (Laboratoire d’Informatique et Systèmes), in Marseille, France.
Supervisors: Ronan Sicre, Thierry Artieres
The proposed internship is in the field of computer vision, pattern recognition, and machine learning, and focuses more specifically on image classification. Image recognition has been studied intensively in both academia and industry, and a large variety of methods aim to build compact, precise, and transferable representations from which classification can be performed accurately [krizhevsky].
While deep learning has popularized the idea that deep neural architectures can automatically build relevant and informative representations from raw data, alternative works show the value of representations based on image parts that are learned from training data. In such works, a vocabulary of latent parts is learned per category, and an image representation is then built from the occurrence of these parts, without any extra labels. This can be viewed as a bag-of-features representation in which the features are high level and relate to the presence or absence of specific parts.
Several works have focused on learning part-like representations for image classification. Doersch et al. [doersch] use a density-based mean-shift algorithm to discover discriminative regions: starting from a weakly labeled image collection, it produces coherent patch clusters that are maximally discriminative with respect to the labels, and requires a single pass through the data. Juneja et al. [juneja] also aim to discover distinctive parts for an object or scene class, first identifying likely discriminative regions using low-level segmentation cues and then learning part classifiers on top of these regions. The two steps are alternated iteratively until a convergence criterion based on Entropy-Rank is satisfied. Mettes et al. [mettes] propose to learn parts that are shared across classes. Our own model is likewise based on matching image regions to part representations with an iterative soft-assignment algorithm [sicre1], [sicre2]. These works show the importance of part-based representations for describing images and discriminating between similar classes. Following these approaches, the goal of this internship is to translate this part-learning process into a new deep CNN architecture that addresses the current limitations of part-based methods and allows end-to-end training at larger scale.
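To make the idea of iteratively soft-assigning regions to parts more concrete, here is a minimal sketch in plain Python. It is a generic illustration, not the algorithm of [sicre1] or [sicre2]: it assumes that regions and parts are L2-normalized feature vectors, that each part distributes a softmax weight over regions, and that each part is then re-estimated as the weighted mean of the regions. The function names and the temperature parameter are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v] if norm > 0 else list(v)


def soft_assign(regions, parts, temperature=0.1):
    """For each part, return softmax weights over regions (each row sums to 1)."""
    weights = []
    for p in parts:
        scores = [dot(p, r) / temperature for r in regions]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def update_parts(regions, weights):
    """Re-estimate each part as the normalized weighted mean of the regions."""
    dim = len(regions[0])
    new_parts = []
    for w in weights:
        mean = [sum(wi * r[d] for wi, r in zip(w, regions)) for d in range(dim)]
        new_parts.append(l2_normalize(mean))
    return new_parts


def learn_parts(regions, parts, iterations=10, temperature=0.1):
    """Alternate soft assignment and part update for a fixed number of steps."""
    regions = [l2_normalize(r) for r in regions]
    parts = [l2_normalize(p) for p in parts]
    for _ in range(iterations):
        weights = soft_assign(regions, parts, temperature)
        parts = update_parts(regions, weights)
    return parts, soft_assign(regions, parts, temperature)
```

Called on four toy regions forming two groups, for instance `learn_parts([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], [[1.0, 0.2], [0.2, 1.0]])`, each part concentrates its weight on one group. A lower temperature makes the assignment closer to a hard matching.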
Several directions will be studied towards such a translation. First, the standard convolutional architecture should be adapted to support the following operations: learning features to represent image regions, learning discriminative parts, representing an image based on the parts (encoding), and finally learning a task-dependent function (e.g. a classifier). Second, in several existing models such as ours [sicre1], parts are by construction different from one another. This implies that parts should satisfy orthogonality constraints, possibly combined with data-dependent initialization. Third, spatial competition can be obtained by non-maximum suppression (NMS). Translating part detection operations into network layers in this way may allow deeper networks that model a hierarchical structure of parts.
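The operations listed above can be sketched on a toy example. The following plain-Python code is only an illustration under stated assumptions, not the proposed architecture: region features are assumed to be given on an H x W grid, part responses are inner products, the encoding is a max-pooling of each part's response map, the orthogonality constraint is written as the penalty ||P P^T - I||_F^2 on the part matrix P, and NMS keeps local maxima of a response map. All function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def part_responses(region_grid, parts):
    """Score every region of an H x W grid against every part.

    Returns one H x W response map per part: responses[k][y][x].
    """
    return [[[dot(p, feat) for feat in row] for row in region_grid] for p in parts]


def encode(region_grid, parts):
    """Image descriptor with one value per part: its strongest response."""
    return [max(max(row) for row in m) for m in part_responses(region_grid, parts)]


def nms_peaks(response_map, radius=1):
    """Return (y, x, score) for responses that are maximal in their
    (2 * radius + 1)^2 neighbourhood. Ties are all kept."""
    h, w = len(response_map), len(response_map[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            neighbourhood = [
                response_map[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            if response_map[y][x] >= max(neighbourhood):
                peaks.append((y, x, response_map[y][x]))
    return peaks


def orthogonality_penalty(parts):
    """Squared Frobenius norm of (P P^T - I); zero for orthonormal parts."""
    penalty = 0.0
    for i, p in enumerate(parts):
        for j, q in enumerate(parts):
            target = 1.0 if i == j else 0.0
            penalty += (dot(p, q) - target) ** 2
    return penalty


def classify(descriptor, weights, biases):
    """Linear classifier on the part-based descriptor; returns the class index."""
    scores = [dot(w, descriptor) + b for w, b in zip(weights, biases)]
    return scores.index(max(scores))
```

In a trainable network, the penalty would typically be added to the classification loss with a weighting factor, and each function above would become a differentiable layer. How NMS interacts with the encoding, for example by pooling only over surviving peaks, is one of the design choices left open here.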
[doersch] Carl Doersch, Abhinav Gupta, and Alexei A. Efros. Mid-level visual element discovery as discriminative mode seeking. NIPS, 2013.
[juneja] Mayank Juneja, Andrea Vedaldi, C. V. Jawahar, and Andrew Zisserman. Blocks that shout: Distinctive parts for scene classification. ICCV, 2013.
[krizhevsky] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. NIPS, 2012.
[mettes] Pascal Mettes, Jan C. van Gemert, and Cees G. M. Snoek. No spare parts: Sharing part detectors for image categorization. CVIU, 2016.
[sicre1] Ronan Sicre and Frédéric Jurie. Discriminative part model for visual recognition. CVIU, 2015.
[sicre2] Ronan Sicre, Yannis Avrithis, Ewa Kijak, and Frédéric Jurie. Unsupervised part learning for visual recognition. CVPR, 2017.