Post-doctoral position on machine learning for embedded classification of plankton images at the "Laboratoire d'Océanographie de Villefranche", within the framework of the European BRIDGES project. Duration: 12 to 18 months.
The BRIDGES project (European Horizon 2020 programme) aims to develop a deep-underwater glider (down to 6000 m) and to propose sensors adapted to the different uses of this platform: measurement of biogeochemical variables, underwater prospecting for mining, evaluation of underwater diversity, etc.
The Laboratoire d’Océanographie de Villefranche (LOV) has more than 25 years of experience in developing imaging sensors for the exploration of planktonic diversity and the estimation of the flux of carbon to the deep ocean, both of which play a central role in the functioning of open-sea ecosystems and the regulation of climate. Following the current sensor generation (Underwater Vision Profiler (UVP), Picheral et al., 2010), which has already acquired thousands of data profiles in all oceans of the globe, LOV and its BRIDGES partners have developed a prototype of a new miniaturised version, called OCTOPUS and dedicated to gliders. OCTOPUS will detect, count and measure all objects larger than 100 microns and should be able to automatically classify the larger objects (into various planktonic taxa and various shapes of dead particles) with a minimal increase in energy consumption. This automatic identification is the goal of this specific project.
Once objects are classified, the resulting data will be sent to shore via satellite communication. This knowledge would make it possible to adapt the gliders’ navigation so that they comply with their predefined mission.
ENSTA-ParisTech (A. Manzanera) and LOV (M. Picheral, L. Stemmann, L. Guidi, and J.-O. Irisson) have partnered to build a reliable on-board classifier for zooplankton and particles seen on the images of the OCTOPUS sensor. For this task, they can rely on:
The last decade has seen the rise of machine learning methods for decision support and information retrieval in large data volumes. Support Vector Machines (SVM), Boosting, and Random Forests (RF) have each, in turn, had notable successes. More recently, the sustained efforts of the giants of the IT industry (Google, Microsoft, Facebook, etc.) in the field of artificial intelligence have brought to the fore artificial neural network models called "deep architectures". These deep learning techniques have improved the state of the art in various fields such as speech recognition (Mohamed et al., 2012; Dahl et al., 2012; Jaitly et al., 2012), text mining (Collobert et al., 2011), and image classification (Krizhevsky et al., 2012).
For the specific task of plankton classification from images, the features extracted from the objects are often similar across studies and pertain to colour, shape, etc. Based on these features, various classifiers have been tried (SVM: Hu and Davis, 2005; RF: Bell and Hopcroft, 2008; Bayesian approaches: Ye et al., 2011), with Random Forest being the most extensively used thanks to its performance and robustness (Grosjean et al., 2004). In 2015, a large-scale machine learning challenge was run on the Kaggle website to classify individual images into 120 planktonic taxa. The results of the best-ranked teams were all obtained with solutions based on deep learning methods, mainly by learning ensembles of deep architectures.
Deep learning therefore appears to be the best current solution for image classification problems. However, its main drawback in the context of this project, where power consumption needs to be minimal, is the size of the models generated and the computational cost they incur. Training such models takes a long time, but it will be done offline in this project; even so, the cost of computing predictions on new data is not negligible. For example, the Convolutional Neural Network (CNN) proposed by Krizhevsky et al. (2012) for image classification on ImageNet (the largest image dataset available today) has 60 million parameters (with a training process that lasts one week on two GPUs). This computational complexity restricts the application of such algorithms and their integration into more constrained environments (such as embedded platforms). Current solutions rely mainly on dedicated electronic chips, such as FPGAs or ASICs (e.g. Farabet et al., 2009a; Farabet et al., 2009b; Pham et al., 2012), or on specialised hardware components, such as the IBM TrueNorth neural network chip. This spiking-neuron-based chip can learn and run a CNN with a minimal loss of accuracy compared to traditionally learned CNNs, and with a tremendously low energy consumption (nVIDIA Jetson TX1 embedded GPU: 100 J/image; TrueNorth chip: 0.6e-6 J/image; Esser et al., 2015).
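To give a sense of where a figure such as 60 million parameters comes from, the following minimal Python sketch counts the weights and biases of a CNN layer by layer. The layer shapes are an assumption: they are recalled from the architecture published by Krizhevsky et al. (2012), including its two-GPU split that halves the input channels of some convolutions, and should be checked against the paper. The function names and the 4-bytes-per-parameter storage estimate are illustrative choices, not part of the project.

```python
# Illustrative parameter count for a CNN similar to Krizhevsky et al. (2012).
# Layer shapes are assumptions recalled from the published architecture.

def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of a square convolution layer."""
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# (name, parameter count); in_channels is per GPU group where the
# original network splits the feature maps across two GPUs.
LAYERS = [
    ("conv1", conv_params(11, 3, 96)),
    ("conv2", conv_params(5, 48, 256)),
    ("conv3", conv_params(3, 256, 384)),
    ("conv4", conv_params(3, 192, 384)),
    ("conv5", conv_params(3, 192, 256)),
    ("fc6", dense_params(6 * 6 * 256, 4096)),
    ("fc7", dense_params(4096, 4096)),
    ("fc8", dense_params(4096, 1000)),
]

def total_params(layers):
    """Sum the parameter counts of all layers."""
    return sum(count for _, count in layers)

def model_size_bytes(layers, bytes_per_param=4):
    """Storage needed if every parameter is a 32-bit float (assumption)."""
    return total_params(layers) * bytes_per_param

if __name__ == "__main__":
    for name, count in LAYERS:
        print(f"{name:6s} {count:>12,d}")
    print(f"total  {total_params(LAYERS):>12,d}")
    print(f"size   {model_size_bytes(LAYERS) / 1e6:.0f} MB at 4 bytes/parameter")
```

Under these assumptions the total comes to just under 61 million parameters, most of them in the fully connected layers, which is one reason why reducing or replacing those layers is a common route towards models that fit on embedded hardware.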
In this context, the objectives of the selected candidate will be:
The selected candidate will be based in Villefranche-sur-Mer (Côte d’Azur, France) at the Laboratoire d'Océanographie de Villefranche (LOV). This lab is located in a century-old research institute, hosts 3 research teams comprising about 40 permanent researchers and 100 people in total, and is renowned worldwide for its work in the field of physical, biogeochemical, and ecological oceanography. To carry out this research, it fosters a highly interdisciplinary environment with many engineers developing new sensors and numerical methods to tackle new research questions.
The selected candidate will be under the direct supervision of M. Picheral, who is in charge of the development of the OCTOPUS sensor. Scientific supervision will be shared by LOV, ENSTA and their partners.
The candidate should hold an engineering degree or a PhD in science, with strong knowledge of image processing and machine learning. Solid algorithmic skills with a taste for efficient implementations and embedded computing would be particularly appreciated.
The salary is around 48 k€ per year (to be adjusted depending on the candidate’s experience), for 12 to 18 months.
The position is to be filled as soon as possible.
Send your CV, references, and a cover letter to firstname.lastname@example.org and email@example.com.