Open postdoc position at GIPSA-Lab, within the SOMBRERO ANR project (http://www.gipsa-lab.fr/projet/SOMBRERO), on active sound source localization for social robotics. Detailed position description: http://intranet.gipsa-lab.grenoble-inp.fr/transfert/propositions/2_2017-11-21_postdoc_SOMBRERO_GIPSA_v2.pdf
Sound source localization by a humanoid robot
A social robot, i.e. one able to interact with human partners, must be able to perceive and analyze the surrounding audiovisual scene in order to contextualize its actions. This includes locating sound sources in its immediate environment so that it can focus its attention on a relevant source, e.g. an alarm or a potential partner. Sound source localization (for both humans and machines) is based on the links between certain acoustic characteristics of the perceived sounds and the position of the emitting source relative to the sensors. For binaural set-ups (i.e. two in-ear microphones), features such as the delay and the intensity difference between the signals arriving at the two ears are widely used. In addition to the position of the source, these binaural features also depend on the geometry of the sensors (including the head) and on the acoustics of the environment (e.g. room reverberation). State-of-the-art methods for automatic sound source localization thus rely on a statistical mapping between binaural features and source position. The mapping tools are advanced regression methods whose parameters are determined during a supervised training phase using controlled training data, for instance with known source positions.
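As an illustration of the two binaural features mentioned above, the following minimal sketch estimates the interaural time difference (ITD) from the peak of the cross-correlation and the interaural level difference (ILD) from the energy ratio. The function names, the 16 kHz sample rate and the synthetic test signal are assumptions made for this example; they are not part of the project, and real systems typically compute such cues per frequency band.

```python
import math

def interaural_level_difference(left, right):
    """ILD in dB: energy of the left-ear signal relative to the right-ear signal."""
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    return 10.0 * math.log10(energy_left / energy_right)

def interaural_time_difference(left, right, sample_rate, max_lag):
    """ITD in seconds, taken as the lag that maximizes the cross-correlation.

    A positive value means the right-ear signal lags the left-ear signal.
    """
    n = min(len(left), len(right))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[i] with right[i + lag] over the overlapping samples.
        score = sum(left[i] * right[i + lag]
                    for i in range(max(0, -lag), min(n, n - lag)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate

# Synthetic example (assumed values): a short chirp-like burst reaches the
# right ear 8 samples later and at half the amplitude.
sample_rate = 16000
delay = 8
burst = [math.sin(0.37 * i * i) for i in range(64)]
left = burst + [0.0] * delay
right = [0.0] * delay + [0.5 * x for x in burst]

itd = interaural_time_difference(left, right, sample_rate, max_lag=16)
ild = interaural_level_difference(left, right)
print(f"ITD = {itd * 1000:.3f} ms, ILD = {ild:.2f} dB")
```

On this toy signal the estimate recovers the 8-sample delay (0.5 ms at 16 kHz) and an ILD of about 6 dB, which corresponds to the factor-of-two amplitude difference.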
We propose here to exploit the motor skills of the robot to collect training data autonomously. To this end, we have developed a specific device (a loudspeaker fixed at the end of a rod) which the robot will manipulate with its arm. The robot will thus be able to autonomously move an acoustic source around its body, and to explore and learn its auditory-motor relations. The position of the source in a polar egocentric coordinate system will be provided by video processing using the cameras on the robot's head. The humanoid robot that will be equipped with this sensorimotor training capacity is Nina, an iCub designed by IIT Genoa with a fully articulated face (ears, jaw and lips) and high-quality ear microphones.
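To make the notion of a polar egocentric coordinate system concrete, here is a minimal sketch that converts a Cartesian source position expressed in a head-centered frame into azimuth, elevation and distance. The axis convention (x forward, y to the left, z up), the units and the function name are assumptions for this example; the project may use a different convention.

```python
import math

def to_egocentric_polar(x, y, z):
    """Convert a head-centered Cartesian position to (azimuth, elevation, distance).

    Assumed convention: x points forward, y to the left, z up.
    Angles are returned in degrees, distance in the unit of the inputs.
    """
    distance = math.sqrt(x * x + y * y + z * z)
    if distance == 0.0:
        return 0.0, 0.0, 0.0
    azimuth = math.degrees(math.atan2(y, x))
    # Clamp the ratio to guard against rounding slightly outside [-1, 1].
    ratio = max(-1.0, min(1.0, z / distance))
    elevation = math.degrees(math.asin(ratio))
    return azimuth, elevation, distance

# A source half a meter ahead and half a meter to the left, at ear height.
azimuth, elevation, distance = to_egocentric_polar(0.5, 0.5, 0.0)
print(f"azimuth={azimuth:.1f} deg, elevation={elevation:.1f} deg, distance={distance:.3f} m")
```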
The proposed work consists of three main tasks:
1. Data collection. We will develop a system for autonomously collecting auditory-motor features. The robot will spend hours moving the source with its arm to many different positions, detecting the source position with its cameras, moving its head to many different positions, and, for each pair of source/head positions, playing sounds and collecting cues.
2. Modeling audio-motor maps. We will assess various machine-learning-based regression models to link auditory perception with source position, e.g. Gaussian Mixture Regression (GMR) and Deep Neural Networks (DNN). We will explore discriminative learning solutions to automatically select the audio features that are less sensitive to noise.
3. Active perception. We will explore strategies to insert the mapping model into multi-source tracking and identification.
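The supervised mapping from binaural features to source position described in the tasks above can be sketched with a deliberately simple stand-in. The example below uses k-nearest-neighbor regression, not GMR or a DNN, and trains it on synthetic (ITD, ILD) pairs generated from a toy sine model of the cues. The cue model, its scale factors, the 5-degree training grid and all names are assumptions for illustration only.

```python
import math

def toy_binaural_features(azimuth_deg):
    """Toy cue model (assumption): ITD in ms and ILD in dB both grow with sin(azimuth)."""
    s = math.sin(math.radians(azimuth_deg))
    return (0.7 * s, 10.0 * s)

def knn_regress(train_features, train_targets, query, k=3):
    """Predict a target as the mean of the targets of the k nearest training features."""
    ranked = sorted(
        zip(train_features, train_targets),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest = [target for _, target in ranked[:k]]
    return sum(nearest) / len(nearest)

# "Training phase": features recorded at known source azimuths (5-degree grid).
train_azimuths = [float(a) for a in range(-90, 91, 5)]
train_features = [toy_binaural_features(a) for a in train_azimuths]

# "Localization": estimate the azimuth of a source at a position not in the grid.
true_azimuth = 32.0
estimate = knn_regress(train_features, train_azimuths,
                       toy_binaural_features(true_azimuth))
print(f"true azimuth = {true_azimuth} deg, estimated = {estimate} deg")
```

With such a stand-in, the localization error is bounded by the density of the training grid, which is one reason why collecting many source/head positions matters.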
The position is funded for 12 months by the SOMBRERO project. The salary depends on the candidate's experience and profile, according to the CNRS salary scale (i.e. 30k€/year with experience
T. May, S. van de Par, and A. Kohlrausch, “A probabilistic model for robust localization based on a binaural auditory front-end,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 1, pp. 1–13, 2011.
A. Deleforge, R. Horaud, Y. Y. Schechner, and L. Girin, “Co-localization of audio sources in images using binaural features and locally-linear regression,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 4, pp. 718–731, 2015.
J. Ferreira, J. Lobo, P. Bessiere, M. Castelo-Branco, and J. Dias, “A Bayesian framework for active artificial perception,” IEEE Transactions on Cybernetics, vol. 43, no. 2, pp. 699–711, 2013.
X. Li, L. Girin, R. Horaud, and S. Gannot, “Multiple-speaker localization based on direct-path features and likelihood maximization with spatial sparsity regularization,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 10, pp. 1997–2012, 2017.