Machines can now watch and interpret images and recognize speech and music genres, yet they are
hardly capable of understanding ambient audio scenes, e.g., the sounds occurring in a kitchen in
the morning or the sounds occurring on a road near a vehicle. Today's research on
audio scene understanding is mostly limited to the detection and recognition of audio events and
audio context classes. While such tasks are useful, the ultimate goal of audio scene understanding
goes far beyond the assignment of labels to audio. Instead, it is to develop machines that fully
understand audio input. The funded ANR project "LEAUDS" will make a leap towards this goal by
achieving breakthroughs in three intertwined directions that are essential for next-generation
audio understanding systems: detection of thousands of audio events from little annotated data,
robustness to “out-of-the lab” applications, and language-based description of audio scenes.
Accordingly, LEAUDS will develop machine learning algorithms able to learn from few weakly-labeled audio recordings and to discover novel audio events. Robustness to real-world conditions is a key challenge for applications beyond academic laboratories.
The LEAUDS project is carried out in collaboration with the Multispeech team at INRIA Nancy.
In this internship, the objectives are to:
* evaluate the state of the art in weakly-supervised learning (see the DCASE challenge)
* implement some weakly-supervised learning (WSL) baselines for audio scene analysis
* improve on these models by incorporating constraints into the learning
procedure, such as the absence of certain classes, minimum
lengths of one, ...
* study models able to handle both strongly labelled and weakly labelled data
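To make the second and fourth objectives concrete, here is a minimal NumPy sketch of the core idea behind many WSL baselines for sound event detection: frame-level class scores are pooled into a single clip-level prediction (here with softmax attention pooling over time), so that a clip-level tag can supervise the model, and a frame-level loss term can be added whenever strong labels are also available. The network producing the frame logits, the pooling choice, and the loss weighting are illustrative assumptions, not the project's prescribed method.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_pool(frame_probs, attn_logits):
    """Pool frame-level probabilities (T, C) into clip-level scores (C,)
    using softmax attention weights over the time axis."""
    w = np.exp(attn_logits - attn_logits.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)       # (T, C), sums to 1 over time
    return (w * frame_probs).sum(axis=0)       # (C,)

def bce(p, y, eps=1e-7):
    """Binary cross-entropy, averaged over all entries."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())

# Toy clip: T=4 frames, C=2 classes. In practice these logits would come
# from a (hypothetical) CRNN; here they are hard-coded for illustration.
logits = np.array([[ 2.0, -1.0],
                   [ 1.5, -2.0],
                   [-3.0, -1.5],
                   [-2.5, -1.0]])
frame_probs = sigmoid(logits)                  # strong (frame-level) predictions
clip_probs = attention_pool(frame_probs, logits)  # weak (clip-level) predictions

weak_targets = np.array([1.0, 0.0])            # clip-level tags only
strong_targets = np.array([[1., 0.],           # frame-level annotations,
                           [1., 0.],           # when available
                           [0., 0.],
                           [0., 0.]])

# Joint objective when both label types are available (weighted sum).
loss = bce(clip_probs, weak_targets) + 0.5 * bce(frame_probs, strong_targets)
```

The attention weights let the clip-level loss focus on the frames where an event is likely active, which is what allows training from weak tags alone; the second loss term is simply dropped for clips that have no strong labels.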
Candidates are expected to be Master's students with a background in statistics/machine learning and
experience programming in Python. Experience with the PyTorch deep learning framework is a plus.
* location: LITIS Lab, Université de Rouen, 76800 Saint-Étienne-du-Rouvray;
research visits and collaboration with INRIA Nancy are expected
* duration: 5 or 6 months, starting from February or March 2019
* application: send a CV and cover letter to alain.rakotomamonjy @ univ-rouen.fr
* supervisors: Alain Rakotomamonjy and Maxime Bérar
* in case of mutual fit, there is the possibility to pursue a PhD in Rouen
with frequent stays at INRIA Nancy
References:
R. Serizel, N. Turpault, H. Eghbal-Zadeh and A. Parag Shah. Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments. To appear in Proc. DCASE Workshop, 2018.
E. Cakir and T. Virtanen. End-to-End Polyphonic Sound Event Detection Using Convolutional Recurrent Neural Networks with Learned Time-Frequency Representation Input. In Proc. International Joint Conference on Neural Networks, 2018.
S. Adavanne, A. Politis and T. Virtanen. Multichannel Sound Event Detection Using 3D Convolutional Neural Networks for Learning Inter-channel Features. In Proc. International Joint Conference on Neural Networks, 2018.
A. Rakotomamonjy. Supervised Representation Learning for Audio Scene Classification. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 6, pp. 1253-1265, 2017.