Optimal signal representations for music source separation
Team / Laboratory: SIMOB - IBISC (EA 4526) - Univ. Paris-Saclay (UEVE)
Salary and perspectives: According to background and experience (a minimum of 577.50 euros/month). Possibility to continue with a 3-year funded PhD contract with international research partners.
Abstract: The success of artificial intelligence (AI) generally depends on the representation of the input data. This is probably because different representations can disentangle or hide the useful information exploited for a given task. Here, we target the source separation problem, which aims at recovering the original signals (or sources) that compose an observed mixture. State-of-the-art audio methods empirically compute a time-frequency representation of the mixture. However, experiments reported in the literature show that this is not always the best choice to efficiently segregate the sources present in the mixture. Hence, this internship focuses on optimizing the signal transformation to obtain the best separation results. The main motivation is to improve the results of a state-of-the-art source separation method and to better understand how to compute optimal data representations for a given task.
Goals: In the field of music processing, the sources often correspond to the different instrumental parts (i.e. voice, guitar, piano, etc.) used to create the mix. Obtaining the sources is of interest for many tasks such as creating new artistic mixes, improving the audio quality, or enabling applications such as karaoke. Starting with a state-of-the-art system that estimates the sources from an input mixture, this internship investigates the role of the input data representation on the resulting source separation quality.
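To give a concrete picture of the baseline approach discussed above, the sketch below illustrates time-frequency masking for source separation in the spirit of Yilmaz and Rickard (2004). It uses synthetic tones and an oracle (ideal binary) mask purely for illustration; it is not the internship's actual system, and all signal parameters (sampling rate, window size) are arbitrary choices.

```python
import numpy as np
from scipy.signal import stft, istft

# Illustrative oracle setting: two synthetic sources and their mixture.
fs = 8000
t = np.arange(fs) / fs
s1 = np.sin(2 * np.pi * 440 * t)    # source 1: 440 Hz tone
s2 = np.sin(2 * np.pi * 1320 * t)   # source 2: 1320 Hz tone
mix = s1 + s2                       # observed mixture

# Time-frequency representation of the mixture and of each source.
_, _, X = stft(mix, fs=fs, nperseg=512)
_, _, S1 = stft(s1, fs=fs, nperseg=512)
_, _, S2 = stft(s2, fs=fs, nperseg=512)

# Ideal binary mask: keep the bins where source 1 dominates.
mask = (np.abs(S1) > np.abs(S2)).astype(float)
_, s1_hat = istft(mask * X, fs=fs, nperseg=512)

# The masked estimate should match source 1 far better than the raw mixture.
n = min(len(s1_hat), len(s1))
err_est = np.mean((s1_hat[:n] - s1[:n]) ** 2)
err_mix = np.mean((mix[:n] - s1[:n]) ** 2)
print(err_est < err_mix)  # should print True
```

The internship question is precisely whether the short-time Fourier transform used here is the right representation: a transform better adapted to the mixture (e.g. an adaptive or learned one) may make the sources easier to segregate by masking.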
 Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), pp. 1798-1828, 2013.
 Rachel M. Bittner, Brian McFee, Justin Salamon, Peter Li, and Juan Pablo Bello. Deep salience representations for F0 estimation in polyphonic music. In Proc. ISMIR, pp. 63-70, 2017.
 Dominique Fourer and Geoffroy Peeters. Fast and adaptive blind audio source separation using recursive Levenberg-Marquardt synchrosqueezing. In Proc. IEEE ICASSP, Calgary, Canada, April 2018.
 Aaron Van Den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew W. Senior, and Koray Kavukcuoglu. WaveNet: A generative model for raw audio. In Proc. SSW, 2016.
 E. Vincent, R. Gribonval, and C. Févotte. Performance measurement in blind audio source separation. IEEE Transactions on Audio, Speech, and Language Processing, 14(4), pp. 1462-1469, July 2006.
 O. Yilmaz and S. Rickard. Blind separation of speech mixtures via time-frequency masking. IEEE Transactions on Signal Processing, 52(7), pp. 1830-1847, 2004.
(c) GdR 720 ISIS - CNRS - 2011-2018.