# Annonce

### PhD on Multi-Resolution Neural Networks (MuReNN)

July 17, 2023

Category: PhD student

The French national center for scientific research (CNRS) is hiring a PhD student as part of a three-year project on “Multi-Resolution Neural Networks” (MuReNN). MuReNN is supported by the French national funding agency (ANR), and hosted at the Laboratoire des Sciences du Numérique de Nantes (LS2N). A collaboration with the Austrian Academy of Sciences is planned.

If interested, please contact the principal investigator at: vincent.lostanlen@cnrs.fr

Visit: https://audio.ls2n.fr/2023/07/11/phd-offer-theory-and-implementation-of-multi-resolution-neural-networks/

MuReNN aims to overcome three challenges:

1. The limited understanding of learning dynamics in large neural audio models when applied to new problems with limited domain-specific knowledge;

2. the need for massive expert annotation in supervised deep learning, which is costly and time-consuming; and

3. the environmental footprint of AI-enabled digital audio devices.

MuReNN pursues three objectives:

1. To build a cogent and rigorous theory of large convolutional operators, at the intersection of Fourier theory, frame theory, and statistical machine learning;

2. to train neural networks that generalize from limited labeled raw audio data, so as to target a wide range of “niche” applications; and

3. to embed learned raw audio models on sustainable hardware, which implies strong algorithmic constraints in terms of number of operations and memory usage.

Given these objectives, our research hypothesis is that learnable audio frontends need multi-resolution approximations (MRA) to be frugal, both in terms of material resources and human annotation workload. More precisely, we posit that a 1-D convolutional network (convnet) may be “compressed” into a smaller model, named MuReNN. The key idea behind MuReNN is to convolve learnable operators (FIR filters) with non-learnable ones (wavelets) so as to expand the receptive field of the learned representation without aliasing artifacts.
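The key idea can be sketched in a few lines of numpy. The Haar-like wavelet, the filter lengths, and the random initialization below are hypothetical placeholders standing in for the project's actual multi-resolution filterbank and trained weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed, non-learnable filter: a Haar wavelet dilated to scale 8
# (a hypothetical stand-in for a proper wavelet filterbank).
scale = 8
wavelet = np.concatenate([np.ones(scale), -np.ones(scale)]) / np.sqrt(2 * scale)

# Short learnable FIR filter (random init standing in for trained weights).
fir = rng.standard_normal(5)

# The effective kernel is the convolution of the two: the receptive field
# grows from len(fir) = 5 to len(wavelet) + len(fir) - 1 = 20, while only
# the 5 FIR taps would be trained.
kernel = np.convolve(wavelet, fir)
print(len(fir), len(kernel))  # 5 20

# Applying it to a signal x is associative: (w * h) * x == w * (h * x),
# so the fixed wavelet stage can be computed once and shared.
x = rng.standard_normal(64)
y1 = np.convolve(kernel, x)
y2 = np.convolve(wavelet, np.convolve(fir, x))
print(np.allclose(y1, y2))  # True
```

The associativity check is what justifies training only the short FIR stage while the wavelet stage stays fixed.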

From a methodological standpoint, MuReNN aims to invent a “best-of-both-worlds” between time-frequency analysis and deep neural networks: i.e., mathematics-driven like the former and data-driven like the latter. Indeed, MuReNN can be interpreted in two ways: as a scattering transform whose wavelets are learnable or as a convnet whose kernels are wavelet-like.

From an application standpoint, MuReNN aims to contribute to the ongoing diversification of machine listening systems: i.e., not just speech and music, but also sounds from cities, wildlife, factories, and the human body. The “flagship” application of MuReNN is the sustainable design of a solar-powered acoustic sensor network for biodiversity monitoring in an offshore wind farm.


**PhD Topic**

The PhD topic is “theory and implementation of multi-resolution neural networks” (MuReNN). On the theoretical side, the PhD student will characterize the probability distribution of Φ(x), where Φ is a convolutional neural network (convnet) layer with random Gaussian weights and x is a deterministic audio signal. For this purpose, the PhD student will reuse fundamental elements of probability theory and specialize them for the problem at hand. Then, the student will extend their discussion to the case of MuReNN with random Gaussian weights. The motivation behind this study is to show that a random MuReNN has a more stable inverse than a random convnet.
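The base case of this question, before nonlinearities and multiple layers enter, can be checked empirically: one linear output of a random Gaussian layer applied to a fixed signal is exactly Gaussian with mean 0 and variance equal to the signal's squared norm. A minimal numpy sketch, with a toy sine signal standing in for real audio:

```python
import numpy as np

rng = np.random.default_rng(0)

# Deterministic "audio" signal x (one period of a sine) and many
# independent draws of a Gaussian filter w, standing in for the
# random layer weights.
x = np.sin(2 * np.pi * np.arange(16) / 16)
n_draws, taps = 20000, 16
W = rng.standard_normal((n_draws, taps))

# One output coordinate of the random layer is the inner product <w, x>;
# since the w_i are i.i.d. N(0, 1), it is Gaussian with mean 0 and
# variance ||x||^2.
y = W @ x

print(y.mean())               # close to 0
print(y.var(), np.sum(x**2))  # both close to ||x||^2 = 8
```

The theoretical work starts from this elementary fact and must track how nonlinearities, subsampling, and depth deform the distribution.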

A second direction of theoretical analysis will be the characterization of the first- and second-order derivatives of MuReNN with respect to the audio input: i.e., its Jacobian matrix and Hessian tensor. Here, the goal will be to show that optimizing a large MuReNN with stochastic gradient descent is likely to yield a lower training error than a convnet, since local minima are typically fewer and closer to the global minimum. The PhD student will examine the recent scientific literature on mathematical aspects of deep learning and produce a comparative analysis between single-resolution and multi-resolution optimization.
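For a toy one-layer network, the Jacobian with respect to the input can be computed by finite differences and checked against its closed form. The kernel size and ReLU nonlinearity below are illustrative assumptions, not the architecture under study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-layer "network": valid convolution with kernel h, then ReLU.
h = rng.standard_normal(3)

def f(x):
    return np.maximum(np.convolve(x, h, mode="valid"), 0.0)

x = rng.standard_normal(8)
m, n = len(f(x)), len(x)  # 6 outputs, 8 inputs

# Jacobian of f at x by central finite differences.
eps = 1e-6
J = np.empty((m, n))
for i in range(n):
    e = np.zeros(n)
    e[i] = eps
    J[:, i] = (f(x + e) - f(x - e)) / (2 * eps)

# Closed form for this architecture: the Toeplitz convolution matrix,
# with rows zeroed wherever the ReLU is inactive.
T = np.zeros((m, n))
for k in range(m):
    T[k, k:k + 3] = h[::-1]
mask = (np.convolve(x, h, mode="valid") > 0).astype(float)
J_exact = mask[:, None] * T
print(np.allclose(J, J_exact, atol=1e-5))
```

For MuReNN, the analogous closed form factors through the fixed wavelet stage, which is where the comparison with single-resolution convnets becomes tractable.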

On the experimental side, the PhD student will develop and maintain a software package for implementing MuReNN in PyTorch. The package will be based on pytorch_wavelets and will thus enable both GPU acceleration and automatic differentiation. The first objective will be to train MuReNN on well-controlled, large-data problems such as fundamental frequency estimation, and to show that its results are on par with those of convnets, with a gain in algorithmic efficiency. The next goal will be to evaluate MuReNN on challenging settings with limited labeled data: e.g., SONYC-UST-v2 (urban sounds), MIMII (industrial noise), and BEANS (animal sounds).
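A minimal PyTorch sketch of such a layer might pair a frozen filterbank with learnable depthwise FIR filters. The random placeholder kernels and the sizes below are assumptions; an actual implementation would load discrete wavelet filters, e.g., via pytorch_wavelets:

```python
import torch
import torch.nn as nn

class MuReNNLayer(nn.Module):
    """Hypothetical sketch: a frozen filterbank followed by a short
    learnable FIR filter per band (depthwise convolution)."""

    def __init__(self, n_bands=4, bank_len=15, fir_len=5):
        super().__init__()
        # Frozen "wavelet" filterbank (random placeholder kernels here).
        self.bank = nn.Conv1d(1, n_bands, bank_len,
                              padding=bank_len // 2, bias=False)
        self.bank.weight.requires_grad_(False)
        # Learnable short FIR filters, one per band.
        self.fir = nn.Conv1d(n_bands, n_bands, fir_len,
                             padding=fir_len // 2,
                             groups=n_bands, bias=False)

    def forward(self, x):
        return self.fir(self.bank(x))

layer = MuReNNLayer()
x = torch.randn(2, 1, 128)  # (batch, channel, time)
y = layer(x)
print(y.shape)  # torch.Size([2, 4, 128])

# Only the FIR taps are trained: 4 bands x 5 taps = 20 parameters.
n_trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(n_trainable)  # 20
```

The depthwise grouping keeps the trainable parameter count proportional to the FIR length rather than to the receptive field, which is the efficiency argument made above.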

The PhD student will team up with other members of the MuReNN consortium towards the design of a programmable autonomous sensor for bioacoustic monitoring, which will run a fixed-point implementation of MuReNN and will perform audio classification in real time. For this work package, the role of the PhD student within the team will be to devise a suitable parametrization of MuReNN for the problem at hand and train the model on GPU hardware.
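To illustrate the kind of constraint a fixed-point implementation imposes, here is a hypothetical Q1.14 multiply in numpy int16 arithmetic; the word width and number format are illustrative assumptions, not the sensor's actual specification:

```python
import numpy as np

# Hypothetical Q1.14 fixed point: 1 sign bit, 14 fractional bits
# in an int16 word.
FRAC_BITS = 14
SCALE = 1 << FRAC_BITS

def to_fixed(x):
    """Quantize a float array to int16 fixed point, with saturation."""
    return np.clip(np.round(x * SCALE), -2**15, 2**15 - 1).astype(np.int16)

def fixed_mul(a, b):
    """Multiply fixed-point numbers, rescaling the 32-bit product."""
    return ((a.astype(np.int32) * b.astype(np.int32)) >> FRAC_BITS).astype(np.int16)

w = to_fixed(np.array([0.5, -0.25]))   # e.g., filter taps
x = to_fixed(np.array([0.8, 0.6]))     # e.g., audio samples
y = fixed_mul(w, x)
print(y / SCALE)  # approximately [0.4, -0.15]
```

Every multiply-accumulate in the deployed model must fit this pattern, which is why the parametrization of MuReNN has to be devised jointly with the hardware team.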

The PhD will conclude with participation in a team-wide paper on the topic of sustainable biodiversity monitoring at an offshore wind farm with MuReNN-enabled sensors.

**Advisorship**

Vincent Lostanlen is the PI of MuReNN and the main point of contact for this PhD. He is a CNRS scientist at LS2N. He obtained his PhD in 2017 at École normale supérieure under the supervision of Stéphane Mallat. Before joining CNRS, he was a postdoctoral researcher at the Cornell Lab of Ornithology and a visiting scholar at the Music and Audio Research Lab of New York University (NYU).

Mathieu Lagrange will be the official advisor (HDR, i.e., *habilité à diriger des recherches*) for this PhD. He is a CNRS scientist at LS2N, in the same team as Vincent Lostanlen. He obtained his PhD at the University of Bordeaux in 2004. Before joining CNRS, he was a scientist in Canada (University of Victoria, McGill University) and in France (Télécom Paris, Ircam).

Peter Balazs will welcome the PhD student at the Acoustics Research Institute (ARI) of the Austrian Academy of Sciences, for a research visit about the mathematics of deep convolutional networks. He obtained his PhD in 2005 under the supervision of Hans Feichtinger and Bruno Torrésani. He is the recipient of the Start-Preis in 2011 and the director of ARI.

Other permanent researchers in the MuReNN consortium are Anastasia Volkova (Inria Lyon) and Florent de Dinechin (CITI Lyon). Beyond the MuReNN consortium, the PhD student will have the opportunity to interact with the BioacAI doctoral network (“bioacoustics AI for wildlife conservation”), to which Vincent Lostanlen belongs.

**Candidate background**

- An MSc degree in computer science, mathematics, or electrical engineering is required.
- Fluency in English is required. Knowledge of French is useful but not required.
- The candidate should have followed a signal processing course. Knowledge of mathematical operations such as convolution, the discrete Fourier transform, and subsampling is required. Knowledge of wavelet theory is useful but not required.
- The candidate should have followed a probability theory course. Knowledge of Gaussian vectors, the central limit theorem, and Markov’s inequality is required. Knowledge of learning theory and random matrix theory is useful but not required.
- The candidate should have some experience with analyzing real-world data, preferably audio. Experience with deep neural networks is useful but not required.
- The candidate should know how to program in Python, use a command-line interface, and use version control (git). Self-taught programmers are welcome. Knowledge of PyTorch is useful but not required.