
Signal Processing for Music (Audio Action)

Date: November 16, 2023
Location: IRCAM, Centre Georges Pompidou, Paris

Scientific themes:
  • A - Methods and models in signal processing

Please note that, in order to guarantee access to the meeting rooms for all registrants, registration for meetings is free but mandatory.

Register for the meeting.


10 members of the GdR ISIS and 14 non-members are registered for this meeting.

Room capacity: 70 people.


As part of the GdR ISIS action "Signal processing for audio and machine listening" («Traitement du signal pour l'audio et l'écoute artificielle»), we are organizing a third day dedicated to "Music" on Thursday, November 16, 2023 at IRCAM, featuring the following speakers:

  • Gaël Richard
  • Rémi Mignot
  • Rachel Bittner
  • Romain Hennequin


We invite any participant wishing to present work related to audio to contact Vincent Lostanlen (vincent dot lostanlen at ls2n dot fr) before October 15. Presentations will follow this format: a brief 3-minute plenary talk and a poster displayed throughout the day. Many slots are available, so do not hesitate to present partial or already-published work as a basis for discussion and exchange.

Organizing committee

  • Mathieu Lagrange (LS2N, CNRS)
  • Thomas Hélie (STMS, Ircam, CNRS)
  • Vincent Lostanlen (LS2N, CNRS)


09:30 Welcome (coffee)

10:00 Introduction

10:15 Gaël Richard : "Hybrid deep learning for music analysis and synthesis"

11:15 Rémi Mignot : "Invariance learning for a music indexing robust to sound modifications"

12:15 Lunch

14:00 Rachel Bittner : "Basic Pitch: A lightweight model for multi-pitch, note and pitch bend estimations in polyphonic music"

15:00 Romain Hennequin : "Labeling a Large Music Catalog"

16:00 Brief plenary presentation of the posters

16:30 Posters and coffee

17:30 Closing

Abstracts of the contributions

Gaël Richard : "Hybrid deep learning for music analysis and synthesis"

Access to ever-increasing supercomputing facilities, combined with the availability of huge (though largely unannotated) data repositories, has enabled the emergence of a significant trend of purely data-driven deep learning approaches. However, these methods only loosely take into account the nature and structure of the processed data. We believe it is important instead to build hybrid deep learning methods that integrate our prior knowledge about the nature of the processed data, their generation process, or, where possible, their perception by humans. We will illustrate the potential of such model-based deep learning approaches (or hybrid deep learning) for music analysis and synthesis.

Rémi Mignot : "Invariance learning for a music indexing robust to sound modifications"

Music indexing allows music excerpts to be found in a large catalog and duplicates to be detected. With the rise of social media, it is increasingly important for music owners to detect misuse and illegal use of their music. The main difficulty of this task is detecting music excerpts that have been strongly modified, intentionally or not. To deal with this issue, the presented method is based on an audio representation that is relevant to the music content and robust to some sound modifications. Then, using a data augmentation approach, a discriminant transformation is learnt to improve the robustness of the compact representation. Finally, a hash function is derived to allow fast searching over a large catalog together with robustness to bit corruption.
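The general idea behind such hash-based lookup can be illustrated with a minimal sketch. This is not the method presented in the talk: the random-projection hashing, the feature dimensions, and all names below are illustrative assumptions, standing in for the learned discriminant transformation and hash function described above. The sketch only shows why Hamming-distance search over compact binary hashes tolerates a few corrupted bits.

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_hash(features, projection):
    """Hash a feature vector to bits via the signs of random projections."""
    return (features @ projection > 0).astype(np.uint8)

# Illustrative "catalog": 1000 reference excerpts, 64-dim features, 32-bit hashes.
dim, bits = 64, 32
projection = rng.standard_normal((dim, bits))
catalog = rng.standard_normal((1000, dim))
catalog_hashes = np.array([binary_hash(f, projection) for f in catalog])

# Query: excerpt 123 with a mild perturbation, simulating a sound modification
# that survives the (assumed) robust front-end representation.
query = catalog[123] + 0.1 * rng.standard_normal(dim)
query_hash = binary_hash(query, projection)

# Nearest match by Hamming distance: a few flipped bits barely move the
# query's distance to its own entry, while unrelated entries stay far away.
distances = np.count_nonzero(catalog_hashes != query_hash, axis=1)
best = int(np.argmin(distances))
print(best, distances[best])  # the perturbed query should retrieve entry 123
```

In practice, a linear scan over Hamming distances would be replaced by an index structure (e.g., multi-probe tables over hash buckets) to keep the search fast at catalog scale.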

Rachel Bittner : "Basic Pitch: A lightweight model for multi-pitch, note and pitch bend estimations in polyphonic music"

"Basic-pitch" is a lightweight neural network for musical instrument transcription, which supports polyphonic outputs and generalizes to a wide variety of instruments (including vocals). In this talk, we will discuss how we built and evaluated this efficient and simple model, which experimentally showed to be substantially better than a comparable baseline in detecting notes. The model is trained to jointly predict frame-wise onsets, multi-pitch and note activations, and we experimentally showed that this multi-output structure improves the resulting frame-level note accuracy. We will also listen to examples using (and misusing) this model for creative purposes, using our open-source python library, or demo website: thanks to its scalability, the model can run on the browser, and your audio doesn't even leave your own computer.

Paper: https://arxiv.org/abs/2203.09893
Code: https://github.com/spotify/basic-pitch
Demo: https://basicpitch.spotify.com

Romain Hennequin : "Labeling a Large Music Catalog"

Music Streaming Services such as Deezer offer their users a catalog of tens of millions of songs. Navigating through such a vast catalog requires retrieving and organizing musical knowledge in an automated way using music information retrieval tools. Various music dimensions can be considered, such as music genre or moods, and all these dimensions come with ambiguities. The talk will describe common issues with labeling large music catalogs, how to deal with them, and the remaining challenges.


Gaël Richard is a Full Professor in audio signal processing at Télécom Paris, Institut Polytechnique de Paris, and the scientific co-director of the Hi! PARIS interdisciplinary center on Artificial Intelligence and Data Analytics. His research interests are mainly in the field of speech and audio signal processing and include topics such as signal models, source separation, machine learning methods for audio/music signals, and music information retrieval. In 2020, he received the Grand Prize of IMT-Académie des sciences for his research contributions in science and technology. In 2022, he was awarded an ERC Advanced Grant from the European Union for a project on hybrid deep learning for audio (HI-Audio). He is a Fellow of the IEEE.

Rémi Mignot is a tenured researcher at IRCAM (UMR STMS 9912) in Paris, France, and a member of the Analysis/Synthesis team. His research expertise focuses on machine learning and signal processing applied to audio processing and indexing. He received a PhD in Signal and Image Processing from Télécom ParisTech with IRCAM in 2009. He then did a first post-doctoral project at the Institut Langevin (ESPCI ParisTech and UPMC in Paris), where he studied the sampling of room impulse responses using Compressed Sensing, and a second post-doctoral project at Aalto University in Espoo, Finland, with a Marie Curie post-doctoral fellowship, where he worked on the sound synthesis of musical instruments based on an "Extended" Subtractive Synthesis approach. In 2014 he returned to IRCAM to work on audio indexing and music information retrieval, and he has held a permanent position there since 2018.

Rachel Bittner is a Research Manager at Spotify in Paris. Before Spotify, she worked at NASA Ames Research Center in the Human Factors division. She received her Ph.D. degree in music technology and digital signal processing from New York University. Before that, she did a Master's degree in Mathematics at New York University, and a joint Bachelor's degree in Music Performance and Math at UC Irvine. Her research interests include automatic music transcription, musical source separation, metrics, and dataset creation.

Romain Hennequin is a Research Scientist at Deezer, where he heads the research team. He graduated in Computer Science from École Polytechnique, UPMC (now Sorbonne Université), and Télécom Paris, and earned a Ph.D. in signal processing from Télécom Paris. He has been working for more than 10 years in industrial research, addressing various topics such as source separation, music information retrieval, recommender systems, and graph mining.