
Announcement

21 September 2018

Postdoc Position in “Deep neural networks for multimodality”, at University of Caen, France

Category: Postdoctoral researcher

The Computer Vision group of the University of Caen (France) is opening a postdoc position starting as soon as possible.

Topic: A strong limitation of multimodal data collections is that they require annotations for training that can be highly time-consuming to produce. In addition, it is generally observed that the annotations (and even the data) for the different modalities are unbalanced; some are abundant whereas others can be much scarcer. As an illustration, the work of [6] demonstrates how visual models for visual tasks can be trained using ambient sounds as supervisory signals. In [9] we have proposed a central architecture for multimodal fusion, aiming to produce the best possible decisions by integrating information coming from multiple media.

 


In this project, different options will be considered to address this question, such as multi-task learning or per-modality autoencoders sharing a common central layer. We will also investigate, as an alternative, Deep Boltzmann Machines trained by maximum joint likelihood to improve their performance. Some key references that will be considered as inspiring works are listed below.
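To make the second option concrete, here is a minimal sketch (illustrative only, not the project's actual design) of per-modality autoencoders meeting in a shared central layer: each modality has its own encoder and decoder, but both encoders project into the same central code, so each decoder can be driven by information from either modality. All dimensions, the linear layers, and the averaging fusion are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed feature dimensions for two modalities and the shared central code.
D_AUDIO, D_VISUAL, D_CENTRAL = 128, 512, 64

# Per-modality encoders: map each modality into the shared central space.
W_enc_a = rng.standard_normal((D_AUDIO, D_CENTRAL)) * 0.01
W_enc_v = rng.standard_normal((D_VISUAL, D_CENTRAL)) * 0.01

# Per-modality decoders: reconstruct each modality from the central code.
W_dec_a = rng.standard_normal((D_CENTRAL, D_AUDIO)) * 0.01
W_dec_v = rng.standard_normal((D_CENTRAL, D_VISUAL)) * 0.01

def encode(x, W):
    # A single linear layer with a tanh non-linearity (a stand-in for a
    # deeper encoder network).
    return np.tanh(x @ W)

def reconstruct(x_a, x_v):
    # Both modalities meet in the same central layer: their codes are
    # averaged here, so each reconstruction can use information from both.
    z = 0.5 * (encode(x_a, W_enc_a) + encode(x_v, W_enc_v))
    return z @ W_dec_a, z @ W_dec_v

x_a = rng.standard_normal((8, D_AUDIO))    # a batch of audio features
x_v = rng.standard_normal((8, D_VISUAL))   # a batch of visual features
rec_a, rec_v = reconstruct(x_a, x_v)
print(rec_a.shape, rec_v.shape)  # (8, 128) (8, 512)
```

In a trained version, the reconstruction losses of both decoders would be minimized jointly, which forces the central layer to carry a representation shared across modalities; the scarcer modality then benefits from the structure learned on the abundant one.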

Start date: ASAP.

Location: The position will be located at the Computer Vision group at the University of Caen.

Contact: Prof. Frederic Jurie (frederic.jurie@unicaen.fr)

Profile:
* Recent PhD in computer science, computer vision, machine learning, or applied mathematics
* Solid programming skills; the project involves programming in Python and TensorFlow
* Solid mathematics knowledge (especially linear algebra and statistics)
* Creative and highly motivated
* Fluent in English, both written and spoken
Prior knowledge in the areas of computer vision (in particular face recognition or visual attributes), machine learning, or data mining is a plus.

Please send applications via email, including:
* a complete CV including a list of publications
* the names and email addresses of three references

Applications should be sent to Frederic Jurie (frederic.jurie@unicaen.fr). Applicants can be asked to do a short assignment in order to demonstrate their research abilities.
References

[1] Aytar, Y., Vondrick, C., Torralba, A., 2017. See, Hear, and Read: Deep Aligned Representations. arXiv:1706.00932 [cs].

[2] Aytar, Y., Vondrick, C., Torralba, A., 2016. SoundNet: Learning Sound Representations from Unlabeled Video. arXiv:1610.09001 [cs].

[3] Caron, M., Bojanowski, P., Joulin, A., Douze, M., 2018. Deep Clustering for Unsupervised Learning of Visual Features. arXiv:1807.05520 [cs].

[4] Doersch, C., Zisserman, A., 2017. Multi-task Self-Supervised Visual Learning. arXiv:1708.07860 [cs].

[5] Hu, D., Nie, F., Li, X., 2018. Deep Co-Clustering for Unsupervised Audiovisual Learning. arXiv:1807.03094 [cs, eess].

[6] Owens, A., Wu, J., McDermott, J.H., Freeman, W.T., Torralba, A., 2016. Ambient Sound Provides Supervision for Visual Learning. arXiv:1608.07017 [cs].

[7] Surís, D., Duarte, A., Salvador, A., Torres, J., Giró-i-Nieto, X., 2018. Cross-modal Embeddings for Video and Audio Retrieval. arXiv:1801.02200 [cs, eess].

[8] Takahashi, N., Gygli, M., Van Gool, L., 2017. AENet: Learning Deep Audio Features for Video Analysis. arXiv:1701.00599 [cs].

[9] Vielzeuf, V., Lechervy, A., Pateux, S., Jurie, F., 2018. CentralNet: A Novel Multilayer Approach for Multimodal Fusion. ECCV 2018 Workshop on Multimodal Learning and Applications.


(c) GdR 720 ISIS - CNRS - 2011-2018.