
PhD at Mines Paris on Scene Understanding and Professional Action Representation and Recognition using Dynamic Neural Rendering through Egocentric Computer Vision

11 July 2023

Category: PhD student


The Centre for Robotics of Mines Paris has strong expertise in the representation, modelling and recognition of human actions applied to Human-Machine Collaboration. The main objective of its projects is to develop novel methodologies and technological paradigms that improve the recognition of human actions and gestures and enable better computer-mediated interaction or natural human-robot collaboration.


Topic:

In the framework of the Horizon Europe project Craeft and the PIA4 project Re-SOuRCE, which aim at modelling, representing and recognising human actions (such as add, subtract, interlock, or transform operations) and materials (such as surfaces, free-form or fibres) related to highly expert manual jobs, Mines Paris is opening a PhD position on Scene Understanding and Professional Action Representation and Recognition using Dynamic Neural Rendering through Egocentric Computer Vision.

We aim at the automatic decomposition of the 3D professional scene into a static background and a dynamic foreground (moving objects, including human actions, tools and materials) using a freely moving camera with an egocentric view. Compared with classic scene segmentation and understanding, this is a major scientific challenge because of the large apparent motion caused by the camera’s large viewpoint changes and parallax. The use of an additional objective camera viewpoint is also considered, as well as third-party videos. The scene can be reconstructed through a multiple-stream neural rendering network, such as Neural Radiance Fields (NeRFs), which effectively provide a scene representation and rendering. NeRFs have been widely studied [1, 2, 3] for modelling the 3D geometry of static objects. As far as dynamic scenes are concerned, most of the approaches combine a canonical model with a deformation network or warp space, as in [4, 5]. Finally, some more recent approaches combine a static NeRF with a dynamic NeRF via egocentric vision, as in [6] and in [7], where the arms of the human are also considered. In the end, all the components of the scene should be automatically recognized, extending our previous work on the recognition of professional gestures and actions [8].
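To make the idea of combining a static and a dynamic radiance field concrete, here is a minimal NumPy sketch of how two fields sampled along the same camera ray could be composited into one pixel. It is a generic illustration under stated assumptions, not the formulation of any of the cited papers: the function name `composite_static_dynamic`, its arguments, and the density-weighted colour blend are hypothetical choices made for this example.

```python
import numpy as np

def composite_static_dynamic(sigma_s, rgb_s, sigma_d, rgb_d, deltas):
    """Composite one ray from a static and a dynamic field (illustrative only).

    sigma_s, sigma_d: (N,) non-negative densities at N samples along the ray.
    rgb_s, rgb_d:     (N, 3) colours predicted by each field at those samples.
    deltas:           (N,) distances between consecutive samples.
    Returns the pixel colour (3,) and a scalar "dynamic" weight in [0, 1].
    """
    # Assumption: the two fields occupy the same space, so densities add.
    sigma = sigma_s + sigma_d
    # Standard volume-rendering opacity of each ray segment.
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Transmittance: probability that the ray reaches each sample unoccluded.
    trans = np.concatenate(([1.0], np.cumprod(1.0 - alpha)[:-1]))
    weights = trans * alpha
    # Blend colours in proportion to each field's share of the density.
    safe = np.maximum(sigma, 1e-10)
    rgb = (sigma_s[:, None] * rgb_s + sigma_d[:, None] * rgb_d) / safe[:, None]
    pixel = (weights[:, None] * rgb).sum(axis=0)
    # Share of the rendered pixel explained by the dynamic field; thresholding
    # it per pixel would give a rough foreground/background mask.
    dynamic_weight = float((weights * sigma_d / safe).sum())
    return pixel, dynamic_weight
```

Calling this function for every pixel of a frame yields both a rendered image and a soft foreground map, which is the kind of static/dynamic decomposition the topic targets. In a real system the densities and colours would come from trained networks rather than hand-written arrays.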

The PhD student should be able to analyse all the state-of-the-art methodologies and propose meaningful improvements to existing ones, or entirely new ones, for the automatic decomposition of 3D scenes from egocentric videos, also taking human actions into account. The proposed method will be tested and evaluated within the context of the CRAEFT project, also considering high-quality demonstrations.

This PhD will give the student the opportunity to work with other European researchers, both in the project and in the wider academic community, and to work directly with industrial partners. Moreover, the student will acquire transferable skills that will enhance future employability through leading and contributing to highly interactive and collaborative work. Finally, the student will work autonomously, stay focused on his/her research and contribute to project management tasks, such as the preparation of project meetings (distance calls or physical meetings in different European countries, 3 per year), reports and deliverables.


Required skills:

A university degree in Electrical or Computer Engineering or Computer Science, or an MSc in Applied Mathematics, Data Science, AI or a similar field. More precisely, the student should have very strong skills in:

• Machine and Deep Learning

• Computer Vision

• Programming: Python, C++, etc.

• The candidate must be proficient in both written and spoken English and possess excellent presentation and communication skills, which will be needed for regular interactions with the project partners.



The student will have a 3-year contract with a gross monthly salary of 2233€ (activities complementary to research, such as teaching or preparing reports and deliverables, are included in the salary).


How to apply or for further information:

Please send your CV and cover letter to


For more information, please visit the following links:


Bibliography:

[1] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. In Proc. ECCV, 2020.

[2] Karl Stelzner, Kristian Kersting, and Adam R. Kosiorek. Decomposing 3d scenes into objects via unsupervised volume segmentation. arXiv.cs, abs/2104.01148, 2021.

[3] Christopher Xie, Keunhong Park, Ricardo Martin-Brualla, and Matthew Brown. Fig-nerf: Figure-ground neural radiance fields for 3d object category modelling. arXiv.cs, abs/2104.08418, 2021.

[4] Zhengqi Li, Simon Niklaus, Noah Snavely, and Oliver Wang. Neural scene flow fields for space-time view synthesis of dynamic scenes. In Proc. CVPR, 2021.

[5] Albert Pumarola, Enric Corona, Gerard Pons-Moll, and Francesc Moreno-Noguer. D-nerf: Neural radiance fields for dynamic scenes. In Proc. CVPR, 2021.

[6] Chen Gao, Ayush Saraf, Johannes Kopf, and Jia-Bin Huang. Dynamic view synthesis from dynamic monocular video. arXiv.cs, abs/2105.06468, 2021.

[7] Tschernezki, V., Larlus, D., & Vedaldi, A. (2021, December). NeuralDiff: Segmenting 3D objects that move in egocentric videos. In 2021 International Conference on 3D Vision (3DV) (pp. 910-919). IEEE.

[8] Papanagiotou, D., Senteri, G., & Manitsaris, S. (2021). Egocentric Gesture Recognition Using 3D Convolutional Neural Networks for the Spatiotemporal Adaptation of Collaborative Robots. Frontiers in Neurorobotics, 15.