Alexandru Ginsca: Leveraging large scale Web data for image retrieval and user credibility estimation
17 November 2015
Category: Thesis defense
You are cordially invited to the thesis defense of Alexandru-Lucian Ginsca, entitled "Leveraging large scale Web data for image retrieval and user credibility estimation", which will take place on Monday, 30 November 2015 at 2 p.m. at Télécom Paris (46 rue Barrault, Paris) in the "Rubis" lecture hall.
Directions to Télécom Paris are available at: http://www.telecom-paristech.fr/telecom-paristech/adresses-acces-contacts/acces-rue-barrault.html
Contact: email@example.com
Jury:
- Ms. Céline Hudelot, Associate Professor (HDR) - École Centrale Paris (reviewer)
- Mr. Stéphane Marchand-Maillet, Professor - Université de Genève (reviewer)
- Mr. Pierre François Marteau, Professor - Université de Bretagne Sud
- Mr. Ioannis Kanellos, Professor - IMT Télécom Bretagne (thesis director)
- Mr. Adrian Popescu, Dr. - CEA (co-supervisor)
While research in visual and multimedia recognition and retrieval has benefited significantly from manually labeled datasets, the availability of such resources remains a serious issue. Manual annotation is still a cumbersome task, especially on large datasets. A promising way to circumvent the lack of annotated data is to use images shared on multimedia social networks such as Flickr. One of the main drawbacks of user-contributed collections is that part of the image annotations is not directly related to the visual content, which makes them less useful for image mining.
The work presented in this thesis lies at the crossroads between the use of Web data in image mining and source credibility in image sharing platforms. It aims to bring new findings to both domains and to provide a promising link between two separate fields of research. The theoretical frameworks and experimental results we detail can benefit both i) researchers from the multimedia mining community, by introducing efficient semantic image representations built from freely available image resources, and ii) researchers interested in Web data quality and source credibility, by proposing a study of credibility in the multimedia domain and testing practical applications of user credibility estimates. We propose a scalable image classification framework that exploits binary linear classifiers.
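The abstract does not specify how the classifier outputs become an image descriptor. As a minimal sketch, assuming each concept has a pre-trained binary linear classifier (weights and bias) applied to a low-level descriptor, a semantic feature can be formed as the vector of squashed classifier scores, optionally sparsified to the top-k concepts, and a collection can then be ranked by cosine similarity to the query. The names `semantic_features`, `retrieve`, `Classifier` and the `top_k` sparsification step are hypothetical illustrations, not the thesis's actual implementation.

```python
import math
from typing import List, Sequence, Tuple

# One (weights, bias) pair per concept; a hypothetical pre-trained bank.
Classifier = Tuple[Sequence[float], float]


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def semantic_features(x: Sequence[float], bank: Sequence[Classifier],
                      top_k: int = 0) -> List[float]:
    """Map a low-level descriptor x to one squashed score per concept.

    If 0 < top_k < number of concepts, only the top_k highest scores are
    kept and the others are set to 0 (an assumed sparsification step).
    """
    scores = [_sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for w, b in bank]
    if 0 < top_k < len(scores):
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        keep = set(ranked[:top_k])
        scores = [s if i in keep else 0.0 for i, s in enumerate(scores)]
    return scores


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def retrieve(query: Sequence[float], database: Sequence[Sequence[float]],
             bank: Sequence[Classifier], top_k: int = 0) -> List[int]:
    """Return database indices sorted by decreasing semantic similarity."""
    q = semantic_features(query, bank, top_k)
    sims = [cosine(q, semantic_features(x, bank, top_k)) for x in database]
    # Ties are broken by index so the ranking is deterministic.
    return sorted(range(len(database)), key=lambda i: (-sims[i], i))


if __name__ == "__main__":
    # Toy bank of three concepts over 2-D descriptors.
    bank = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([-1.0, -1.0], 0.0)]
    database = [[2.0, 0.0], [0.0, 2.0], [-2.0, -2.0]]
    print(retrieve([1.5, 0.1], database, bank))  # [0, 1, 2]
```

In a real system the database vectors would be computed once offline and indexed; recomputing them for every query, as here, only keeps the sketch short.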
To implement this framework, we compare two data sources: a large manually annotated image dataset (namely ImageNet) and Flickr groups. For the latter, we detail methods that reduce the noise inherent to Web collections. In an extensive experimental section, we show that, compared to state-of-the-art image descriptors, the proposed semantic features not only improve retrieval performance on three well-known image collections (ImageCLEF Wikipedia Retrieval 2010 Collection, MIRFLICKR, NUS-WIDE) but also significantly reduce retrieval time. We then define the concept of user tagging credibility and apply it to Flickr users. We propose 66 features that can serve as estimators of user credibility, including both context-based and content-based features extracted from various types of Flickr data. We evaluate the proposed features on a publicly available dataset and on a new dataset introduced in this thesis. Finally, we showcase the use of credibility estimates in two application scenarios: embedding them in an image diversification pipeline, and using them as features in machine learning models for expertise classification and expert retrieval tasks.
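The abstract does not describe how credibility estimates enter the diversification pipeline. One plausible reading, sketched here purely as an assumption: fuse each image's relevance with its uploader's credibility, then diversify the fused ranking with greedy Maximal Marginal Relevance (MMR), a standard technique swapped in for illustration rather than the thesis's own method. The functions `fuse_scores` and `mmr_diversify`, the weights `alpha` and `lam`, and the neutral default for unknown users are all hypothetical.

```python
from typing import Callable, Dict, List


def fuse_scores(relevance: Dict[str, float], owner: Dict[str, str],
                credibility: Dict[str, float], alpha: float = 0.5) -> Dict[str, float]:
    """Linearly fuse each image's relevance with its uploader's credibility.

    alpha=1.0 ignores credibility entirely. Users missing from the
    credibility table get a neutral default of 0.5 (an assumption).
    """
    return {img: alpha * rel + (1.0 - alpha) * credibility.get(owner[img], 0.5)
            for img, rel in relevance.items()}


def mmr_diversify(score: Dict[str, float], similarity: Callable[[str, str], float],
                  k: int, lam: float = 0.7) -> List[str]:
    """Greedy MMR: trade off an item's score against its redundancy."""
    remaining = list(score)  # dict insertion order fixes tie-breaking
    selected: List[str] = []
    while remaining and len(selected) < k:
        def marginal(c: str) -> float:
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max((similarity(c, s) for s in selected), default=0.0)
            return lam * score[c] - (1.0 - lam) * redundancy
        best = max(remaining, key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    relevance = {"a": 0.9, "b": 0.85, "c": 0.6, "d": 0.5}
    owner = {"a": "u1", "b": "u1", "c": "u2", "d": "u3"}
    credibility = {"u1": 0.2, "u2": 0.9, "u3": 0.8}
    cluster = {"a": "x", "b": "x", "c": "y", "d": "y"}  # toy visual clusters

    def sim(i: str, j: str) -> float:
        return 1.0 if cluster[i] == cluster[j] else 0.0

    print(mmr_diversify(fuse_scores(relevance, owner, credibility), sim, k=3))
    # ['c', 'a', 'd']
```

With `alpha=1.0` (credibility ignored) the same toy data yields `['a', 'c', 'b']`: the highly relevant images from the low-credibility user `u1` dominate, whereas fusing credibility promotes images from more credible users.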
This work contributes to a better understanding and modeling of social intelligence for information processing tasks. We focused on image retrieval and multimedia credibility estimation, but the methods proposed here are also relevant to other applications, such as image annotation and Web data quality control.