Carlier, Axel. Combining content analysis with usage analysis to better understand visual contents. PhD, Institut National Polytechnique de Toulouse, 2014
|
(Document in English)
PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader 6MB |
Official URL: http://ethesis.inp-toulouse.fr/archive/00002816/
Abstract
This thesis focuses on the problem of understanding visual contents, which can be images, videos or 3D contents. Understanding means that we aim at inferring semantic information about the visual content. The goal of our work is to study methods that combine two types of approaches: 1) automatic content analysis and 2) an analysis of how humans interact with the content (in other words, usage analysis). We start by reviewing the state of the art from both Computer Vision and Multimedia communities. Twenty years ago, the main approach was aiming at a fully automatic understanding of images. This approach today gives way to different forms of human intervention, whether it is through the constitution of annotated datasets, or by solving problems interactively (e.g. detection or segmentation), or by the implicit collection of information gathered from content usages. These different types of human intervention are at the heart of modern research questions: how to motivate human contributors? How to design interactive scenarii that will generate interactions that contribute to content understanding? How to check or ensure the quality of human contributions? How to aggregate human contributions? How to fuse inputs obtained from usage analysis with traditional outputs from content analysis? Our literature review addresses these questions and allows us to position the contributions of this thesis. In our first set of contributions we revisit the detection of important (or salient) regions through implicit feedback from users that either consume or produce visual contents. In 2D, we develop several interfaces of interactive video (e.g. zoomable video) in order to coordinate content analysis and usage analysis. We also generalize these results to 3D by introducing a new detector of salient regions that builds upon simultaneous video recordings of the same public artistic performance (dance show, chant, etc.) by multiple users. The second contribution of our work aims at a semantic understanding of fixed images. With this goal in mind, we use data gathered through a game, Ask’nSeek, that we created. Elementary interactions (such as clicks) together with textual input data from players are, as before, mixed with automatic analysis of images. In particular, we show the usefulness of interactions that help revealing spatial relations between different objects in a scene. After studying the problem of detecting objects on a scene, we also adress the more ambitious problem of segmentation.
Item Type: | PhD Thesis |
---|---|
Uncontrolled Keywords: | |
Institution: | Université de Toulouse > Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE) |
Laboratory name: | |
Research Director: | Charvillat, Vincent and Morin, Géraldine |
Statistics: | download |
Deposited On: | 18 Dec 2014 23:00 |
Repository Staff Only: item control page