Visual scene understanding for autonomous vehicles: understanding where and what

Author:
  1. Ros Sánchez, Germán

Supervised by:
  1. Antonio M. López Peña, Co-supervisor
  2. Julio Guerrero García, Co-supervisor
  3. Ángel Domingo Sappa, Co-supervisor

Defended at: Universitat Autònoma de Barcelona

Date of defense: 24 October 2016

Examination committee:
  1. Mathieu Salzmann, Chair
  2. Joost van de Weijer, Secretary
  3. Pedro Pinies Rodriguez, Member

Type: Thesis

Teseo: 435648

Abstract

Making Ground Autonomous Vehicles (GAVs) a reality as a service for society is one of the major scientific and technological challenges of this century. The potential benefits of autonomous vehicles include reducing accidents, easing traffic congestion and making better use of road infrastructure, among others. These vehicles must operate in our cities, towns and highways, dealing with many different types of situations while respecting traffic rules and protecting human lives. GAVs are expected to deal with all types of scenarios and situations, coping with an uncertain and chaotic world. Therefore, in order to fulfil these demanding requirements, GAVs need to be endowed with the capability of understanding their surroundings at many different levels, by means of affordable sensors and artificial intelligence. This capacity to understand the surroundings and the current situation the vehicle is involved in is called scene understanding.

In this work we investigate novel techniques to bring scene understanding to autonomous vehicles by combining the use of cameras as the main source of information, due to their versatility and affordability, with algorithms based on computer vision and machine learning. We investigate different degrees of understanding of the scene, starting from basic geometric knowledge about where the vehicle is within the scene. A robust and efficient estimation of the vehicle location and pose with respect to a map is one of the most fundamental steps towards autonomous driving. We study this problem from the point of view of robustness and computational efficiency, proposing key insights to improve current solutions.

We then advance to higher levels of abstraction to discover what is in the scene, recognizing and parsing all the elements present in a driving scene, such as roads, sidewalks, pedestrians, etc. We investigate this problem, known as semantic segmentation, proposing new approaches to improve recognition accuracy and computational efficiency. We cover these points by focusing on key aspects such as: (i) how to save online computation by moving semantics to an offline process; (ii) how to train compact architectures based on deconvolutional networks to achieve their maximum potential; (iii) how to use virtual worlds in combination with domain adaptation to produce accurate models in a cost-effective fashion; and (iv) how to use transfer learning techniques to adapt models to new situations.

Finally, we extend the previous levels of knowledge by enabling systems to reason about what has changed in a scene with respect to a previous visit, which in turn allows for efficient and cost-effective map updating.
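The localization level described above ("where") is commonly posed as recovering the camera pose from correspondences between 3D map landmarks and their 2D detections in the current frame. Below is a minimal sketch of that idea using OpenCV's RANSAC-based PnP solver; the landmarks, intrinsics and ground-truth pose are synthetic stand-ins, and this illustrates generic map-relative localization rather than the specific method proposed in the thesis.

```python
import numpy as np
import cv2

# Synthetic stand-ins: in a real pipeline the 3D landmarks come from a
# prebuilt map and the 2D points from feature matching in the frame.
rng = np.random.default_rng(0)
landmarks = rng.uniform(-5, 5, (60, 3)).astype(np.float32)
landmarks[:, 2] += 12.0  # keep all points in front of the camera

# Assumed pinhole intrinsics (fx, fy, cx, cy); no lens distortion.
K = np.array([[700.0,   0.0, 320.0],
              [0.0,   700.0, 240.0],
              [0.0,     0.0,   1.0]])

# Ground-truth pose, used only to synthesize pixel measurements.
rvec_gt = np.array([0.05, -0.10, 0.02])
tvec_gt = np.array([0.30, -0.20, 1.00])
pixels, _ = cv2.projectPoints(landmarks, rvec_gt, tvec_gt, K, None)
pixels = pixels.reshape(-1, 2).astype(np.float32)

# RANSAC-based PnP: robust to outlier matches, which dominate the
# error budget in real feature-based localization.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    landmarks, pixels, K, None, reprojectionError=2.0)

R, _ = cv2.Rodrigues(rvec)           # rotation matrix from axis-angle
cam_center = (-R.T @ tvec).ravel()   # camera position in map coordinates
print("recovered camera center:", cam_center)
```

The RANSAC wrapper matters in practice: wrong feature matches are common, and robustly rejecting them is what keeps the pose estimate usable.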
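For the "what" level, the abstract mentions compact architectures based on deconvolutional networks for semantic segmentation. The PyTorch sketch below shows the generic encoder-decoder pattern such networks follow, with strided convolutions downsampling and transposed ("deconvolutional") layers upsampling back to a per-pixel class map; the layer sizes and class count are illustrative assumptions, not the thesis architecture.

```python
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """A deliberately small encoder-decoder segmentation network."""
    def __init__(self, num_classes=11):
        super().__init__()
        # Encoder: two stride-2 convolutions reduce resolution by 4x.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
        )
        # Decoder: transposed convolutions upsample back to the input
        # resolution, producing a per-pixel class score map.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinySegNet()
logits = model(torch.randn(1, 3, 256, 512))  # (batch, classes, H, W)
pred = logits.argmax(dim=1)                  # per-pixel label map
print(pred.shape)                            # torch.Size([1, 256, 512])
```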
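Points (iii) and (iv), virtual worlds with domain adaptation and transfer learning, reduce in their simplest form to pretraining on cheap synthetic labels and fine-tuning on a small amount of real annotated data. The sketch below reuses the toy TinySegNet defined above; the checkpoint name, the frozen-encoder policy and the hyperparameters are assumptions for illustration only, not the recipe proposed in the thesis.

```python
import torch
import torch.nn as nn

model = TinySegNet(num_classes=11)
# In practice you would first load virtual-world pretrained weights, e.g.:
# model.load_state_dict(torch.load("virtual_world_pretrain.pt"))  # assumed file

# Freeze the encoder so only the decoder adapts to the real domain.
for p in model.encoder.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss(ignore_index=255)  # 255 marks unlabeled pixels

def finetune_step(images, labels):
    """One fine-tuning step on real data: images (B,3,H,W), labels (B,H,W)."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Smoke test with random tensors standing in for real annotated images.
print(finetune_step(torch.randn(2, 3, 64, 128),
                    torch.randint(0, 11, (2, 64, 128))))
```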
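The last level, reasoning about what has changed between visits for map updating, can be caricatured as comparing registered per-pixel semantic labels from two passes over the same place. The NumPy toy below does exactly that and nothing more; a real system must also cope with registration error, segmentation noise and per-class confidences, which this sketch deliberately ignores.

```python
import numpy as np

def change_mask(labels_prev, labels_now, min_changed=25):
    """Boolean mask of changed pixels; suppress tiny, likely-noisy masks."""
    mask = labels_prev != labels_now
    return mask if int(mask.sum()) >= min_changed else np.zeros_like(mask)

# Synthetic example: a 6x8 patch of "road" (class 0) where a new
# obstacle (class 7) appears on the second visit.
prev = np.zeros((6, 8), dtype=np.int64)
now = prev.copy()
now[2:4, 3:6] = 7
print(change_mask(prev, now, min_changed=3).astype(int))
```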