Clasificación, detección y estimación de la pose de categorías de objetos
- Redondo Cabrera, Carolina
- Roberto Javier López Sastre (Director)
Defence university: Universidad de Alcalá
Defence date: 18 November 2015
- Saturnino Maldonado Bascón (Chair)
- Francisco Javier Acevedo Rodríguez (Secretary)
- Ana Cristina Murillo Arnal (Committee member)
- Antonio M. López Peña (Committee member)
- Antonio Sanz Montemayor (Committee member)
Type: Thesis
Abstract
This thesis studies how to use RGB-D information to address object recognition, detection and pose estimation. First, it describes how to tackle simultaneous object detection and pose estimation using Hough Forests (HF). An HF, like a Random Forest (RF), consists of multiple binary decision trees that are trained independently and with randomization. The proposed method introduces a new formulation for the regression performed by the HF, incorporating an uncertainty criterion for the continuous pose of the categories. This pose uncertainty is decoupled from the traditional localization uncertainty, which makes it possible to choose randomly between the two during HF learning. As a result, the leaves of the HF contain slightly noisy pose and detection votes. The resulting HF can effectively locate objects and estimate their poses. However, extending the Hough space to also cover pose regression turns out to be suboptimal, mainly because the pose voting is very noisy. To improve pose estimation performance, this thesis proposes a novel two-step approach: first the object is localized, and then its pose is estimated. For the second step, a novel regression strategy is introduced, named Probabilistic Locally Enhanced Voting (PLEV), which modulates the regression with a kernel density estimation (KDE) to consolidate all the votes in a local Hough region near the maxima detected in the Hough space. The model outputs a probability density function (PDF) for the pose estimate. This is especially useful when fusing information from multiple sources, or when exploiting temporal continuity in a video sequence to obtain more accurate pose estimates by fusing information from multiple frames. To further improve the detections, a novel pose-based backprojection (BP) strategy is proposed that uses pose cues to boost the bounding box (BB) estimation.
Essentially, the proposed method extends the traditional BP strategy: when computing the BP mask, the model penalizes patches that vote not only for different object locations, as in traditional BP, but also for different poses.
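The local voting idea behind PLEV can be illustrated with a minimal sketch. The abstract does not give the actual formulation, so everything below is an assumption made for illustration: the pose is taken to be a single angle in radians, the kernel is a von Mises kernel with a hypothetical concentration `kappa`, the local Hough region is a disc of hypothetical `radius` around a detected maximum, and per-frame PDFs are fused by multiplication, which presumes independent observations of an unchanged pose. The function names `local_pose_pdf` and `fuse_pdfs` are likewise hypothetical.

```python
import numpy as np


def local_pose_pdf(vote_xy, vote_pose, vote_weight, peak_xy,
                   radius, kappa=20.0, n_bins=360):
    """Weighted von Mises KDE over the pose votes cast near a Hough peak.

    Returns the angle grid (radians) and a density that integrates to 1.
    """
    vote_xy = np.asarray(vote_xy, dtype=float)
    vote_pose = np.asarray(vote_pose, dtype=float)
    vote_weight = np.asarray(vote_weight, dtype=float)

    # Keep only the votes that fall inside the local region around the peak.
    dist = np.linalg.norm(vote_xy - np.asarray(peak_xy, dtype=float), axis=1)
    near = dist <= radius
    if not near.any():
        raise ValueError("no votes inside the local Hough region")

    # Evaluate a von Mises kernel centred on every retained pose vote.
    grid = np.linspace(0.0, 2.0 * np.pi, n_bins, endpoint=False)
    diff = grid[:, None] - vote_pose[near][None, :]
    kernel = np.exp(kappa * np.cos(diff)) / (2.0 * np.pi * np.i0(kappa))

    # Weighted sum of kernels, renormalized on the discrete grid.
    density = kernel @ vote_weight[near]
    step = 2.0 * np.pi / n_bins
    density /= density.sum() * step
    return grid, density


def fuse_pdfs(pdfs, step):
    """Fuse PDFs sampled on the same grid by multiplying and renormalizing."""
    log_sum = np.sum(np.log(np.asarray(pdfs, dtype=float) + 1e-12), axis=0)
    fused = np.exp(log_sum - log_sum.max())  # subtract max for stability
    return fused / (fused.sum() * step)


# Three votes near the peak agree on a pose close to 1 rad; the fourth vote
# lies far from the peak and is ignored by the local estimate.
xy = [[10, 10], [11, 10], [10, 9], [100, 100]]
pose = [1.0, 1.05, 0.95, 4.0]
grid, pdf = local_pose_pdf(xy, pose, np.ones(4), peak_xy=(10, 10), radius=5.0)
fused = fuse_pdfs([pdf, pdf], step=grid[1] - grid[0])
print(grid[np.argmax(pdf)], grid[np.argmax(fused)])
```

Returning a full density rather than a single angle is what makes the fusion step possible: multiplying the densities sharpens the estimate where frames agree and suppresses modes supported by only one frame.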