Real-time sequential non-rigid structure from motion using a single camera
- Bronte Palacios, Sebastián
- Luis M. Bergasa Pascual (Director)
- Daniel Pizarro Pérez (Co-director)
Defence university: Universidad de Alcalá
Date of defence: 11 July 2017
- Arturo de la Escalera Hueso (Chair)
- Marta Marrón Romera (Secretary)
- Pablo Mesejo Santiago (Committee member)
Type: Thesis
Abstract
There are applications that rely on the correct localization and reconstruction of a scene in a real 3D environment, a topic that has attracted great interest in recent years from both researchers and industry. These applications range from augmented reality and robotics to simulation and video games. Depending on the application and the required level of reconstruction detail, different devices can be used, such as stereo cameras, Red Green Blue and Depth (RGBD) sensors based on structured light, Time of Flight (ToF) cameras, or 2D/3D lasers. Simpler applications can use less complex hardware, i.e. commonly available devices such as smartphones: by applying computer vision techniques, 3D models of the workspace can be obtained with enough quality to render augmented information. In robotics, simultaneous localization and 3D map generation using a camera is a fundamental task for autonomous navigation. To that end, Simultaneous Localization And Mapping (SLAM) or Structure from Motion (SfM) techniques have been used. The condition for applying these techniques is that the target object must not change its shape over time, i.e. it must be rigid. In this case, the reconstruction is unique up to scale, given that from a monocular capture the scale cannot be recovered unless there is a fixed reference. If the rigidity condition does not hold, the object changes its shape over time, so it is deformable. The problem then becomes equivalent to performing a reconstruction per frame, which is ill-posed and thus ambiguous, as different shapes combined with certain camera poses can lead to similar projections. This is why deformable object reconstruction is an active research field nowadays.
To perform these reconstructions, SfM methods have been adapted to the non-rigid reconstruction of deformable objects by incorporating physical models; temporal, spatial, and geometric priors; or other kinds of restrictions that reduce the solution space and better constrain the reconstruction, resulting in Non-Rigid Structure from Motion (NRSfM) techniques. In this Thesis, we propose to depart from a well-known state-of-the-art technique, PTAM (Parallel Tracking and Mapping), and adapt it to include NRSfM techniques based on a linear bases model, in order to estimate the object deformations dynamically and to apply temporal and spatial restrictions that improve the reconstruction. Additionally, the system is modified to adapt to changes in the deformation types of the sequence. To that end, changes have been applied to each of PTAM's execution threads so that the incoming non-rigid data of the scene are processed naturally. Data association problems are addressed as well. The tracking thread already performed template-based tracking natively, using previously provided 3D map points. The main modification proposed for this thread is the integration of a linear shape bases model to compute the shape deformations in real time, assuming the deformation bases are fixed. The pose computation is based on the previous rigid estimation system, so the whole state estimation is performed by alternating pose and deformation coefficient steps using an Expectation-Maximization (EM) algorithm. Temporal and shape smoothness priors are also imposed to minimize the ambiguities inherent to the solutions and to improve the quality of the 3D estimates. Regarding the mapping thread, it is modified so that it can refine the deformation bases when the current set of bases cannot explain the deformations currently observed in the image.
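To give a concrete feel for the tracking-thread model, the sketch below illustrates the linear shape bases representation (S = S0 + Σ_k c_k B_k) and one half of the alternation: refining the deformation coefficients under perspective projection with the pose held fixed. This is a minimal illustrative sketch, not the thesis implementation; the Gauss-Newton step with a numerical Jacobian and the damping value are assumptions chosen to keep it short, and a real system would interleave this with a pose refinement step.

```python
import numpy as np

def project(points_cam):
    """Perspective projection (unit focal length) of Nx3 camera-frame points."""
    return points_cam[:, :2] / points_cam[:, 2:3]

def shape_from_bases(mean_shape, bases, coeffs):
    """Linear shape model: S = S0 + sum_k c_k * B_k, with S0 and each B_k of size Nx3."""
    return mean_shape + np.tensordot(coeffs, bases, axes=1)

def residual(obs_2d, mean_shape, bases, coeffs, R, t):
    """Stacked 2D reprojection residual for a given pose (R, t) and coefficients."""
    cam = shape_from_bases(mean_shape, bases, coeffs) @ R.T + t
    return (project(cam) - obs_2d).ravel()

def coefficient_step(obs_2d, mean_shape, bases, coeffs, R, t,
                     n_iters=20, lam=1e-3, eps=1e-6):
    """Gauss-Newton refinement of the deformation coefficients with the pose
    held fixed (one half of the alternation). The Jacobian is computed by
    forward differences for brevity; lam is Levenberg-style step damping."""
    coeffs = coeffs.astype(float).copy()
    for _ in range(n_iters):
        r = residual(obs_2d, mean_shape, bases, coeffs, R, t)
        J = np.empty((r.size, coeffs.size))
        for k in range(coeffs.size):
            c = coeffs.copy()
            c[k] += eps
            J[:, k] = (residual(obs_2d, mean_shape, bases, c, R, t) - r) / eps
        coeffs -= np.linalg.solve(J.T @ J + lam * np.eye(coeffs.size), J.T @ r)
    return coeffs
```

In the full alternation, a pose step (e.g. the rigid PnP-style estimation inherited from PTAM) would run between coefficient steps, and the temporal/shape smoothness priors mentioned above would enter as extra penalty terms in the residual.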
To that end, the rigid optimization technique of Sparse Bundle Adjustment (SBA) is replaced by an exhaustive non-rigid NRSfM batch algorithm that improves the bases using the images with a large reprojection error, sampled from the tracking thread. With this setup, we are able to adapt to new deformations sequentially, allowing the system to evolve and remain stable in the long term. Unlike some methods in the literature, the proposed system handles the perspective projection natively, minimizing the depth ambiguity that affects orthographic projection approaches. The proposed system also handles hundreds of points and is ready to comply with real-time restrictions for its application on systems with limited hardware resources. Additionally, it presents a good trade-off between reconstruction error and processing time compared with other state-of-the-art proposals.
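The mapping-thread logic described above — collecting poorly explained frames and triggering a batch bases refinement — can be sketched as a simple trigger. The threshold, batch size, and class names here are illustrative assumptions, not values from the thesis; the actual NRSfM batch solver invoked on the returned frames is out of scope for this sketch.

```python
import numpy as np

def mean_reprojection_error(obs_2d, proj_2d):
    """Mean Euclidean reprojection error over the tracked points."""
    return float(np.mean(np.linalg.norm(obs_2d - proj_2d, axis=1)))

class BatchUpdateTrigger:
    """Collects frames whose reprojection error exceeds a threshold and
    signals a batch NRSfM bases refinement once enough have accumulated.
    err_thresh and batch_size are illustrative choices."""
    def __init__(self, err_thresh=2.0, batch_size=5):
        self.err_thresh = err_thresh
        self.batch_size = batch_size
        self.pending = []  # frames the current bases fail to explain

    def observe(self, frame_id, obs_2d, proj_2d):
        """Record one frame; return a batch of frame ids when refinement is due."""
        if mean_reprojection_error(obs_2d, proj_2d) > self.err_thresh:
            self.pending.append(frame_id)
        if len(self.pending) >= self.batch_size:
            batch, self.pending = self.pending, []
            return batch  # hand these frames to the batch NRSfM solver
        return None
```

The design mirrors the division of labour in the text: the tracking thread keeps running in real time with the current bases, while the batch refinement only fires on the sampled high-error frames, keeping the expensive optimization off the per-frame path.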