Visual odometry

The optical flow vector of a moving object in a video sequence

In robotics and computer vision, visual odometry is the process of determining the position and orientation of a robot by analyzing the associated camera images. It has been used in a wide variety of robotic applications, such as on the Mars Exploration Rovers. [1]

Overview

In navigation, odometry is the use of data from the movement of actuators, measured by devices such as rotary encoders that count wheel rotations, to estimate change in position over time. While useful for many wheeled or tracked vehicles, traditional odometry techniques cannot be applied to mobile robots with non-standard locomotion methods, such as legged robots. In addition, wheel odometry suffers from precision problems, since wheels tend to slip and slide on the floor, so the distance actually traveled differs from the distance implied by the wheel rotations. The error is larger when the vehicle operates on non-smooth surfaces. Because these errors accumulate over time, odometry readings become increasingly unreliable.
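
The accumulation of slip error can be illustrated with a minimal dead-reckoning sketch for a differential-drive robot. The function names, the 0.3 m wheel base, the 1 cm step and the 2% slip figure are illustrative assumptions, not values taken from any particular vehicle.

```python
import math

def integrate_wheel_odometry(pose, d_left, d_right, wheel_base):
    """Advance a planar pose (x, y, theta) from the distances reported by each wheel encoder."""
    x, y, theta = pose
    d_center = 0.5 * (d_left + d_right)        # distance moved by the robot centre
    d_theta = (d_right - d_left) / wheel_base  # change in heading, in radians
    # Use the heading at the middle of the step for the position update.
    x += d_center * math.cos(theta + 0.5 * d_theta)
    y += d_center * math.sin(theta + 0.5 * d_theta)
    return (x, y, theta + d_theta)

def dead_reckon(steps, slip_right=0.0, step_len=0.01, wheel_base=0.3):
    """Estimated pose of a robot that in reality drives straight along +x.

    Assumption: the right wheel spins `slip_right` (a fraction) further than
    the ground distance it covers, so its encoder over-reports.
    """
    pose = (0.0, 0.0, 0.0)
    for _ in range(steps):
        pose = integrate_wheel_odometry(
            pose, step_len, step_len * (1.0 + slip_right), wheel_base)
    return pose

if __name__ == "__main__":
    for n in (100, 1000):
        x, y, theta = dead_reckon(n, slip_right=0.02)
        print(f"{n} steps: lateral error {y:.3f} m, heading error {theta:.3f} rad")
```

Because the true path is a straight line along the x axis, any non-zero `y` or `theta` in the estimate is pure odometry error, and both grow with the number of integrated steps.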

Visual odometry is the process of determining equivalent odometry information using sequential camera images to estimate the distance traveled. Visual odometry allows for enhanced navigational accuracy in robots or vehicles using any type of locomotion on any surface.
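
As a rough illustration of how frame-to-frame motion estimates turn into a trajectory and a distance traveled, the sketch below chains relative motions into a global pose. It assumes planar motion (x, y, heading) for brevity, whereas a real system would chain 3D rigid-body transforms; `compose`, `accumulate` and `path_length` are hypothetical helper names.

```python
import math

def compose(pose, delta):
    """Apply a relative motion (dx, dy, dtheta), expressed in the camera's
    current frame, to a global planar pose (x, y, theta)."""
    x, y, theta = pose
    dx, dy, dtheta = delta
    x += dx * math.cos(theta) - dy * math.sin(theta)
    y += dx * math.sin(theta) + dy * math.cos(theta)
    return (x, y, theta + dtheta)

def accumulate(deltas, start=(0.0, 0.0, 0.0)):
    """Chain per-frame relative motions into a list of global poses."""
    trajectory = [start]
    for delta in deltas:
        trajectory.append(compose(trajectory[-1], delta))
    return trajectory

def path_length(trajectory):
    """Total distance traveled along the estimated trajectory."""
    return sum(math.hypot(b[0] - a[0], b[1] - a[1])
               for a, b in zip(trajectory, trajectory[1:]))

if __name__ == "__main__":
    # Hypothetical per-frame estimates: move 1 unit forward, then turn left 90 degrees.
    square = [(1.0, 0.0, math.pi / 2)] * 4
    traj = accumulate(square)
    print(traj[-1], path_length(traj))
```

Each pose depends on all earlier estimates, so an error in any single frame-to-frame estimate is carried forward into every later pose.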

Types

Visual odometry (VO) methods can be classified in several ways.

Monocular and stereo

Depending on the camera setup, VO can be categorized as monocular VO (a single camera) or stereo VO (two cameras in a stereo setup).

Visual inertial odometry (VIO) is widely used in commercial quadcopters, where it provides localization in GPS-denied situations.

Feature-based and direct method

Traditional VO obtains its visual information with the feature-based method, which extracts image feature points and tracks them through the image sequence. More recent VO research has provided an alternative, called the direct method, which uses the pixel intensities in the image sequence directly as visual input. There are also hybrid methods.

Visual inertial odometry

If an inertial measurement unit (IMU) is used within the VO system, it is commonly referred to as Visual Inertial Odometry (VIO).

Algorithm

Most existing approaches to visual odometry are based on the following stages.

  1. Acquire input images: using either single cameras, [2] [3] stereo cameras, [3] [4] or omnidirectional cameras. [5] [6]
  2. Image correction: apply image processing techniques for lens distortion removal, etc.
  3. Feature detection: define interest operators, match features across frames, and construct the optical flow field.
    1. Feature extraction and correlation.
    2. Construct optical flow field (Lucas–Kanade method).
  4. Check flow field vectors for potential tracking errors and remove outliers. [7]
  5. Estimation of the camera motion from the optical flow. [8] [9] [10] [11]
    1. Choice 1: Kalman filter for state estimate distribution maintenance.
    2. Choice 2: find the geometric and 3D properties of the features that minimize a cost function based on the re-projection error between two adjacent images. This can be done by mathematical minimization or random sampling.
  6. Periodic repopulation of trackpoints to maintain coverage across the image.
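
Stages 4 and 5 (outlier removal followed by motion estimation through random sampling) can be sketched in a heavily simplified form. The sketch assumes the motion model is a pure 2D image translation, so that a single flow vector defines a hypothesis; real systems estimate a 3D rotation and translation. The function name, the threshold, and the iteration count are illustrative assumptions.

```python
import random

def estimate_translation_ransac(flows, threshold=0.5, iterations=50, seed=0):
    """Estimate the dominant 2D translation from a list of (u, v) flow vectors.

    Returns ((u, v), inlier_indices). Vectors further than `threshold`
    (in pixels) from the winning hypothesis are treated as outliers.
    """
    if not flows:
        raise ValueError("at least one flow vector is required")
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Random sampling: one flow vector fully determines a translation.
        cu, cv = rng.choice(flows)
        inliers = [i for i, (u, v) in enumerate(flows)
                   if (u - cu) ** 2 + (v - cv) ** 2 <= threshold ** 2]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refinement: the least-squares translation over the consensus set is its mean.
    n = len(best_inliers)
    mean_u = sum(flows[i][0] for i in best_inliers) / n
    mean_v = sum(flows[i][1] for i in best_inliers) / n
    return (mean_u, mean_v), best_inliers

if __name__ == "__main__":
    # Synthetic data: 20 consistent vectors near (2, 1) plus 5 gross tracking errors.
    flows = [(2.0 + 0.05 * ((i % 3) - 1), 1.0 - 0.05 * (i % 2)) for i in range(20)]
    flows += [(-7.0, 3.0), (9.0, 9.0), (0.0, -6.0), (15.0, -2.0), (-4.0, -4.0)]
    motion, inliers = estimate_translation_ransac(flows)
    print(motion, len(inliers))
```

Choosing the hypothesis by consensus and only then averaging keeps gross tracking errors from biasing the estimate, which a plain mean over all flow vectors would not do.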

An alternative to feature-based methods is the "direct" or appearance-based visual odometry technique, which minimizes an error directly in sensor space and thereby avoids feature extraction and matching. [4] [12] [13]
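
The idea of minimizing an error in sensor space can be shown on a toy problem. The sketch below aligns two 1D intensity profiles by an exhaustive search over integer shifts; this is an assumption made for brevity, since practical direct methods optimize a full camera pose over 2D images with gradient-based solvers. The function names are hypothetical.

```python
def photometric_error(ref, cur, shift):
    """Mean squared intensity difference between ref[i] and cur[i + shift],
    computed over the samples where both signals are defined."""
    total, count = 0.0, 0
    for i, value in enumerate(ref):
        j = i + shift
        if 0 <= j < len(cur):
            total += (cur[j] - value) ** 2
            count += 1
    return total / count if count else float("inf")

def align_direct(ref, cur, max_shift=5):
    """Return the integer shift that minimizes the photometric error."""
    return min(range(-max_shift, max_shift + 1),
               key=lambda s: photometric_error(ref, cur, s))

if __name__ == "__main__":
    ref = [10, 10, 12, 40, 90, 60, 20, 11, 10, 10, 10, 10]
    cur = [10, 10] + ref[:-2]      # the same profile moved two samples to the right
    print(align_direct(ref, cur))
```

No feature points are detected or matched here: the raw intensities are the only input to the cost function.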

Another method, coined 'visiodometry', estimates the planar roto-translations between images using phase correlation instead of extracting features. [14] [15]
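
Phase correlation itself can be sketched on a 1D signal. This is only an illustration of the underlying principle under simplifying assumptions (a circular, integer shift and a naive discrete Fourier transform written out by hand); it does not reproduce the cited method, which handles rotation as well as translation between 2D images.

```python
import cmath

def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += signal[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def phase_correlation_shift(a, b):
    """Return the circular shift s such that b[t] == a[(t - s) % n]."""
    fa, fb = dft(a), dft(b)
    cross = []
    for x, y in zip(fa, fb):
        c = y * x.conjugate()
        mag = abs(c)
        # Keep only the phase; bins with no energy carry no shift information.
        cross.append(c / mag if mag > 1e-12 else 0j)
    corr = dft(cross, inverse=True)
    # The normalized cross-power spectrum transforms back to a peak at the shift.
    return max(range(len(corr)), key=lambda i: corr[i].real)

if __name__ == "__main__":
    a = [0, 1, 4, 9, 3, 0, 0, 2, 0, 0, 0, 5, 1, 0, 0, 0]
    b = [a[(t - 3) % len(a)] for t in range(len(a))]
    print(phase_correlation_shift(a, b))
```

Because only the phase of the spectrum is used, the location of the peak does not depend on the overall brightness or contrast of the signals.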

Egomotion

Egomotion estimation using corner detection

Egomotion is defined as the 3D motion of a camera within an environment. [16] In the field of computer vision, egomotion refers to estimating a camera's motion relative to a rigid scene. [17] An example of egomotion estimation would be estimating a car's moving position relative to lines on the road or street signs being observed from the car itself. The estimation of egomotion is important in autonomous robot navigation applications. [18]

Overview

The goal of estimating the egomotion of a camera is to determine the 3D motion of that camera within the environment using a sequence of images taken by the camera. [19] The process of estimating a camera's motion within an environment involves the use of visual odometry techniques on a sequence of images captured by the moving camera. [20] This is typically done using feature detection to construct an optical flow from two image frames in a sequence [16] generated from either single cameras or stereo cameras. [20] Using stereo image pairs for each frame helps reduce error and provides additional depth and scale information. [21] [22]
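
The depth and scale information contributed by a stereo pair comes from triangulation. A minimal sketch, assuming a rectified pinhole stereo rig with focal length in pixels and a known baseline in metres (the function names and the example numbers are illustrative):

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair, in metres."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return focal_length_px * baseline_m / disparity_px

def depth_uncertainty(depth_m, focal_length_px, baseline_m, disparity_error_px=1.0):
    """First-order depth error caused by a disparity error: dZ ~ Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_length_px * baseline_m) * disparity_error_px

if __name__ == "__main__":
    z = depth_from_disparity(disparity_px=10.0, focal_length_px=500.0, baseline_m=0.1)
    print(z, depth_uncertainty(z, 500.0, 0.1))
```

The known baseline fixes the metric scale, which a single camera cannot observe from images alone. The quadratic growth of the depth error with distance is one reason stereo is most useful for nearby scene points.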

Features are detected in the first frame, and then matched in the second frame. This information is then used to make the optical flow field for the detected features in those two images. The optical flow field illustrates how features diverge from a single point, the focus of expansion. The focus of expansion can be detected from the optical flow field, indicating the direction of the motion of the camera, and thus providing an estimate of the camera motion.
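
Locating the focus of expansion can be sketched as a small least-squares problem. The sketch assumes the camera motion is a pure translation, so that every flow vector lies on a line through the focus of expansion; with rotation present the flow no longer radiates from a single point. The function name is hypothetical.

```python
def focus_of_expansion(points, flows):
    """Least-squares focus of expansion (fx, fy).

    Each feature at (x, y) with flow (u, v) contributes the constraint
    v * fx - u * fy = v * x - u * y, i.e. the FOE lies on the flow line.
    """
    svv = suv = suu = svb = sub = 0.0
    for (x, y), (u, v) in zip(points, flows):
        b = v * x - u * y
        svv += v * v
        suv += u * v
        suu += u * u
        svb += v * b
        sub += u * b
    det = svv * suu - suv * suv
    if abs(det) < 1e-12:
        raise ValueError("flow vectors are (nearly) parallel; the FOE is not constrained")
    # Closed-form solution of the 2x2 normal equations.
    fx = (suu * svb - suv * sub) / det
    fy = (suv * svb - svv * sub) / det
    return fx, fy

if __name__ == "__main__":
    # Synthetic flow field radiating from an assumed FOE at (3, -2).
    points = [(float(x), float(y)) for x in range(-5, 6, 2) for y in range(-5, 6, 2)]
    flows = [(0.1 * (x - 3.0), 0.1 * (y + 2.0)) for x, y in points]
    print(focus_of_expansion(points, flows))
```

In practice the flow vectors are noisy and contain outliers, so an estimate like this would normally be combined with an outlier rejection step.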

There are other methods of extracting egomotion information from images as well, including a method that avoids feature detection and optical flow fields and directly uses the image intensities. [16]

References

  1. Maimone, M.; Cheng, Y.; Matthies, L. (2007). "Two years of Visual Odometry on the Mars Exploration Rovers" (PDF). Journal of Field Robotics. 24 (3): 169–186. CiteSeerX 10.1.1.104.3110. doi:10.1002/rob.20184. S2CID 17544166. Retrieved 2008-07-10.
  2. Chhaniyara, Savan; Althoefer, Kaspar; Seneviratne, Lakmal D. (2008). "Visual Odometry Technique Using Circular Marker Identification For Motion Parameter Estimation". Advances in Mobile Robotics: Proceedings of the Eleventh International Conference on Climbing and Walking Robots and the Support Technologies for Mobile Machines, Coimbra, Portugal. Vol. 11. World Scientific. Archived from the original on 2012-02-24. Retrieved 2010-01-22.
  3. Nister, D.; Naroditsky, O.; Bergen, J. (Jan 2004). Visual Odometry. Computer Vision and Pattern Recognition, 2004. CVPR 2004. Vol. 1. pp. I-652–I-659. doi:10.1109/CVPR.2004.1315094.
  4. Comport, A.I.; Malis, E.; Rives, P. (2010). F. Chaumette; P. Corke; P. Newman (eds.). "Real-time Quadrifocal Visual Odometry". International Journal of Robotics Research. 29 (2–3): 245–266. CiteSeerX 10.1.1.720.3113. doi:10.1177/0278364909356601. S2CID 15139693.
  5. Scaramuzza, D.; Siegwart, R. (October 2008). "Appearance-Guided Monocular Omnidirectional Visual Odometry for Outdoor Ground Vehicles". IEEE Transactions on Robotics. 24 (5): 1015–1026. doi:10.1109/TRO.2008.2004490. hdl:20.500.11850/14362. S2CID 13894940.
  6. Corke, P.; Strelow, D.; Singh, S. "Omnidirectional visual odometry for a planetary rover". Intelligent Robots and Systems, 2004.(IROS 2004). Proceedings. 2004 IEEE/RSJ International Conference on. Vol. 4. doi:10.1109/IROS.2004.1390041.
  7. Campbell, J.; Sukthankar, R.; Nourbakhsh, I. "Techniques for evaluating optical flow for visual odometry in extreme terrain". Intelligent Robots and Systems, 2004.(IROS 2004). Proceedings. 2004 IEEE/RSJ International Conference on. Vol. 4. doi:10.1109/IROS.2004.1389991.
  8. Sunderhauf, N.; Konolige, K.; Lacroix, S.; Protzel, P. (2005). "Visual odometry using sparse bundle adjustment on an autonomous outdoor vehicle". In Levi; Schanz; Lafrenz; Avrutin (eds.). Tagungsband Autonome Mobile Systeme 2005 (PDF). Reihe Informatik aktuell. Springer Verlag. pp. 157–163. Archived from the original (PDF) on 2009-02-11. Retrieved 2008-07-10.
  9. Konolige, K.; Agrawal, M.; Bolles, R.C.; Cowan, C.; Fischler, M.; Gerkey, B.P. (2008). "Outdoor Mapping and Navigation Using Stereo Vision". Experimental Robotics. Springer Tracts in Advanced Robotics. Vol. 39. pp. 179–190. doi:10.1007/978-3-540-77457-0_17. ISBN 978-3-540-77456-3.
  10. Olson, C.F.; Matthies, L.; Schoppers, M.; Maimone, M.W. (2002). "Rover navigation using stereo ego-motion" (PDF). Robotics and Autonomous Systems. 43 (4): 215–229. doi:10.1016/s0921-8890(03)00004-6. Retrieved 2010-06-06.
  11. Cheng, Y.; Maimone, M.W.; Matthies, L. (2006). "Visual Odometry on the Mars Exploration Rovers". IEEE Robotics and Automation Magazine. 13 (2): 54–62. CiteSeerX 10.1.1.297.4693. doi:10.1109/MRA.2006.1638016. S2CID 15149330.
  12. Engel, Jakob; Schöps, Thomas; Cremers, Daniel (2014). "LSD-SLAM: Large-Scale Direct Monocular SLAM" (PDF). In Fleet D.; Pajdla T.; Schiele B.; Tuytelaars T. (eds.). Computer Vision. European Conference on Computer Vision 2014. Lecture Notes in Computer Science. Vol. 8690. doi:10.1007/978-3-319-10605-2_54.
  13. Engel, Jakob; Sturm, Jürgen; Cremers, Daniel (2013). "Semi-Dense Visual Odometry for a Monocular Camera" (PDF). IEEE International Conference on Computer Vision (ICCV). CiteSeerX 10.1.1.402.6918. doi:10.1109/ICCV.2013.183.
  14. Zaman, M. (2007). "High Precision Relative Localization Using a Single Camera". Robotics and Automation, 2007.(ICRA 2007). Proceedings. 2007 IEEE International Conference on. doi:10.1109/ROBOT.2007.364078.
  15. Zaman, M. (2007). "High resolution relative localisation using two cameras". Journal of Robotics and Autonomous Systems. 55 (9): 685–692. doi:10.1016/j.robot.2007.05.008.
  16. Irani, M.; Rousso, B.; Peleg, S. (June 1994). "Recovery of Ego-Motion Using Image Stabilization" (PDF). IEEE Computer Society Conference on Computer Vision and Pattern Recognition: 21–23. Retrieved 7 June 2010.
  17. Burger, W.; Bhanu, B. (Nov 1990). "Estimating 3D egomotion from perspective image sequence". IEEE Transactions on Pattern Analysis and Machine Intelligence. 12 (11): 1040–1058. doi:10.1109/34.61704. S2CID   206418830.
  18. Shakernia, O.; Vidal, R.; Shankar, S. (2003). "Omnidirectional Egomotion Estimation From Back-projection Flow" (PDF). Conference on Computer Vision and Pattern Recognition Workshop. 7: 82. CiteSeerX 10.1.1.5.8127. doi:10.1109/CVPRW.2003.10074. S2CID 5494756. Retrieved 7 June 2010.
  19. Tian, T.; Tomasi, C.; Heeger, D. (1996). "Comparison of Approaches to Egomotion Computation" (PDF). IEEE Computer Society Conference on Computer Vision and Pattern Recognition: 315. Archived from the original (PDF) on August 8, 2008. Retrieved 7 June 2010.
  20. Milella, A.; Siegwart, R. (January 2006). "Stereo-Based Ego-Motion Estimation Using Pixel Tracking and Iterative Closest Point" (PDF). IEEE International Conference on Computer Vision Systems: 21. Archived from the original (PDF) on 17 September 2010. Retrieved 7 June 2010.
  21. Olson, C. F.; Matthies, L.; Schoppers, M.; Maimone, M. W. (June 2003). "Rover navigation using stereo ego-motion" (PDF). Robotics and Autonomous Systems. 43 (4): 215–229. doi:10.1016/s0921-8890(03)00004-6. Retrieved 7 June 2010.
  22. Sudin Dinesh, Koteswara Rao, K.; Unnikrishnan, M.; Brinda, V.; Lalithambika, V.R.; Dhekane, M.V. "Improvements in Visual Odometry Algorithm for Planetary Exploration Rovers". IEEE International Conference on Emerging Trends in Communication, Control, Signal Processing & Computing Applications (C2SPCA), 2013