Hessian affine region detector

The Hessian affine region detector is a feature detector used in the fields of computer vision and image analysis. Like other feature detectors, the Hessian affine detector is typically used as a preprocessing step to algorithms that rely on identifiable, characteristic interest points.

The Hessian affine detector is part of the subclass of feature detectors known as affine-invariant detectors: Harris affine region detector, Hessian affine regions, maximally stable extremal regions, Kadir–Brady saliency detector, edge-based regions (EBR), and intensity-extrema-based regions (IBR).

Algorithm description

The Hessian affine detector algorithm is almost identical to that of the Harris affine region detector. In fact, both algorithms were derived by Krystian Mikolajczyk and Cordelia Schmid in 2002, [1] based on earlier work; [2] [3] see also [4] for a more general overview.

How does the Hessian affine differ?

The Harris affine detector relies on interest points detected at multiple scales using the Harris corner measure on the second-moment matrix. The Hessian affine detector also uses a multiple-scale iterative algorithm to spatially localize and select scale- and affine-invariant points. However, at each individual scale, the Hessian affine detector chooses interest points based on the Hessian matrix at that point:

$H(\mathbf{x}) = \begin{bmatrix} L_{xx}(\mathbf{x}) & L_{xy}(\mathbf{x}) \\ L_{xy}(\mathbf{x}) & L_{yy}(\mathbf{x}) \end{bmatrix}$

where $L_{aa}(\mathbf{x})$ is the second partial derivative in the $a$ direction and $L_{ab}(\mathbf{x})$ is the mixed partial second derivative in the $a$ and $b$ directions. It is important to note that the derivatives are computed at the current iteration scale and thus are derivatives of an image smoothed by a Gaussian kernel: $L(\mathbf{x}) = g(\sigma_I) * I(\mathbf{x})$. As discussed in the Harris affine region detector article, the derivatives must be scaled appropriately by a factor related to the Gaussian kernel scale $\sigma_I$.
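
To make this concrete, the following is a minimal sketch, not the authors' implementation, of computing these Gaussian-smoothed second derivatives in Python with SciPy. The function name hessian_derivatives is an assumption made here for illustration; scipy.ndimage.gaussian_filter is a real library routine that convolves with Gaussian-derivative kernels.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hessian_derivatives(image, sigma_i):
    """Second derivatives L_xx, L_yy, L_xy of the image smoothed by a
    Gaussian of width sigma_i (i.e. derivatives of L = g(sigma_i) * I)."""
    img = image.astype(float)
    # order=(2, 0) differentiates twice along rows (y), (0, 2) twice along
    # columns (x), and (1, 1) gives the mixed partial derivative.
    Lyy = gaussian_filter(img, sigma_i, order=(2, 0))
    Lxx = gaussian_filter(img, sigma_i, order=(0, 2))
    Lxy = gaussian_filter(img, sigma_i, order=(1, 1))
    return Lxx, Lyy, Lxy
```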

At each scale, interest points are those points that are simultaneously local extrema of both the determinant and the trace of the Hessian matrix. The trace of the Hessian matrix is identical to the Laplacian of Gaussians (LoG): [5]

$\mathrm{DET} = \sigma_I^2 \left( L_{xx}(\mathbf{x}) L_{yy}(\mathbf{x}) - L_{xy}^2(\mathbf{x}) \right), \qquad \mathrm{TR} = \sigma_I \left( L_{xx}(\mathbf{x}) + L_{yy}(\mathbf{x}) \right)$
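
Continuing the sketch above, interest points at a single scale could be taken where the DET and TR measures are both local maxima in a 3×3 neighbourhood. The non-maximum suppression scheme and the thresholds below are illustrative placeholders, not values from the published method.

```python
from scipy.ndimage import maximum_filter

def hessian_interest_points(image, sigma_i, det_thresh=0.0, tr_thresh=0.0):
    """Points that are simultaneous local maxima of DET and |TR| at scale sigma_i."""
    Lxx, Lyy, Lxy = hessian_derivatives(image, sigma_i)
    det = sigma_i ** 2 * (Lxx * Lyy - Lxy ** 2)   # DET measure
    tr = sigma_i * np.abs(Lxx + Lyy)              # |TR|, magnitude of the LoG measure
    det_peak = (det == maximum_filter(det, size=3)) & (det > det_thresh)
    tr_peak = (tr == maximum_filter(tr, size=3)) & (tr > tr_thresh)
    ys, xs = np.nonzero(det_peak & tr_peak)       # row/column indices of kept points
    return list(zip(xs.tolist(), ys.tolist()))    # (x, y) pairs
```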

As discussed in Mikolajczyk et al. (2005), choosing points that maximize the determinant of the Hessian penalizes elongated structures that have small second derivatives (signal changes) in a single direction. [6] This type of measure is very similar to those used in the blob detection schemes proposed by Lindeberg (1998), where either the Laplacian or the determinant of the Hessian was used in blob detection methods with automatic scale selection.

Like the Harris affine algorithm, these Hessian-based interest points are spatially localized using an iterative search based on the Laplacian of Gaussians. Predictably, these interest points are called Hessian–Laplace interest points. Furthermore, starting from these initially detected points, the Hessian affine detector uses an iterative shape adaptation algorithm to compute the local affine transformation for each interest point. The implementation of this algorithm is almost identical to that of the Harris affine detector; however, the above-mentioned Hessian measure replaces all instances of the Harris corner measure.
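
For orientation only, the sketch below strings the two functions above into a simplified Hessian–Laplace stage: detection at each scale, followed by keeping points whose TR (LoG) response peaks at that scale. It is an illustrative simplification under the assumptions of the earlier sketches and omits the iterative affine shape adaptation step.

```python
def hessian_laplace_points(image, sigmas):
    """Simplified Hessian-Laplace detection over a list of scales (ascending)."""
    img = image.astype(float)
    # Scale-normalized LoG magnitude (the TR measure above), one slice per scale.
    logs = []
    for s in sigmas:
        Lxx, Lyy, _ = hessian_derivatives(img, s)
        logs.append(s * np.abs(Lxx + Lyy))
    logs = np.stack(logs)                          # shape (n_scales, height, width)

    points = []
    for i in range(1, len(sigmas) - 1):            # interior scales only
        for x, y in hessian_interest_points(img, sigmas[i]):
            response = logs[:, y, x]
            # Keep the point only if its LoG response peaks at this scale.
            if response[i] >= response[i - 1] and response[i] >= response[i + 1]:
                points.append((x, y, sigmas[i]))
    return points
```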

Robustness to affine and other transformations

Mikolajczyk et al. (2005) have done a thorough analysis of several state-of-the-art affine region detectors: Harris affine, Hessian affine, MSER, [7] IBR and EBR, [8] and salient [9] detectors. [6] Mikolajczyk et al. analyzed both structured images and textured images in their evaluation. Linux binaries of the detectors and their test images are freely available at their webpage. A brief summary of the results of Mikolajczyk et al. (2005) follows; see A comparison of affine region detectors for a more quantitative analysis.

Overall, the Hessian affine detector performs second best to MSER. Like the Harris affine detector, Hessian affine interest regions tend to be more numerous and smaller than those of other detectors. For a single image, the Hessian affine detector typically identifies more reliable regions than the Harris affine detector. The performance changes depending on the type of scene being analyzed. The Hessian affine detector responds well to textured scenes in which there are a lot of corner-like parts, but it also performs very well on some structured scenes, such as buildings. This is complementary to MSER, which tends to do better with well-structured (segmentable) scenes.

Related Research Articles

<span class="mw-page-title-main">Nonlinear dimensionality reduction</span> Summary of algorithms for nonlinear dimensionality reduction

Nonlinear dimensionality reduction, also known as manifold learning, refers to various related techniques that aim to project high-dimensional data onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping itself. The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.

Edge detection includes a variety of mathematical methods that aim at identifying edges, curves in a digital image at which the image brightness changes sharply or, more formally, has discontinuities. The same problem of finding discontinuities in one-dimensional signals is known as step detection and the problem of finding signal discontinuities over time is known as change detection. Edge detection is a fundamental tool in image processing, machine vision and computer vision, particularly in the areas of feature detection and feature extraction.

In mathematics, the Hessian matrix or Hessian is a square matrix of second-order partial derivatives of a scalar-valued function, or scalar field. It describes the local curvature of a function of many variables. The Hessian matrix was developed in the 19th century by the German mathematician Ludwig Otto Hesse and later named after him. Hesse originally used the term "functional determinants".
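
As a standard illustration of this definition (added here, not part of the original summary): for a scalar-valued function $f(x_1, \ldots, x_n)$, the entries of the Hessian are

$(H_f)_{ij} = \dfrac{\partial^2 f}{\partial x_i \, \partial x_j},$

so for the smoothed image $L$ used above it reduces to the $2 \times 2$ matrix of $L_{xx}$, $L_{xy}$ and $L_{yy}$.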

<span class="mw-page-title-main">Canny edge detector</span>

The Canny edge detector is an edge detection operator that uses a multi-stage algorithm to detect a wide range of edges in images. It was developed by John F. Canny in 1986. Canny also produced a computational theory of edge detection explaining why the technique works.

<span class="mw-page-title-main">Ricker wavelet</span> Wavelet proportional to the second derivative of a Gaussian

In mathematics and numerical analysis, the Ricker wavelet

The scale-invariant feature transform (SIFT) is a computer vision algorithm to detect, describe, and match local features in images, invented by David Lowe in 1999. Applications include object recognition, robotic mapping and navigation, image stitching, 3D modeling, gesture recognition, video tracking, individual identification of wildlife and match moving.

Scale-space theory is a framework for multi-scale signal representation developed by the computer vision, image processing and signal processing communities with complementary motivations from physics and biological vision. It is a formal theory for handling image structures at different scales, by representing an image as a one-parameter family of smoothed images, the scale-space representation, parametrized by the size of the smoothing kernel used for suppressing fine-scale structures. The parameter $t$ in this family is referred to as the scale parameter, with the interpretation that image structures of spatial size smaller than about $\sqrt{t}$ have largely been smoothed away in the scale-space level at scale $t$.

<span class="mw-page-title-main">Gaussian blur</span> Type of image blur produced by a Gaussian function

In image processing, a Gaussian blur is the result of blurring an image by a Gaussian function.

In imaging science, difference of Gaussians (DoG) is a feature enhancement algorithm that involves the subtraction of one Gaussian blurred version of an original image from another, less blurred version of the original. In the simple case of grayscale images, the blurred images are obtained by convolving the original grayscale images with Gaussian kernels having differing width. Blurring an image using a Gaussian kernel suppresses only high-frequency spatial information. Subtracting one image from the other preserves spatial information that lies between the range of frequencies that are preserved in the two blurred images. Thus, the DoG is a spatial band-pass filter that attenuates frequencies in the original grayscale image that are far from the band center.
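
A minimal sketch of this operation, in the same Python/SciPy style as the earlier examples (the function name is illustrative; only gaussian_filter is an actual library call):

```python
from scipy.ndimage import gaussian_filter

def difference_of_gaussians(image, sigma_small, sigma_large):
    """Band-pass filter: subtract a more-blurred copy from a less-blurred copy."""
    img = image.astype(float)
    return gaussian_filter(img, sigma_small) - gaussian_filter(img, sigma_large)
```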

<span class="mw-page-title-main">Corner detection</span>

Corner detection is an approach used within computer vision systems to extract certain kinds of features and infer the contents of an image. Corner detection is frequently used in motion detection, image registration, video tracking, image mosaicing, panorama stitching, 3D reconstruction and object recognition. Corner detection overlaps with the topic of interest point detection.

In image processing, ridge detection is the attempt, via software, to locate ridges in an image, defined as curves whose points are local maxima of the function, akin to geographical ridges.

In computer vision, blob detection methods are aimed at detecting regions in a digital image that differ in properties, such as brightness or color, compared to surrounding regions. Informally, a blob is a region of an image in which some properties are constant or approximately constant; all the points in a blob can be considered in some sense to be similar to each other. The most common method for blob detection is convolution.

Affine shape adaptation is a methodology for iteratively adapting the shape of the smoothing kernels in an affine group of smoothing kernels to the local image structure in neighbourhood region of a specific image point. Equivalently, affine shape adaptation can be accomplished by iteratively warping a local image patch with affine transformations while applying a rotationally symmetric filter to the warped image patches. Provided that this iterative process converges, the resulting fixed point will be affine invariant. In the area of computer vision, this idea has been used for defining affine invariant interest point operators as well as affine invariant texture analysis methods.

In mathematics, the structure tensor, also referred to as the second-moment matrix, is a matrix derived from the gradient of a function. It describes the distribution of the gradient in a specified neighborhood around a point and makes the information invariant with respect to the observing coordinates. The structure tensor is often used in image processing and computer vision.
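
As a standard illustration for the two-dimensional case (added here for reference, with $w$ a smoothing window and $I_x$, $I_y$ the partial derivatives of the image $I$):

$S_w(\mathbf{p}) = \sum_{\mathbf{r}} w(\mathbf{r}) \begin{bmatrix} I_x(\mathbf{p}-\mathbf{r})^2 & I_x(\mathbf{p}-\mathbf{r})\, I_y(\mathbf{p}-\mathbf{r}) \\ I_x(\mathbf{p}-\mathbf{r})\, I_y(\mathbf{p}-\mathbf{r}) & I_y(\mathbf{p}-\mathbf{r})^2 \end{bmatrix}$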

In computer vision, speeded up robust features (SURF) is a patented local feature detector and descriptor. It can be used for tasks such as object recognition, image registration, classification, or 3D reconstruction. It is partly inspired by the scale-invariant feature transform (SIFT) descriptor. The standard version of SURF is several times faster than SIFT and claimed by its authors to be more robust against different image transformations than SIFT.

The Kadir–Brady saliency detector extracts features of objects in images that are distinct and representative. It was invented by Timor Kadir and J. Michael Brady in 2001; an affine-invariant version was introduced by Kadir and Brady in 2004, and a robust version was designed by Shao et al. in 2007.

In the fields of computer vision and image analysis, the Harris affine region detector belongs to the category of feature detection. Feature detection is a preprocessing step of several algorithms that rely on identifying characteristic points or interest points in order to make correspondences between images, recognize textures, categorize objects or build panoramas.

In computer vision, maximally stable extremal regions (MSER) are used as a method of blob detection in images. This technique was proposed by Matas et al. to find correspondences between image elements from two images with different viewpoints. This method of extracting a comprehensive number of corresponding image elements contributes to the wide-baseline matching, and it has led to better stereo matching and object recognition algorithms.

The principal curvature-based region detector, also called PCBR, is a feature detector used in the fields of computer vision and image analysis. Specifically, the PCBR detector is designed for object recognition applications.

In the fields of computer vision and image analysis, the scale-invariant feature operator is an algorithm to detect local features in images. The algorithm was published by Förstner et al. in 2009.

References

  1. Mikolajczyk, K. and Schmid, C. 2002. An affine invariant interest point detector. In Proceedings of the 8th International Conference on Computer Vision, Vancouver, Canada.
  2. Lindeberg, Tony. "Feature detection with automatic scale selection", International Journal of Computer Vision, 30, 2, pp. 77-116, 1998.
  3. T. Lindeberg and J. Garding (1997). "Shape-adapted smoothing in estimation of 3-D depth cues from affine distortions of local 2-D structure". Image and Vision Computing. 15 (6): 415–434. doi:10.1016/S0262-8856(97)01144-X.
  4. T. Lindeberg (2008–2009). "Scale-space". Encyclopedia of Computer Science and Engineering (Benjamin Wah, ed), John Wiley and Sons. Vol. IV. pp. 2495–2504. doi:10.1002/9780470050118.ecse609.
  5. Mikolajczyk, K. and Schmid, C. 2004. Scale & affine invariant interest point detectors. International Journal of Computer Vision 60(1):63-86.
  6. K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir and L. Van Gool, A comparison of affine region detectors. In IJCV 65(1/2):43-72, 2005.
  7. J. Matas, O. Chum, M. Urban, and T. Pajdla, Robust wide baseline stereo from maximally stable extremal regions. In BMVC p. 384-393, 2002.
  8. T. Tuytelaars and L. Van Gool, Matching widely separated views based on affine invariant regions. In IJCV 59(1):61-85, 2004.
  9. T. Kadir, A. Zisserman, and M. Brady, An affine invariant salient region detector. In ECCV p. 404-416, 2004.