Total variation denoising

[Figure: Application of the Rudin et al. total variation denoising technique to an image corrupted by Gaussian noise. This example was created using demo_tv.m by Guy Gilboa; see external links.]

In signal processing, particularly image processing, total variation denoising, also known as total variation regularization or total variation filtering, is a noise-removal process (filter). It is based on the principle that signals with excessive and possibly spurious detail have high total variation: that is, the integral of the image gradient magnitude is high. According to this principle, reducing the total variation of the signal, subject to it being a close match to the original signal, removes unwanted detail whilst preserving important details such as edges. The concept was pioneered by L. I. Rudin, S. Osher, and E. Fatemi in 1992 and so is today known as the ROF model.[1]


This noise-removal technique has advantages over simple techniques such as linear smoothing or median filtering, which reduce noise but at the same time smooth away edges to a greater or lesser degree. By contrast, total variation denoising is a remarkably effective edge-preserving filter: it simultaneously preserves edges whilst smoothing away noise in flat regions, even at low signal-to-noise ratios.[2]

1D signal series

[Figure: Application of 1D total-variation denoising to a signal obtained from a single-molecule experiment. Gray is the original signal; black is the denoised signal.]

For a digital signal $y_n$, we can, for example, define the total variation as

$$V(y) = \sum_n |y_{n+1} - y_n|.$$

Given an input signal $x_n$, the goal of total variation denoising is to find an approximation, call it $y_n$, that has smaller total variation than $x_n$ but is "close" to $x_n$. One measure of closeness is the sum of square errors:

$$E(x, y) = \frac{1}{2} \sum_n (x_n - y_n)^2.$$

So the total-variation denoising problem amounts to minimizing the following discrete functional over the signal $y_n$:

$$E(x, y) + \lambda V(y).$$

By differentiating this functional with respect to $y_n$, we can derive a corresponding Euler–Lagrange equation, which can be numerically integrated with the original signal $x_n$ as the initial condition. This was the original approach.[1] Alternatively, since this is a convex functional, techniques from convex optimization can be used to minimize it and find the solution $y_n$.[3]
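As a concrete illustration of the optimization route, the following is a minimal sketch in Python/NumPy, assuming an $\varepsilon$-smoothed total variation so that plain gradient descent applies; the function name, smoothing constant, and step-size bound are illustrative choices rather than part of any standard implementation.

```python
# Minimal sketch: 1D total-variation denoising by gradient descent on an
# epsilon-smoothed functional 0.5*sum((y - x)^2) + lam*sum(sqrt(dy^2 + eps)).
# The smoothing constant eps and the step-size bound are illustrative choices.
import numpy as np

def tv_denoise_1d(x, lam=1.0, eps=1e-3, iters=5000):
    y = x.astype(float).copy()
    # Safe step size: the smoothed TV gradient has curvature at most
    # 1/sqrt(eps), and the difference operator contributes a factor <= 4.
    step = 1.0 / (1.0 + 4.0 * lam / np.sqrt(eps))
    for _ in range(iters):
        dy = np.diff(y)                        # forward differences y[n+1] - y[n]
        w = dy / np.sqrt(dy**2 + eps)          # smoothed (sub)gradient of |dy|
        # Gradient of the smoothed TV term with respect to each sample y[k].
        grad_tv = np.concatenate(([-w[0]], w[:-1] - w[1:], [w[-1]]))
        y -= step * ((y - x) + lam * grad_tv)  # one gradient-descent step
    return y
```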

Regularization properties

The regularization parameter $\lambda$ plays a critical role in the denoising process. When $\lambda = 0$, there is no smoothing and the result is the same as minimizing the sum of squares. As $\lambda \to \infty$, however, the total variation term plays an increasingly strong role, which forces the result to have smaller total variation, at the expense of being less like the input (noisy) signal. Thus, the choice of regularization parameter is critical to achieving just the right amount of noise removal.
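The effect can be checked empirically with the tv_denoise_1d sketch above; in this usage example the piecewise-constant test signal, noise level, and lambda values are chosen purely for illustration, and the total variation of the result shrinks as $\lambda$ grows.

```python
# Illustrative parameter sweep, reusing the tv_denoise_1d sketch above.
import numpy as np

rng = np.random.default_rng(0)
clean = np.repeat([0.0, 1.0, 0.5], 100)                # piecewise-constant test signal
noisy = clean + 0.1 * rng.standard_normal(clean.size)  # add Gaussian noise
for lam in (0.1, 1.0, 10.0):
    y = tv_denoise_1d(noisy, lam=lam)
    print(lam, np.sum(np.abs(np.diff(y))))             # total variation of the result
```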

2D signal images

We now consider 2D signals $y$, such as images. The total-variation norm proposed by the 1992 article is

$$V(y) = \sum_{i,j} \sqrt{|y_{i+1,j} - y_{i,j}|^2 + |y_{i,j+1} - y_{i,j}|^2}$$

and is isotropic and not differentiable. A variation that is sometimes used, since it may sometimes be easier to minimize, is an anisotropic version:

$$V_{\text{aniso}}(y) = \sum_{i,j} \left( |y_{i+1,j} - y_{i,j}| + |y_{i,j+1} - y_{i,j}| \right).$$

The standard total-variation denoising problem is still of the form

$$\min_y \left[ E(x, y) + \lambda V(y) \right],$$

where $E$ is the 2D $L_2$ norm. In contrast to the 1D case, solving this denoising problem is non-trivial. One algorithm that solves it is the primal-dual method.[4]
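To make the two discrete norms concrete, here is a minimal sketch, assuming forward differences cropped to a common interior so that the horizontal and vertical terms align; the function name and cropping convention are illustrative choices.

```python
# Minimal sketch: the isotropic and anisotropic discrete 2D total-variation
# norms defined above, using forward differences cropped to a common shape.
import numpy as np

def tv_2d(y, isotropic=True):
    dx = np.diff(y, axis=1)[:-1, :]  # horizontal differences y[i, j+1] - y[i, j]
    dy = np.diff(y, axis=0)[:, :-1]  # vertical differences   y[i+1, j] - y[i, j]
    if isotropic:
        return np.sum(np.sqrt(dx**2 + dy**2))       # sum of gradient magnitudes
    return np.sum(np.abs(dx)) + np.sum(np.abs(dy))  # separable (anisotropic) L1 sum
```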

Due in part to much research in compressed sensing in the mid-2000s, there are many algorithms, such as the split-Bregman method, that solve variants of this problem.

Rudin–Osher–Fatemi PDE

Suppose that we are given a noisy image $f$ and wish to compute a denoised image $u$ over a 2D space. ROF showed that the minimization problem we are looking to solve is:[5]

$$\min_{u \in \operatorname{BV}(\Omega)} \; \|u\|_{\operatorname{TV}(\Omega)} + \frac{\lambda}{2} \int_\Omega (f - u)^2 \, dx,$$

where $\operatorname{BV}(\Omega)$ is the set of functions with bounded variation over the domain $\Omega$, $\operatorname{TV}(\Omega)$ is the total variation over the domain, and $\lambda$ is a penalty term. When $u$ is smooth, the total variation is equivalent to the integral of the gradient magnitude:

$$\|u\|_{\operatorname{TV}(\Omega)} = \int_\Omega \|\nabla u\| \, dx,$$

where $\|\cdot\|$ is the Euclidean norm. Then the objective function of the minimization problem becomes:

$$\min_{u \in \operatorname{BV}(\Omega)} \; \int_\Omega \left[ \|\nabla u\| + \frac{\lambda}{2} (f - u)^2 \right] dx.$$

From this functional, the Euler–Lagrange equation for minimization – assuming no time-dependence – gives us the nonlinear elliptic partial differential equation

$$\begin{cases} \nabla \cdot \left( \dfrac{\nabla u}{\|\nabla u\|} \right) + \lambda (f - u) = 0, & u \in \Omega, \\[4pt] \dfrac{\partial u}{\partial n} = 0, & u \in \partial \Omega. \end{cases}$$

For some numerical algorithms, it is preferable to instead solve the time-dependent version of the ROF equation:

$$\frac{\partial u}{\partial t} = \nabla \cdot \left( \frac{\nabla u}{\|\nabla u\|} \right) + \lambda (f - u).$$
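As an illustration of that time-dependent route, the following is a minimal sketch of explicit Euler time-stepping, assuming an $\varepsilon$-regularized gradient magnitude to avoid division by zero and reflecting (Neumann) boundaries; the step size, $\varepsilon$, and iteration count are illustrative choices, not values from the original paper.

```python
# Minimal sketch: explicit time-stepping of the time-dependent ROF equation
#   du/dt = div(grad u / |grad u|) + lam * (f - u),
# with an epsilon-regularized gradient magnitude and Neumann boundaries.
# dt is kept small relative to eps so the explicit scheme stays stable.
import numpy as np

def rof_flow(f, lam=0.1, dt=0.005, eps=1e-2, iters=500):
    u = f.astype(float).copy()
    for _ in range(iters):
        up = np.pad(u, 1, mode='edge')            # reflecting (Neumann) boundary
        ux = (up[1:-1, 2:] - up[1:-1, :-2]) / 2   # central difference in x
        uy = (up[2:, 1:-1] - up[:-2, 1:-1]) / 2   # central difference in y
        mag = np.sqrt(ux**2 + uy**2 + eps)        # regularized gradient magnitude
        px, py = ux / mag, uy / mag               # normalized gradient field
        pxp = np.pad(px, 1, mode='edge')
        pyp = np.pad(py, 1, mode='edge')
        div = ((pxp[1:-1, 2:] - pxp[1:-1, :-2]) / 2
               + (pyp[2:, 1:-1] - pyp[:-2, 1:-1]) / 2)  # divergence of (px, py)
        u += dt * (div + lam * (f - u))           # one explicit Euler step
    return u
```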

Applications

The Rudin–Osher–Fatemi model was a pivotal component in producing the first image of a black hole.[6]


References

  1. Rudin, L. I.; Osher, S.; Fatemi, E. (1992). "Nonlinear total variation based noise removal algorithms". Physica D. 60 (1–4): 259–268. Bibcode:1992PhyD...60..259R. CiteSeerX 10.1.1.117.1675. doi:10.1016/0167-2789(92)90242-f.
  2. Strong, D.; Chan, T. (2003). "Edge-preserving and scale-dependent properties of total variation regularization". Inverse Problems. 19 (6): S165–S187. Bibcode:2003InvPr..19S.165S. doi:10.1088/0266-5611/19/6/059. S2CID 250761777.
  3. Little, M. A.; Jones, Nick S. (2010). "Sparse Bayesian Step-Filtering for High-Throughput Analysis of Molecular Machine Dynamics" (PDF). ICASSP 2010 Proceedings. 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
  4. Chambolle, A. (2004). "An algorithm for total variation minimization and applications". Journal of Mathematical Imaging and Vision. 20: 89–97. CiteSeerX 10.1.1.160.5226. doi:10.1023/B:JMIV.0000011325.36760.1e. S2CID 207622122.
  5. Getreuer, Pascal (2012). "Rudin–Osher–Fatemi Total Variation Denoising using Split Bregman" (PDF).
  6. "Rudin–Osher–Fatemi Model Captures Infinity and Beyond". IPAM. 2019-04-15. Retrieved 2019-08-04.