Parameter space

Last updated

The parameter space is the space of possible parameter values that define a particular mathematical model. It is also sometimes called weight space, and is often a subset of finite-dimensional Euclidean space.

Contents

In statistics, parameter spaces are particularly useful for describing parametric families of probability distributions. They also form the background for parameter estimation. In the case of extremum estimators for parametric models, a certain objective function is maximized or minimized over the parameter space. [1] Theorems of existence and consistency of such estimators require some assumptions about the topology of the parameter space. For instance, compactness of the parameter space, together with continuity of the objective function, suffices for the existence of an extremum estimator. [1]

In Deep Learning, the parameters of a deep network are called weights. Due to the layered structure of deep networks, their weight space has a complex structure and geometry. [2] [3] For example, in Multilayer Perceptrons, the same function is preserved when permuting the nodes of a hidden layer, amounting to permuting weight matrices of the network. This property is known as equivariance to permutation of deep weight spaces. [2]

Sometimes, parameters are analyzed to view how they affect their statistical model. In that context, they can be viewed as inputs of a function, in which case the technical term for the parameter space is domain of a function. The ranges of values of the parameters may form the axes of a plot, and particular outcomes of the model may be plotted against these axes to illustrate how different regions of the parameter space produce different types of behavior in the model.

Examples

For some values of r, this function ends up cycling around a few values or becomes fixed on one value. These long-term values can be plotted against r in a bifurcation diagram to show the different behaviours of the function for different values of r.
The famous Mandelbrot set is a subset of this parameter space, consisting of the points in the complex plane which give a bounded set of numbers when a particular iterated function is repeatedly applied from that starting point. The remaining points, which are not in the set, give an unbounded set of numbers (they tend to infinity) when this function is repeatedly applied from that starting point.

History

Parameter space contributed to the liberation of geometry from the confines of three-dimensional space. For instance, the parameter space of spheres in three dimensions, has four dimensions—three for the sphere center and another for the radius. According to Dirk Struik, it was the book Neue Geometrie des Raumes (1849) by Julius Plücker that showed

...geometry need not solely be based on points as basic elements. Lines, planes, circles, spheres can all be used as the elements (Raumelemente) on which a geometry can be based. This fertile conception threw new light on both synthetic and algebraic geometry and created new forms of duality. The number of dimensions of a particular form of geometry could now be any positive number, depending on the number of parameters necessary to define the "element". [5] :165

The requirement for higher dimensions is illustrated by Plücker's line geometry. Struik writes

[Plücker's] geometry of lines in three-space could be considered as a four-dimensional geometry, or, as Klein has stressed, as the geometry of a four-dimensional quadric in a five-dimensional space. [5] :168

Thus the Klein quadric describes the parameters of lines in space.

See also

Related Research Articles

In mathematics, analytic geometry, also known as coordinate geometry or Cartesian geometry, is the study of geometry using a coordinate system. This contrasts with synthetic geometry.

In mathematics, an equation is a mathematical formula that expresses the equality of two expressions, by connecting them with the equals sign =. The word equation and its cognates in other languages may have subtly different meanings; for example, in French an équation is defined as containing one or more variables, while in English, any well-formed formula consisting of two expressions related with an equals sign is an equation.

<span class="mw-page-title-main">Estimator</span> Rule for calculating an estimate of a given quantity based on observed data

In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule, the quantity of interest and its result are distinguished. For example, the sample mean is a commonly used estimator of the population mean.

<span class="mw-page-title-main">Supervised learning</span> A paradigm in machine learning

Supervised learning (SL) is a paradigm in machine learning where input objects and a desired output value train a model. The training data is processed, building a function that maps new data on expected output values. An optimal scenario will allow for the algorithm to correctly determine output values for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way. This statistical quality of an algorithm is measured through the so-called generalization error.

A parameter, generally, is any characteristic that can help in defining or classifying a particular system. That is, a parameter is an element of a system that is useful, or critical, when identifying the system, or when evaluating its performance, status, condition, etc.

In mathematics, a quadric or quadric surface (quadric hypersurface in higher dimensions), is a generalization of conic sections (ellipses, parabolas, and hyperbolas). It is a hypersurface (of dimension D) in a (D + 1)-dimensional space, and it is defined as the zero set of an irreducible polynomial of degree two in D + 1 variables; for example, D = 1 in the case of conic sections. When the defining polynomial is not absolutely irreducible, the zero set is generally not considered a quadric, although it is often called a degenerate quadric or a reducible quadric.

In statistics, completeness is a property of a statistic in relation to a parameterised model for a set of observed data.

In mathematics, conformal geometry is the study of the set of angle-preserving (conformal) transformations on a space.

In Vapnik–Chervonenkis theory, the Vapnik–Chervonenkis (VC) dimension is a measure of the size of a class of sets. The notion can be extended to classes of binary functions. It is defined as the cardinality of the largest set of points that the algorithm can shatter, which means the algorithm can always learn a perfect classifier for any labeling of at least one configuration of those data points. It was originally defined by Vladimir Vapnik and Alexey Chervonenkis.

<span class="mw-page-title-main">Nonlinear dimensionality reduction</span> Summary of algorithms for nonlinear dimensionality reduction

Nonlinear dimensionality reduction, also known as manifold learning, refers to various related techniques that aim to project high-dimensional data onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping itself. The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.

<span class="mw-page-title-main">Parametric equation</span> Representation of a curve by a function of a parameter

In mathematics, a parametric equation defines a group of quantities as functions of one or more independent variables called parameters. Parametric equations are commonly used to express the coordinates of the points that make up a geometric object such as a curve or surface, called a parametric curve and parametric surface, respectively. In such cases, the equations are collectively called a parametric representation, or parametric system, or parameterization of the object.

In statistics, a mixture model is a probabilistic model for representing the presence of subpopulations within an overall population, without requiring that an observed data set should identify the sub-population to which an individual observation belongs. Formally a mixture model corresponds to the mixture distribution that represents the probability distribution of observations in the overall population. However, while problems associated with "mixture distributions" relate to deriving the properties of the overall population from those of the sub-populations, "mixture models" are used to make statistical inferences about the properties of the sub-populations given only observations on the pooled population, without sub-population identity information. Mixture models are used for clustering, under the name model-based clustering, and also for density estimation.

Robust statistics are statistics which maintain their properties even if the underlying distributional assumptions are incorrect. Robust statistical methods have been developed for many common problems, such as estimating location, scale, and regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another motivation is to provide methods with good performance when there are small departures from a parametric distribution. For example, robust methods work well for mixtures of two normal distributions with different standard deviations; under this model, non-robust methods like a t-test work poorly.

<span class="mw-page-title-main">Three-dimensional space</span> Geometric model of the physical space

In geometry, a three-dimensional space is a mathematical space in which three values (coordinates) are required to determine the position of a point. Most commonly, it is the three-dimensional Euclidean space, that is, the Euclidean space of dimension three, which models physical space. More general three-dimensional spaces are called 3-manifolds. The term may also refer colloquially to a subset of space, a three-dimensional region, a solid figure.

In statistics, M-estimators are a broad class of extremum estimators for which the objective function is a sample average. Both non-linear least squares and maximum likelihood estimation are special cases of M-estimators. The definition of M-estimators was motivated by robust statistics, which contributed new types of M-estimators. However, M-estimators are not inherently robust, as is clear from the fact that they include maximum likelihood estimators, which are in general not robust. The statistical procedure of evaluating an M-estimator on a data set is called M-estimation.

<span class="mw-page-title-main">Surface (mathematics)</span> Mathematical idealization of the surface of a body

In mathematics, a surface is a mathematical model of the common concept of a surface. It is a generalization of a plane, but, unlike a plane, it may be curved; this is analogous to a curve generalizing a straight line.

<span class="mw-page-title-main">Lie sphere geometry</span> Geometry founded on spheres

Lie sphere geometry is a geometrical theory of planar or spatial geometry in which the fundamental concept is the circle or sphere. It was introduced by Sophus Lie in the nineteenth century. The main idea which leads to Lie sphere geometry is that lines should be regarded as circles of infinite radius and that points in the plane should be regarded as circles of zero radius.

In statistics and econometrics, extremum estimators are a wide class of estimators for parametric models that are calculated through maximization of a certain objective function, which depends on the data. The general theory of extremum estimators was developed by Amemiya (1985).

<span class="mw-page-title-main">Intersection (geometry)</span> Shape formed from points common to other shapes

In geometry, an intersection is a point, line, or curve common to two or more objects. The simplest case in Euclidean geometry is the line–line intersection between two distinct lines, which either is one point or does not exist. Other types of geometric intersection include:

<span class="mw-page-title-main">Intersection curve</span> Curve that is common to two geometric objects

In geometry, an intersection curve is a curve that is common to two geometric objects. In the simplest case, the intersection of two non-parallel planes in Euclidean 3-space is a line. In general, an intersection curve consists of the common points of two transversally intersecting surfaces, meaning that at any common point the surface normals are not parallel. This restriction excludes cases where the surfaces are touching or have surface parts in common.

References

  1. 1 2 Hayashi, Fumio (2000). Econometrics. Princeton University Press. p. 446. ISBN   0-691-01018-8.
  2. 1 2 Navon, Aviv; Shamsian, Aviv; Achituve, Idan; Fetaya, Ethan; Chechik, Gal; Maron, Haggai (2023-07-03). "Equivariant Architectures for Learning in Deep Weight Spaces". Proceedings of the 40th International Conference on Machine Learning. PMLR: 25790–25816.
  3. Hecht-Nielsen, Robert (1990-01-01), Eckmiller, Rolf (ed.), "ON THE ALGEBRAIC STRUCTURE OF FEEDFORWARD NETWORK WEIGHT SPACES", Advanced Neural Computers, Amsterdam: North-Holland, pp. 129–135, ISBN   978-0-444-88400-8 , retrieved 2023-12-01
  4. Gasperino, J.; Rom, W. N. (2004). "Gender and lung cancer". Clinical Lung Cancer. 5 (6): 353–359. doi:10.3816/CLC.2004.n.013. PMID   15217534.
  5. 1 2 Dirk Struik (1967) A Concise History of Mathematics, 3rd edition, Dover Books