General regression neural network

Last updated May 19, 2023

Generalized regression neural network (GRNN) is a variation of radial basis neural networks. GRNN was suggested by D.F. Specht in 1991.^[1]

Mathematical representation

$Y(x)={\frac {\sum _{k=1}^{N}y_{k}K(x,x_{k})}{\sum _{k=1}^{N}K(x,x_{k})}}$

where:

$Y(x)$ is the predicted value for input $x$
$y_{k}$ is the activation weight for the pattern layer neuron at $k$
$K(x,x_{k})$ is the radial basis function kernel (Gaussian kernel) as formulated below.

Gaussian Kernel

$K(x,x_{k})=e^{-d_{k}/2\sigma ^{2}},\qquad d_{k}=(x-x_{k})^{T}(x-x_{k})$

where $d_{k}$ is the squared Euclidean distance between the training sample $x_{k}$ and the input $x$.
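
The two formulas above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not a reference implementation: the function names `gaussian_kernel` and `grnn_predict`, the toy training data and the value of `sigma` are all assumptions chosen for the example.

```python
import math

def gaussian_kernel(x, x_k, sigma):
    """K(x, x_k) = exp(-d_k / (2 * sigma^2)), with d_k the squared Euclidean distance."""
    d_k = sum((a - b) ** 2 for a, b in zip(x, x_k))
    return math.exp(-d_k / (2.0 * sigma ** 2))

def grnn_predict(x, train_x, train_y, sigma):
    """Y(x): kernel-weighted average of the training targets y_k."""
    weights = [gaussian_kernel(x, x_k, sigma) for x_k in train_x]
    denom = sum(weights)
    if denom == 0.0:
        # Every kernel value underflowed to zero (input far from all samples).
        raise ZeroDivisionError("all kernel weights are zero; try a larger sigma")
    return sum(w * y for w, y in zip(weights, train_y)) / denom

# Hypothetical toy data: three one-dimensional samples and their targets.
train_x = [[0.0], [1.0], [2.0]]
train_y = [0.0, 1.0, 4.0]

print(grnn_predict([1.0], train_x, train_y, sigma=0.5))
```

With a small `sigma` the prediction at a training point stays close to that point's own target, while a larger `sigma` pulls it toward the average of all targets.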

Implementation

GRNN has been implemented in many programming languages and environments, including MATLAB,^[3] R, Python and Node.js.

Neural networks (specifically the multilayer perceptron) can capture non-linear patterns in data when combined with generalized linear models that take the distribution of the outcome into account (an approach slightly different from the original GRNN). Successful developments of this kind include Poisson regression, ordinal logistic regression, quantile regression and multinomial logistic regression, as described by Fallah et al. in 2009.^[4]

Advantages and disadvantages

Like radial basis function neural networks (RBFNN), GRNN has the following advantages:

Single-pass learning so no backpropagation is required.
High accuracy in the estimation since it uses Gaussian functions.
It can handle noise in the inputs.
It requires relatively little data to train.

The main disadvantages of GRNN are:

Its size can be huge, because the pattern layer holds one neuron per training sample, which makes it computationally expensive.
There is no optimal method to improve it.
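
The single-pass training and the size concern listed above can be made concrete with a small sketch. The class `GRNN` and its `fit` and `predict` methods are hypothetical names used only for illustration, and the data are invented. Training merely stores the samples, and each prediction visits every stored pattern.

```python
import math

class GRNN:
    """Hypothetical minimal GRNN; class and method names are illustrative only."""

    def __init__(self, sigma):
        self.sigma = sigma
        self.patterns = []  # one stored input x_k per pattern-layer neuron
        self.targets = []   # the matching y_k values

    def fit(self, xs, ys):
        # Single pass: training only stores the samples, with no backpropagation.
        self.patterns = [list(x) for x in xs]
        self.targets = list(ys)
        return self

    def predict(self, x):
        # Each prediction loops over all N stored patterns, so cost grows with N.
        num = 0.0
        den = 0.0
        for x_k, y_k in zip(self.patterns, self.targets):
            d_k = sum((a - b) ** 2 for a, b in zip(x, x_k))
            k = math.exp(-d_k / (2.0 * self.sigma ** 2))
            num += y_k * k
            den += k
        return num / den

# Invented toy data: four one-dimensional samples.
model = GRNN(sigma=0.5).fit([[0.0], [1.0], [2.0], [3.0]], [0.0, 1.0, 4.0, 9.0])
print(len(model.patterns))   # number of pattern-layer neurons equals the sample count
print(model.predict([1.5]))
```

In this sketch the memory footprint and the per-prediction work both scale linearly with the number of training samples, which is the trade-off behind the first disadvantage.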

References

[1] Specht, D. F. (1991-11-01). "A general regression neural network". IEEE Transactions on Neural Networks. 2 (6): 568–576. doi:10.1109/72.97934. PMID 18282872. S2CID 6266210.
[2] https://minds.wisconsin.edu/bitstream/handle/1793/7779/ch2.pdf?sequence=14
[3] "Generalized Regression Neural Networks - MATLAB & Simulink - MathWorks Australia".
[4] Fallah, Nader; Gu, Hong; Mohammad, Kazem; Seyyedsalehi, Seyyed Ali; Nourijelyani, Keramat; Eshraghian, Mohammad Reza (2009). "Nonlinear Poisson regression using neural networks: A simulation study". Neural Computing and Applications. 18 (8): 939–943. doi:10.1007/s00521-009-0277-8. S2CID 18980875.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
