Reduced form


In statistics, and particularly in econometrics, the reduced form of a system of equations is the result of solving the system for the endogenous variables. This gives the latter as functions of the exogenous variables, if any. In econometrics, the equations of a structural form model are estimated in their theoretically given form, while an alternative approach to estimation is to first solve the theoretical equations for the endogenous variables to obtain reduced form equations, and then to estimate the reduced form equations.


Let $Y$ be the vector of the variables to be explained by a statistical model (the endogenous variables), and let $X$ be the vector of explanatory (exogenous) variables. In addition, let $\varepsilon$ be a vector of error terms. Then the general expression of a structural form is $f(Y, X, \varepsilon) = 0$, where $f$ is a function, possibly from vectors to vectors in the case of a multiple-equation model. The reduced form of this model is given by $Y = g(X, \varepsilon)$, with $g$ a function.
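When $f$ cannot be solved in closed form, the reduced form $g$ can still be evaluated numerically by solving $f(Y, X, \varepsilon) = 0$ for $Y$ at given values of $X$ and $\varepsilon$. The sketch below assumes a hypothetical scalar structural equation $f(y, x, \varepsilon) = y^3 + y - x - \varepsilon$, chosen only because it is strictly increasing in $y$ and therefore has exactly one solution; the function names are illustrative, and bisection is just one possible root finder.

```python
def f(y: float, x: float, eps: float) -> float:
    """Hypothetical structural form: f(y, x, eps) = 0 defines y implicitly."""
    return y**3 + y - x - eps


def g(x: float, eps: float, iterations: int = 200) -> float:
    """Reduced form y = g(x, eps), found by bisection on y -> f(y, x, eps).

    f is strictly increasing in y, so there is exactly one root.
    """
    target = x + eps
    # With bound = |target| + 1 we have f(-bound) < 0 < f(bound),
    # so the root is bracketed by [-bound, bound].
    bound = abs(target) + 1.0
    lo, hi = -bound, bound
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid, x, eps) < 0.0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return 0.5 * (lo + hi)


y = g(x=10.0, eps=0.0)
print(y, f(y, 10.0, 0.0))  # y is close to 2.0, since 2**3 + 2 - 10 = 0
```

The endogenous $y$ is now expressed purely through the exogenous $x$ and the error $\varepsilon$, which is what the reduced form means; here $g$ happens to depend on them only through $x + \varepsilon$.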

Structural and reduced forms

Exogenous variables are variables which are not determined by the system. If we assume that demand is influenced not only by price, but also by an exogenous variable, $Z$, we can consider the structural supply and demand model

supply: $Q = a_S + b_S P + u_S$
demand: $Q = a_D + b_D P + c Z + u_D$

where the $u_i$ terms are random errors (deviations of the quantities supplied and demanded from those implied by the rest of each equation). By solving for the unknowns (endogenous variables) $P$ and $Q$, this structural model can be rewritten in the reduced form:

$$Q = \pi_{10} + \pi_{11} Z + e_Q$$
$$P = \pi_{20} + \pi_{21} Z + e_P$$

where the parameters $\pi_{ij}$ depend on the parameters $a_i, b_i, c$ of the structural model, and where the reduced form errors $e_i$ each depend on the structural parameters and on both structural errors. Note that both endogenous variables depend on the exogenous variable $Z$.
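Writing supply as $Q = a_S + b_S P + u_S$ and demand as $Q = a_D + b_D P + c Z + u_D$, setting the two equal and solving gives $\pi_{20} = (a_D - a_S)/(b_S - b_D)$ and $\pi_{21} = c/(b_S - b_D)$; substituting $P$ back into the supply equation gives $\pi_{10} = a_S + b_S \pi_{20}$ and $\pi_{11} = b_S \pi_{21}$. The reduced form errors are $e_P = (u_D - u_S)/(b_S - b_D)$ and $e_Q = (b_S u_D - b_D u_S)/(b_S - b_D)$, so each mixes both structural errors. The sketch below encodes these formulas; it assumes $b_S \neq b_D$, and the function names and parameter values are illustrative.

```python
from typing import NamedTuple, Tuple


class ReducedForm(NamedTuple):
    pi10: float  # intercept in the Q equation
    pi11: float  # coefficient on Z in the Q equation
    pi20: float  # intercept in the P equation
    pi21: float  # coefficient on Z in the P equation


def reduced_form(a_s: float, b_s: float, a_d: float, b_d: float, c: float) -> ReducedForm:
    """Reduced form coefficients of the model
    supply: Q = a_s + b_s*P + u_s,  demand: Q = a_d + b_d*P + c*Z + u_d.
    """
    if b_s == b_d:
        raise ValueError("b_s == b_d: the system has no unique solution for P")
    denom = b_s - b_d
    # Market clearing solved for P.
    pi20 = (a_d - a_s) / denom
    pi21 = c / denom
    # Substitute P back into the supply equation to get Q.
    pi10 = a_s + b_s * pi20
    pi11 = b_s * pi21
    return ReducedForm(pi10, pi11, pi20, pi21)


def reduced_form_errors(b_s: float, b_d: float, u_s: float, u_d: float) -> Tuple[float, float]:
    """Reduced form errors (e_Q, e_P); each depends on both structural errors."""
    e_p = (u_d - u_s) / (b_s - b_d)
    e_q = (b_s * u_d - b_d * u_s) / (b_s - b_d)
    return e_q, e_p


rf = reduced_form(a_s=1.0, b_s=2.0, a_d=10.0, b_d=-1.0, c=3.0)
print(rf)  # ReducedForm(pi10=7.0, pi11=2.0, pi20=3.0, pi21=1.0)
```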

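If the reduced form model is estimated using empirical data, obtaining estimated values for the coefficients $\pi_{ij}$, some of the structural parameters can be recovered. By combining the two reduced form equations to eliminate $Z$, the structural coefficients of the supply side model ($a_S$ and $b_S$) can be derived, provided that $\pi_{21} \neq 0$ (that is, $Z$ actually shifts the demand curve):

$$a_S = \frac{\pi_{10}\pi_{21} - \pi_{11}\pi_{20}}{\pi_{21}}$$
$$b_S = \frac{\pi_{11}}{\pi_{21}}$$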
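This recovery step, often called indirect least squares, can be checked on simulated data. The sketch below draws data from the structural model with illustrative values $a_S = 1$, $b_S = 2$, $a_D = 10$, $b_D = -1$, $c = 3$, estimates each reduced form equation separately by ordinary least squares of $Q$ on $Z$ and of $P$ on $Z$, and then applies $b_S = \pi_{11}/\pi_{21}$ and $a_S = (\pi_{10}\pi_{21} - \pi_{11}\pi_{20})/\pi_{21}$ to the estimates. The distributions of $Z$ and of the errors, the sample size, and the helper names are assumptions made for illustration.

```python
import random


def ols_simple(x, y):
    """Intercept and slope of an OLS regression of y on a constant and x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate(n, a_s, b_s, a_d, b_d, c, seed=0):
    """Draw (Z, P, Q) from the structural model by solving it for P and Q."""
    rng = random.Random(seed)
    zs, ps, qs = [], [], []
    for _ in range(n):
        z = rng.uniform(0.0, 5.0)  # exogenous demand shifter (assumed distribution)
        u_s = rng.gauss(0.0, 1.0)  # supply shock
        u_d = rng.gauss(0.0, 1.0)  # demand shock
        # Market clearing: a_s + b_s*P + u_s = a_d + b_d*P + c*z + u_d
        p = (a_d - a_s + c * z + u_d - u_s) / (b_s - b_d)
        q = a_s + b_s * p + u_s
        zs.append(z)
        ps.append(p)
        qs.append(q)
    return zs, ps, qs


zs, ps, qs = simulate(20_000, a_s=1.0, b_s=2.0, a_d=10.0, b_d=-1.0, c=3.0)

# Step 1: estimate each reduced form equation separately by OLS.
pi10, pi11 = ols_simple(zs, qs)  # Q = pi10 + pi11*Z + e_Q
pi20, pi21 = ols_simple(zs, ps)  # P = pi20 + pi21*Z + e_P

# Step 2: recover the supply parameters from the reduced form estimates.
b_s_hat = pi11 / pi21
a_s_hat = (pi10 * pi21 - pi11 * pi20) / pi21
print(f"a_S ~ {a_s_hat:.3f}, b_S ~ {b_s_hat:.3f}")  # should be close to 1 and 2
```

Each reduced form equation has a single endogenous variable on the left and only the exogenous $Z$ on the right, which is why plain OLS is appropriate at step 1.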

Note, however, that this still does not allow us to identify the structural parameters of the demand equation. For that, we would need an exogenous variable that is included in the supply equation of the structural model but not in the demand equation.
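One way to see the problem: once the supply parameters are known, the reduced form pins down only two combinations of the three demand parameters, namely $\pi_{20} = (a_D - a_S)/(b_S - b_D)$ and $\pi_{21} = c/(b_S - b_D)$. So for any alternative demand slope $b_D' \neq b_S$, choosing $c' = \pi_{21}(b_S - b_D')$ and $a_D' = a_S + \pi_{20}(b_S - b_D')$ yields exactly the same reduced form coefficients. The sketch below checks this numerically; the parameter values and function names are illustrative, and it compares only the reduced form coefficients, not the error distributions.

```python
def reduced_form(a_s, b_s, a_d, b_d, c):
    """(pi10, pi11, pi20, pi21) for supply Q = a_s + b_s*P + u_s
    and demand Q = a_d + b_d*P + c*Z + u_d (assumes b_s != b_d)."""
    denom = b_s - b_d
    pi20 = (a_d - a_s) / denom
    pi21 = c / denom
    return (a_s + b_s * pi20, b_s * pi21, pi20, pi21)


def observationally_equivalent_demand(a_s, b_s, pi20, pi21, b_d_alt):
    """Hypothetical helper: a different demand curve (a_d, b_d, c)
    that reproduces the given reduced form coefficients."""
    denom = b_s - b_d_alt
    return a_s + pi20 * denom, b_d_alt, pi21 * denom


true_rf = reduced_form(1.0, 2.0, 10.0, -1.0, 3.0)
for b_d_alt in (-0.5, -1.0, -4.0):
    a_d, b_d, c = observationally_equivalent_demand(
        1.0, 2.0, true_rf[2], true_rf[3], b_d_alt
    )
    # Different demand parameters, identical reduced form.
    print((a_d, b_d, c), reduced_form(1.0, 2.0, a_d, b_d, c))
```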

The general linear case

Let $y$ be a column vector of $M$ endogenous variables. In the case above with $Q$ and $P$, we had $M = 2$. Let $z$ be a column vector of $K$ exogenous variables; in the case above $z$ consisted only of $Z$. The structural linear model is

$$A y = B z + v$$

where $v$ is a vector of structural shocks, and $A$ and $B$ are matrices; $A$ is a square $M \times M$ matrix, while $B$ is $M \times K$. The reduced form of the system is:

$$y = A^{-1} B z + A^{-1} v = \Pi z + w$$

with a vector $w$ of reduced form errors, each of which depends on all structural errors. The matrix $A$ must be nonsingular for the reduced form to exist and be unique. Again, each endogenous variable potentially depends on each exogenous variable.
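In practice $\Pi = A^{-1}B$ is usually computed by solving the linear system $A\Pi = B$ rather than forming $A^{-1}$ explicitly. The sketch below does this with Gauss-Jordan elimination and partial pivoting in plain Python, raising an error when $A$ is (numerically) singular, the case in which no unique reduced form exists. To fit the supply and demand example into $Ay = Bz + v$, it assumes $y = (Q, P)'$ and $z = (1, Z)'$, i.e. a constant is included as the first exogenous variable so that the intercepts appear in $B$; the parameter values are illustrative.

```python
def solve(A, B, tol=1e-12):
    """Return X with A X = B (A is M x M, B is M x K), using Gauss-Jordan
    elimination with partial pivoting. Matrices are lists of rows."""
    m = len(A)
    k = len(B[0])
    # Augmented matrix [A | B], copied so the inputs are not modified.
    aug = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(m)]
    for col in range(m):
        # Partial pivoting: bring the row with the largest entry into place.
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < tol:
            raise ValueError("A is singular: the reduced form does not exist")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # Divide each row by its pivot and keep the right-hand block.
    return [[aug[i][m + j] / aug[i][i] for j in range(k)] for i in range(m)]


a_s, b_s, a_d, b_d, c = 1.0, 2.0, 10.0, -1.0, 3.0
# Structural form A y = B z + v with y = (Q, P) and z = (1, Z):
#   supply: Q - b_s*P = a_s         + u_s
#   demand: Q - b_d*P = a_d + c*Z   + u_d
A = [[1.0, -b_s],
     [1.0, -b_d]]
B = [[a_s, 0.0],
     [a_d, c]]
Pi = solve(A, B)
print(Pi)  # rows: (pi10, pi11) for Q and (pi20, pi21) for P
```

With these values the rows of `Pi` should agree, up to rounding, with the scalar reduced form coefficients $\pi_{10} = 7$, $\pi_{11} = 2$, $\pi_{20} = 3$, $\pi_{21} = 1$.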

Without restrictions on $A$ and $B$, the coefficients of $A$ and $B$ cannot be identified from data on $y$ and $z$: each row of the structural model is just a linear relation between $y$ and $z$ with unknown coefficients. (This is again the parameter identification problem.) The $M$ reduced form equations (the rows of the matrix equation $y = \Pi z + w$ above) can be identified from the data because each of them contains only one endogenous variable.
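The lack of identification can be illustrated directly. Premultiplying the structural model by any nonsingular $M \times M$ matrix $F$ gives a different structural model $(FA)y = (FB)z + Fv$ with the same reduced form, because $(FA)^{-1}(FB) = A^{-1}F^{-1}FB = A^{-1}B = \Pi$, and the reduced form errors are unchanged as well, since $(FA)^{-1}Fv = A^{-1}v$. The sketch below checks this for the supply and demand example, assuming $y = (Q, P)'$ and $z = (1, Z)'$ with illustrative parameter values; the matrix $F$ is an arbitrary choice, and the helper handles only the $2 \times 2$ case.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def reduced_form_2x2(A, B):
    """Pi = A^{-1} B for a 2 x 2 matrix A, using the adjugate formula."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det == 0:
        raise ValueError("A is singular: the reduced form does not exist")
    A_inv = [[A[1][1] / det, -A[0][1] / det],
             [-A[1][0] / det, A[0][0] / det]]
    return matmul(A_inv, B)


# Supply-demand example with y = (Q, P) and z = (1, Z); values are illustrative.
A = [[1.0, -2.0],   # supply: Q - 2P = 1        + u_s
     [1.0, 1.0]]    # demand: Q +  P = 10 + 3Z  + u_d
B = [[1.0, 0.0],
     [10.0, 3.0]]

# Any nonsingular F mixes the equations into a different structural model.
F = [[0.5, 0.5],
     [0.2, 0.8]]
A_alt, B_alt = matmul(F, A), matmul(F, B)

print(reduced_form_2x2(A, B))          # Pi from the original structure
print(reduced_form_2x2(A_alt, B_alt))  # same Pi (up to rounding) from the mixed one
```

Because both structures imply the same distribution of $(y, z)$, data alone cannot choose between them; restrictions on $A$ and $B$ (for example, excluding $Z$ from the supply equation) are what rule out such mixtures.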

