Linear least squares
Linear least squares is the least squares approximation of linear functions to data.
It is a set of formulations for solving statistical problems involved in linear regression, including variants for ordinary, weighted, and generalized residuals.
Numerical methods for linear least squares include inverting the matrix of the normal equations and orthogonal decomposition methods.
Basic formulation
Consider the linear equation
$$A\mathbf{x} = \mathbf{b},$$
where $A \in \mathbb{R}^{m \times n}$ and $\mathbf{b} \in \mathbb{R}^{m}$ are given and $\mathbf{x} \in \mathbb{R}^{n}$ is the variable to be computed. When $m > n$, it is generally the case that the equation has no solution.
For example, there is no value of $\mathbf{x}$ that satisfies
$$\begin{bmatrix}1 & 0\\ 0 & 1\\ 1 & 1\end{bmatrix}\begin{bmatrix}x_1\\ x_2\end{bmatrix} = \begin{bmatrix}1\\ 1\\ 0\end{bmatrix},$$
because the first two rows require $x_1 = 1$ and $x_2 = 1$, but then the third row, $x_1 + x_2 = 0$, is not satisfied.
Thus, for $m > n$, the goal of solving $A\mathbf{x} = \mathbf{b}$ exactly is typically replaced by finding the value of $\mathbf{x}$ that minimizes some error. There are many ways that the error can be defined, but one of the most common is to define it as $\|A\mathbf{x} - \mathbf{b}\|^2$. This produces a minimization problem, called a least squares problem:
$$\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|^2.$$
The solution to the least squares problem is computed by solving the normal equation
$$A^\mathsf{T} A \mathbf{x} = A^\mathsf{T} \mathbf{b},$$
where $A^\mathsf{T}$ denotes the transpose of $A$.
Continuing the example above, with
$$A = \begin{bmatrix}1 & 0\\ 0 & 1\\ 1 & 1\end{bmatrix} \quad\text{and}\quad \mathbf{b} = \begin{bmatrix}1\\ 1\\ 0\end{bmatrix},$$
we find
$$A^\mathsf{T} A = \begin{bmatrix}2 & 1\\ 1 & 2\end{bmatrix}$$
and
$$A^\mathsf{T} \mathbf{b} = \begin{bmatrix}1\\ 1\end{bmatrix}.$$
Solving the normal equation gives $\mathbf{x} = \begin{bmatrix}1/3\\ 1/3\end{bmatrix}$.
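The calculation above is easy to verify numerically. The following is a minimal sketch (assuming NumPy is available; the variable names are illustrative) that solves the example both through the normal equation and through a library least-squares routine based on an orthogonal decomposition.

```python
import numpy as np

# The example system: three equations, two unknowns
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

# Solve the normal equation (A^T A) x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check with NumPy's least-squares solver (SVD-based)
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal)  # [0.33333333 0.33333333]
print(x_lstsq)   # same solution, x = (1/3, 1/3)
```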
Formulations for linear regression
The three main linear least squares formulations are:
- Ordinary least squares (OLS) is the most common estimator. OLS estimates are commonly used to analyze both experimental and observational data. The OLS method minimizes the sum of squared residuals and leads to a closed-form expression for the estimated value of the unknown parameter vector β: $\hat{\boldsymbol{\beta}} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y}$, where $\mathbf{y}$ is a vector whose ith element is the ith observation of the dependent variable, and $\mathbf{X}$ is the design matrix whose ij element is the ith observation of the jth independent variable. The estimator is unbiased and consistent if the errors have finite variance and are uncorrelated with the regressors, that is, $\operatorname{E}[\mathbf{x}_i \varepsilon_i] = 0$, where $\mathbf{x}_i$ is the transpose of row i of the matrix $\mathbf{X}$. It is also efficient under the assumption that the errors have finite variance and are homoscedastic, meaning that $\operatorname{E}[\varepsilon_i^2 \mid \mathbf{x}_i]$ does not depend on i. The condition that the errors are uncorrelated with the regressors will generally be satisfied in an experiment, but in the case of observational data it is difficult to exclude the possibility of an omitted covariate z that is related to both the observed covariates and the response variable. The existence of such a covariate will generally lead to a correlation between the regressors and the response variable, and hence to an inconsistent estimator of β. The condition of homoscedasticity can fail with either experimental or observational data. If the goal is either inference or predictive modeling, the performance of OLS estimates can be poor if multicollinearity is present, unless the sample size is large.
- Weighted least squares are used when heteroscedasticity is present in the error terms of the model.
- Generalized least squares (GLS) is an extension of the OLS method that allows efficient estimation of β when either heteroscedasticity, or correlations, or both are present among the error terms of the model, as long as the form of the heteroscedasticity and correlation is known independently of the data. To handle heteroscedasticity when the error terms are uncorrelated with each other, GLS minimizes a weighted analogue of the sum of squared residuals from OLS regression, where the weight for the ith case is inversely proportional to $\operatorname{var}(\varepsilon_i)$. This special case of GLS is called "weighted least squares". The GLS solution to an estimation problem is $\hat{\boldsymbol{\beta}} = (\mathbf{X}^\mathsf{T}\boldsymbol{\Omega}^{-1}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\boldsymbol{\Omega}^{-1}\mathbf{y}$, where Ω is the covariance matrix of the errors. GLS can be viewed as applying a linear transformation to the data so that the assumptions of OLS are met for the transformed data. For GLS to be applied, the covariance structure of the errors must be known up to a multiplicative constant. A short numerical sketch of the OLS and GLS closed-form estimators is given directly after this list.
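As a concrete illustration of the closed-form OLS and GLS estimators described above, here is a minimal sketch (assuming NumPy; the synthetic data, the diagonal variance model, and the variable names are illustrative assumptions, not part of the original formulation).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: design matrix X (intercept + one regressor) and response y
# with heteroscedastic errors whose scale grows with |x|.
m = 200
x = rng.normal(size=m)
X = np.column_stack([np.ones(m), x])
beta_true = np.array([2.0, -1.5])
sigma = 0.5 + 0.5 * np.abs(x)
y = X @ beta_true + rng.normal(size=m) * sigma

# OLS: beta_hat = (X^T X)^{-1} X^T y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# GLS: beta_hat = (X^T Omega^{-1} X)^{-1} X^T Omega^{-1} y.
# Here Omega is diagonal (uncorrelated errors), so this is weighted least squares.
omega_inv = np.diag(1.0 / sigma**2)
beta_gls = np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)

print(beta_ols)  # both estimates are close to beta_true,
print(beta_gls)  # but the GLS estimate has lower variance
```

In this sketch the error covariance is assumed known up to a multiplicative constant, which is exactly the condition required for GLS to apply.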
Alternative formulations
- Iteratively reweighted least squares (IRLS) is used when heteroscedasticity, or correlations, or both are present among the error terms of the model, but where little is known about the covariance structure of the errors independently of the data. In the first iteration, OLS, or GLS with a provisional covariance structure, is carried out, and the residuals are obtained from the fit. Based on the residuals, an improved estimate of the covariance structure of the errors can usually be obtained. A subsequent GLS iteration is then performed using this estimate of the error structure to define the weights. The process can be iterated to convergence, but in many cases only one iteration is sufficient to achieve an efficient estimate of β; a rough numerical sketch of this scheme is given after this list.
- Instrumental variables (IV) regression can be performed when the regressors are correlated with the errors. In this case, we need the existence of some auxiliary instrumental variables $\mathbf{z}_i$ such that $\operatorname{E}[\mathbf{z}_i \varepsilon_i] = 0$. If Z is the matrix of instruments, then the estimator can be given in closed form as $\hat{\boldsymbol{\beta}} = (\mathbf{Z}^\mathsf{T}\mathbf{X})^{-1}\mathbf{Z}^\mathsf{T}\mathbf{y}$. Optimal instruments regression is an extension of classical IV regression to the situation where $\operatorname{E}[\varepsilon_i \mid \mathbf{z}_i] = 0$.
- Total least squares is an approach to least squares estimation of the linear regression model that treats the covariates and response variable in a more geometrically symmetric manner than OLS. It is one approach to handling the "errors in variables" problem, and is also sometimes used even when the covariates are assumed to be error-free.
- Linear Template Fit (LTF) combines a linear regression with least squares in order to determine the best estimator. The Linear Template Fit addresses the frequent issue that the residuals cannot be expressed analytically or are too time-consuming to evaluate repeatedly, as is often the case in iterative minimization algorithms. In the Linear Template Fit, the residuals are estimated from the random variables and from a linear approximation of the underlying true model, while the true model needs to be provided at a sufficient number of distinct reference values of β. The true distribution is then approximated by a linear regression, and the best estimators are obtained in closed form in terms of the template matrix (holding the values of the known or previously determined model at each of the reference values of β), the random variables, and a matrix and a vector calculated from the reference values of β. The LTF can also be expressed for log-normally distributed random variables. A generalization of the LTF is the Quadratic Template Fit, which assumes a second-order regression of the model, requires predictions at additional distinct reference values of β, and finds the best estimator using Newton's method.
- Percentage least squares focuses on reducing percentage errors, which is useful in the field of forecasting or time series analysis. It is also useful in situations where the dependent variable has a wide range without constant variance, as here the larger residuals at the upper end of the range would dominate if OLS were used. When the percentage or relative error is normally distributed, least squares percentage regression provides maximum likelihood estimates. Percentage regression is linked to a multiplicative error model, whereas OLS is linked to models containing an additive error term.
- Constrained least squares denotes a linear least squares problem with additional constraints on the solution.
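The iteratively reweighted scheme described above can be sketched as follows (NumPy assumed; the variance model var(ε_i) ∝ x_i² and the synthetic data are illustrative assumptions used only to make the example self-contained).

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic heteroscedastic data: noise standard deviation grows with x
m = 300
x = rng.uniform(1.0, 10.0, size=m)
X = np.column_stack([np.ones(m), x])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=m) * (0.2 * x)

# Iteration 0: ordinary least squares
beta = np.linalg.solve(X.T @ X, X.T @ y)

# A few reweighting iterations: estimate the scale of the assumed variance
# structure from the residuals, then solve the weighted normal equations.
for _ in range(3):
    resid = y - X @ beta
    scale = np.mean((resid / x) ** 2)   # var(eps_i) ~ scale * x_i^2
    w = 1.0 / (scale * x**2)            # weights proportional to 1 / var(eps_i)
    XtW = X.T * w                       # equivalent to X^T @ diag(w)
    beta = np.linalg.solve(XtW @ X, XtW @ y)

print(beta)  # close to the true coefficients (1.0, 2.0)
```

In practice a single reweighting step is often enough, as noted above; the loop is shown only to make the iteration explicit.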
Objective function
The optimal value of the objective function, found by substituting the optimal expression for the coefficient vector, can be written as
$$S = \mathbf{y}^\mathsf{T}(\mathbf{I} - \mathbf{H})^\mathsf{T}(\mathbf{I} - \mathbf{H})\mathbf{y} = \mathbf{y}^\mathsf{T}(\mathbf{I} - \mathbf{H})\mathbf{y},$$
where $\mathbf{H} = \mathbf{X}(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}$ is the hat matrix, the latter equality holding since $(\mathbf{I} - \mathbf{H})$ is symmetric and idempotent. It can be shown from this that under an appropriate assignment of weights the expected value of S is $m - n$. If instead unit weights are assumed, the expected value of S is $(m - n)\sigma^2$, where $\sigma^2$ is the variance of each observation.
If it is assumed that the residuals belong to a normal distribution, the objective function, being a sum of weighted squared residuals, will belong to a chi-squared ($\chi^2$) distribution with $m - n$ degrees of freedom. Some illustrative percentile values of $\chi^2$ are given in the following table.
| m − n | χ² (P = 0.50) | χ² (P = 0.95) | χ² (P = 0.99) |
| 10 | 9.34 | 18.3 | 23.2 |
| 25 | 24.3 | 37.7 | 44.3 |
| 100 | 99.3 | 124 | 136 |
These values can be used for a statistical criterion as to the goodness of fit. When unit weights are used, the numbers should be divided by the variance of an observation.
For WLS, the ordinary objective function above is replaced by a weighted sum of squared residuals.
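As a small numerical sketch of this goodness-of-fit criterion (assuming NumPy and SciPy; the data and the observation variances are synthetic), the weighted objective value S can be compared with the χ² percentiles tabulated above.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)

# m observations, n parameters, with known observation variances sigma_i^2
m, n = 30, 3
X = rng.normal(size=(m, n))
sigma = rng.uniform(0.5, 2.0, size=m)
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=m) * sigma

# Weighted least squares with weights w_i = 1 / sigma_i^2
w = 1.0 / sigma**2
XtW = X.T * w
beta = np.linalg.solve(XtW @ X, XtW @ y)

# Weighted sum of squared residuals; its expected value is m - n
S = np.sum(w * (y - X @ beta) ** 2)
print(S, m - n)               # S should be of the order of m - n = 27
print(chi2.ppf(0.95, m - n))  # fits with S above this value are suspect
```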
Discussion
In statistics and mathematics, linear least squares is an approach to fitting a mathematical or statistical model to data in cases where the idealized value provided by the model for any data point is expressed linearly in terms of the unknown parameters of the model. The resulting fitted model can be used to summarize the data, to predict unobserved values from the same system, and to understand the mechanisms that may underlie the system.

Mathematically, linear least squares is the problem of approximately solving an overdetermined system of linear equations $A\mathbf{x} = \mathbf{b}$, where $\mathbf{b}$ is not an element of the column space of the matrix $A$. The approximate solution is realized as an exact solution to $A\mathbf{x} = \mathbf{b}'$, where $\mathbf{b}'$ is the projection of $\mathbf{b}$ onto the column space of $A$. The best approximation is then that which minimizes the sum of squared differences between the data values and their corresponding modeled values. The approach is called linear least squares since the assumed function is linear in the parameters to be estimated. Linear least squares problems are convex and have a closed-form solution that is unique, provided that the number of data points used for fitting equals or exceeds the number of unknown parameters, except in special degenerate situations. In contrast, non-linear least squares problems generally must be solved by an iterative procedure, and the problems can be non-convex with multiple optima for the objective function. If prior distributions are available, then even an underdetermined system can be solved using the Bayesian MMSE estimator.
In statistics, linear least squares problems correspond to a particularly important type of statistical model called linear regression, which arises as a particular form of regression analysis. One basic form of such a model is an ordinary least squares model. The present article concentrates on the mathematical aspects of linear least squares problems; the formulation and interpretation of statistical regression models, and the statistical inferences related to these, are dealt with in the articles just mentioned. See outline of regression analysis for an outline of the topic.