In statistics, **generalized least squares** (GLS) is a method used to estimate the unknown parameters in a linear regression model when there is a certain degree of correlation between the residuals in the regression model. In such cases, ordinary least squares and weighted least squares can be statistically inefficient or even give misleading inferences. GLS was first described by Alexander Aitken in 1935.^{[1]}

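In standard linear regression models one observes data $\{y_i, x_{ij}\}_{i=1,\dots,n;\, j=2,\dots,k}$ on *n* statistical units. The response values are placed in a vector $\mathbf{y} = (y_1, \dots, y_n)^{\mathsf{T}}$, and the predictor values are placed in the design matrix $\mathbf{X} = (\mathbf{x}_1^{\mathsf{T}}, \dots, \mathbf{x}_n^{\mathsf{T}})^{\mathsf{T}}$, where $\mathbf{x}_i = (1, x_{i2}, \dots, x_{ik})$ is a vector of the *k* predictor variables (including a constant) for the *i*th unit. The model forces the conditional mean of $\mathbf{y}$ given $\mathbf{X}$ to be a linear function of $\mathbf{X}$ and assumes the conditional variance of the error term given $\mathbf{X}$ is a *known* nonsingular *covariance matrix* $\boldsymbol{\Omega}$. This is usually written as

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \operatorname{E}[\boldsymbol{\varepsilon} \mid \mathbf{X}] = 0, \qquad \operatorname{Cov}[\boldsymbol{\varepsilon} \mid \mathbf{X}] = \boldsymbol{\Omega}.$$

Here $\boldsymbol{\beta} \in \mathbb{R}^{k}$ is a vector of unknown constants (known as “regression coefficients”) that must be estimated from the data.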

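Suppose $\mathbf{b}$ is a candidate estimate for $\boldsymbol{\beta}$. Then the residual vector for $\mathbf{b}$ will be $\mathbf{y} - \mathbf{X}\mathbf{b}$. The generalized least squares method estimates $\boldsymbol{\beta}$ by minimizing the squared Mahalanobis length of this residual vector:

$$\hat{\boldsymbol{\beta}} = \underset{\mathbf{b}}{\operatorname{argmin}}\; (\mathbf{y} - \mathbf{X}\mathbf{b})^{\mathsf{T}} \boldsymbol{\Omega}^{-1} (\mathbf{y} - \mathbf{X}\mathbf{b}) = \underset{\mathbf{b}}{\operatorname{argmin}}\; \mathbf{y}^{\mathsf{T}} \boldsymbol{\Omega}^{-1} \mathbf{y} + (\mathbf{X}\mathbf{b})^{\mathsf{T}} \boldsymbol{\Omega}^{-1} \mathbf{X}\mathbf{b} - \mathbf{y}^{\mathsf{T}} \boldsymbol{\Omega}^{-1} \mathbf{X}\mathbf{b} - (\mathbf{X}\mathbf{b})^{\mathsf{T}} \boldsymbol{\Omega}^{-1} \mathbf{y},$$

where the last two terms evaluate to scalars, resulting in

$$\hat{\boldsymbol{\beta}} = \underset{\mathbf{b}}{\operatorname{argmin}}\; \mathbf{y}^{\mathsf{T}} \boldsymbol{\Omega}^{-1} \mathbf{y} + \mathbf{b}^{\mathsf{T}} \mathbf{X}^{\mathsf{T}} \boldsymbol{\Omega}^{-1} \mathbf{X}\mathbf{b} - 2\,\mathbf{b}^{\mathsf{T}} \mathbf{X}^{\mathsf{T}} \boldsymbol{\Omega}^{-1} \mathbf{y}.$$

This objective is a quadratic form in $\mathbf{b}$.

Taking the gradient of this quadratic form with respect to $\mathbf{b}$ and equating it to zero (when $\mathbf{b} = \hat{\boldsymbol{\beta}}$) gives

$$2\,\mathbf{X}^{\mathsf{T}} \boldsymbol{\Omega}^{-1} \mathbf{X} \hat{\boldsymbol{\beta}} - 2\,\mathbf{X}^{\mathsf{T}} \boldsymbol{\Omega}^{-1} \mathbf{y} = 0.$$

Therefore, the minimum of the objective function can be computed, yielding the explicit formula:

$$\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^{\mathsf{T}} \boldsymbol{\Omega}^{-1} \mathbf{X}\right)^{-1} \mathbf{X}^{\mathsf{T}} \boldsymbol{\Omega}^{-1} \mathbf{y}.$$

The quantity $\boldsymbol{\Omega}^{-1}$ is known as the *precision matrix* (or *dispersion matrix*), a generalization of the diagonal weight matrix.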
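As an illustration, the closed-form GLS estimator can be sketched in NumPy. The function name `gls`, the simulated data, and the AR(1)-style covariance matrix below are assumptions made for the example and are not part of the method itself; linear systems are solved directly rather than forming the inverse covariance matrix explicitly.

```python
import numpy as np

def gls(X, y, omega):
    """Return the GLS estimate (X' Omega^{-1} X)^{-1} X' Omega^{-1} y."""
    # Solve Omega Z = X and Omega z = y instead of inverting Omega.
    omega_inv_X = np.linalg.solve(omega, X)
    omega_inv_y = np.linalg.solve(omega, y)
    return np.linalg.solve(X.T @ omega_inv_X, X.T @ omega_inv_y)

# Illustrative data (assumed): n = 5 units, an intercept plus one predictor,
# and error correlation 0.5**|i - j| between units i and j.
rng = np.random.default_rng(0)
n = 5
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
idx = np.arange(n)
omega = 0.5 ** np.abs(idx[:, None] - idx[None, :])
y = X @ np.array([1.0, 2.0]) + rng.multivariate_normal(np.zeros(n), omega)

beta_hat = gls(X, y, omega)
```

At the returned estimate the first-order condition holds, so the weighted residuals are orthogonal to the columns of the design matrix up to floating-point error.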

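The GLS estimator is unbiased, consistent, efficient, and asymptotically normal with $\operatorname{E}[\hat{\boldsymbol{\beta}} \mid \mathbf{X}] = \boldsymbol{\beta}$ and $\operatorname{Cov}[\hat{\boldsymbol{\beta}} \mid \mathbf{X}] = (\mathbf{X}^{\mathsf{T}} \boldsymbol{\Omega}^{-1} \mathbf{X})^{-1}$. GLS is equivalent to applying ordinary least squares to a linearly transformed version of the data. To see this, factor $\boldsymbol{\Omega} = \mathbf{C}\mathbf{C}^{\mathsf{T}}$, for instance using the Cholesky decomposition. Then if one pre-multiplies both sides of the equation $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ by $\mathbf{C}^{-1}$, we get an equivalent linear model $\mathbf{y}^{*} = \mathbf{X}^{*}\boldsymbol{\beta} + \boldsymbol{\varepsilon}^{*}$ where $\mathbf{y}^{*} = \mathbf{C}^{-1}\mathbf{y}$, $\mathbf{X}^{*} = \mathbf{C}^{-1}\mathbf{X}$, and $\boldsymbol{\varepsilon}^{*} = \mathbf{C}^{-1}\boldsymbol{\varepsilon}$. In this model $\operatorname{Var}[\boldsymbol{\varepsilon}^{*} \mid \mathbf{X}] = \mathbf{C}^{-1} \boldsymbol{\Omega} (\mathbf{C}^{-1})^{\mathsf{T}} = \mathbf{I}$, where $\mathbf{I}$ is the identity matrix. Thus one can efficiently estimate $\boldsymbol{\beta}$ by applying ordinary least squares (OLS) to the transformed data, which requires minimizing:

$$\left\| \mathbf{y}^{*} - \mathbf{X}^{*}\mathbf{b} \right\|^{2} = (\mathbf{y} - \mathbf{X}\mathbf{b})^{\mathsf{T}} \boldsymbol{\Omega}^{-1} (\mathbf{y} - \mathbf{X}\mathbf{b}).$$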

This has the effect of standardizing the scale of the errors and “de-correlating” them. When OLS is applied to data with homoscedastic errors, the Gauss–Markov theorem applies, and therefore the GLS estimate is the best linear unbiased estimator for *β*.
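The equivalence between GLS and OLS on transformed data can be checked numerically. The following is a minimal sketch on randomly generated data; the covariance matrix is an arbitrary positive definite matrix constructed for the example, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
X = np.column_stack([np.ones(n), rng.normal(size=n)])
A = rng.normal(size=(n, n))
omega = A @ A.T + n * np.eye(n)   # symmetric positive definite by construction
y = rng.normal(size=n)

# Factor Omega = C C' with C lower triangular.
C = np.linalg.cholesky(omega)

# Transformed data y* = C^{-1} y and X* = C^{-1} X (solve rather than invert).
y_star = np.linalg.solve(C, y)
X_star = np.linalg.solve(C, X)

# OLS on the transformed data.
beta_whitened, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

# Direct GLS formula for comparison.
beta_direct = np.linalg.solve(
    X.T @ np.linalg.solve(omega, X), X.T @ np.linalg.solve(omega, y)
)

# Covariance of the transformed errors, C^{-1} Omega C^{-T}.
whitened_cov = np.linalg.solve(C, np.linalg.solve(C, omega).T)
```

Both routes give the same coefficients up to rounding, and `whitened_cov` is numerically the identity matrix, which is the "de-correlating" effect described above.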


A special case of GLS called weighted least squares (WLS) occurs when all the off-diagonal entries of *Ω* are 0. This situation arises when the variances of the observed values are unequal (that is, heteroscedasticity is present) but the errors of different observations are uncorrelated. The weight for unit *i* is proportional to the reciprocal of the variance of the response for unit *i*.^{[2]}
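A short sketch of this special case follows. The helper name `wls`, the data, and the variances (assumed known here) are made up for illustration. The result is compared with the general GLS formula using a diagonal covariance matrix.

```python
import numpy as np

def wls(X, y, variances):
    """WLS with weights 1/variance; equals GLS with Omega = diag(variances)."""
    w = 1.0 / np.asarray(variances, dtype=float)
    Xw = X * w[:, None]                      # each row of X scaled by its weight
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

X = np.column_stack([np.ones(4), [0.0, 1.0, 2.0, 3.0]])
y = np.array([1.1, 2.9, 5.2, 6.8])
variances = np.array([1.0, 1.0, 4.0, 4.0])   # assumed known error variances

beta_wls = wls(X, y, variances)

# Same estimate from the general GLS formula with a diagonal Omega.
omega = np.diag(variances)
beta_gls = np.linalg.solve(
    X.T @ np.linalg.solve(omega, X), X.T @ np.linalg.solve(omega, y)
)
```

Because only the relative sizes of the weights matter, multiplying all variances by the same constant leaves the estimate unchanged.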

If the covariance of the errors $\boldsymbol{\Omega}$ is unknown, one can get a consistent estimate of $\boldsymbol{\Omega}$, say $\widehat{\boldsymbol{\Omega}}$,^{[3]} using an implementable version of GLS known as the **feasible generalized least squares** (**FGLS**) estimator.

In FGLS, modeling proceeds in two stages:

(1) the model is estimated by OLS or another consistent (but inefficient) estimator, and the residuals are used to build a consistent estimator of the error covariance matrix (doing so often requires imposing additional constraints on the model; for example, if the errors follow a time series process, a statistician generally needs some theoretical assumptions about this process to ensure that a consistent estimator is available); and

(2) GLS is carried out with this consistent estimate of the error covariance matrix in place of the unknown true matrix.

Whereas GLS is more efficient than OLS under heteroscedasticity (also spelled heteroskedasticity) or autocorrelation, this is not true for FGLS. The feasible estimator is *asymptotically* more efficient, provided the error covariance matrix is consistently estimated, but for a small to medium-sized sample it can actually be less efficient than OLS. This is why some authors prefer to use OLS and base their inferences on an alternative estimator of the estimator's variance that is robust to heteroscedasticity or serial autocorrelation. However, for large samples FGLS is preferred over OLS under heteroskedasticity or serial correlation.^{[3]}^{[4]} A cautionary note is that the FGLS estimator is not always consistent. One case in which FGLS might be inconsistent is if there are individual-specific fixed effects.^{[5]}

In general this estimator has different properties than GLS. For large samples (i.e., asymptotically) all properties are (under appropriate conditions) common with respect to GLS, but for finite samples the properties of FGLS estimators are unknown: they vary dramatically with each particular model, and as a general rule their exact distributions cannot be derived analytically. For finite samples, FGLS may be less efficient than OLS in some cases. Thus, while GLS can be made feasible, it is not always wise to apply this method when the sample is small.

A method used to improve the accuracy of the estimators in finite samples is to iterate, i.e., to take the residuals from FGLS to update the error covariance estimator and then update the FGLS estimation, applying the same idea iteratively until the estimators vary less than some tolerance. But this method does not necessarily improve the efficiency of the estimator very much if the original sample was small. A reasonable option when samples are not too large is to apply OLS, but discard the classical variance estimator $\sigma^{2} (\mathbf{X}^{\mathsf{T}} \mathbf{X})^{-1}$ (which is inconsistent in this framework) and instead use a HAC (Heteroskedasticity and Autocorrelation Consistent) estimator. For example, in the context of autocorrelation we can use the Bartlett estimator (often known as the Newey–West estimator, since these authors popularized the use of this estimator among econometricians in their 1987 *Econometrica* article), and in heteroscedastic contexts we can use the Eicker–White estimator. This approach is much safer, and it is the appropriate path to take unless the sample is large, where "large" is sometimes a slippery issue (e.g. if the error distribution is asymmetric the required sample will be much larger).
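The OLS-plus-robust-variance option can be sketched as follows for the heteroscedastic case. The code computes the simplest form of the Eicker–White estimator (commonly labelled HC0) next to the classical estimator; the function name `ols_with_white_cov` and the simulated data are assumptions for the example, and the Newey–West estimator for autocorrelated errors is not shown.

```python
import numpy as np

def ols_with_white_cov(X, y):
    """OLS coefficients with classical and Eicker-White (HC0) covariance estimates."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Classical estimator s^2 (X'X)^{-1}; relies on homoscedastic errors.
    classical = (resid @ resid) / (n - k) * XtX_inv
    # HC0 sandwich: (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}.
    meat = X.T @ (X * resid[:, None] ** 2)
    robust = XtX_inv @ meat @ XtX_inv
    return beta, classical, robust

# Simulated data (assumed): the error standard deviation grows with x.
rng = np.random.default_rng(7)
n = 200
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + x * rng.normal(size=n)

beta_ols, cov_classical, cov_robust = ols_with_white_cov(X, y)
se_robust = np.sqrt(np.diag(cov_robust))
```

The coefficient estimates are plain OLS; only the covariance matrix used for inference changes.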

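The ordinary least squares (OLS) estimator is calculated by

$$\hat{\boldsymbol{\beta}}_{\text{OLS}} = (\mathbf{X}^{\mathsf{T}} \mathbf{X})^{-1} \mathbf{X}^{\mathsf{T}} \mathbf{y}$$

and estimates of the residuals $\hat{u}_{j} = (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}_{\text{OLS}})_{j}$ are constructed.

For simplicity, consider the model for heteroscedastic and non-autocorrelated errors. Assume that the variance-covariance matrix $\boldsymbol{\Omega}$ of the error vector is diagonal, or equivalently that errors from distinct observations are uncorrelated. Then each diagonal entry may be estimated by the fitted residuals $\hat{u}_{j}$, so $\widehat{\boldsymbol{\Omega}}_{\text{OLS}}$ may be constructed by

$$\widehat{\boldsymbol{\Omega}}_{\text{OLS}} = \operatorname{diag}(\hat{\sigma}_{1}^{2}, \hat{\sigma}_{2}^{2}, \dots, \hat{\sigma}_{n}^{2}).$$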

It is important to note that the squared residuals cannot be used directly in the previous expression; an estimator of the error variances is needed. For this, one can use a parametric heteroskedasticity model or a nonparametric estimator. Once this step is complete, one can proceed:

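Estimate $\boldsymbol{\beta}_{\text{FGLS1}}$ from $\widehat{\boldsymbol{\Omega}}_{\text{OLS}}$ by weighted least squares:^{[4]}

$$\hat{\boldsymbol{\beta}}_{\text{FGLS1}} = (\mathbf{X}^{\mathsf{T}} \widehat{\boldsymbol{\Omega}}_{\text{OLS}}^{-1} \mathbf{X})^{-1} \mathbf{X}^{\mathsf{T}} \widehat{\boldsymbol{\Omega}}_{\text{OLS}}^{-1} \mathbf{y}$$

The procedure can be iterated. The first iteration is given by

$$\hat{\mathbf{u}}_{\text{FGLS1}} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}_{\text{FGLS1}}$$

$$\widehat{\boldsymbol{\Omega}}_{\text{FGLS1}} = \operatorname{diag}(\hat{\sigma}_{\text{FGLS1},1}^{2}, \hat{\sigma}_{\text{FGLS1},2}^{2}, \dots, \hat{\sigma}_{\text{FGLS1},n}^{2})$$

$$\hat{\boldsymbol{\beta}}_{\text{FGLS2}} = (\mathbf{X}^{\mathsf{T}} \widehat{\boldsymbol{\Omega}}_{\text{FGLS1}}^{-1} \mathbf{X})^{-1} \mathbf{X}^{\mathsf{T}} \widehat{\boldsymbol{\Omega}}_{\text{FGLS1}}^{-1} \mathbf{y}$$

This estimation of $\widehat{\boldsymbol{\Omega}}$ can be iterated to convergence.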
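The iterated procedure for diagonal covariance matrices can be sketched as below. The text leaves the choice of variance estimator open, so this example assumes one particular parametric model: the log of each error variance is taken to be linear in the columns of the design matrix, and it is fitted by regressing the log squared residuals on those columns. The function name `fgls_diagonal`, the parameter `n_iter`, and the simulated data are all hypothetical.

```python
import numpy as np

def fgls_diagonal(X, y, n_iter=1):
    """FGLS for uncorrelated heteroscedastic errors (n_iter must be >= 1).

    Assumption (not from the text): log(sigma_i^2) is linear in the columns of X.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)       # initial OLS fit
    for _ in range(n_iter):
        resid = y - X @ beta
        # Fit the variance model to log squared residuals (floor avoids log(0)).
        log_u2 = np.log(np.maximum(resid ** 2, 1e-12))
        gamma, *_ = np.linalg.lstsq(X, log_u2, rcond=None)
        sigma2_hat = np.exp(X @ gamma)                 # fitted variances, always > 0
        w = 1.0 / sigma2_hat
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)     # WLS with estimated weights
    return beta, sigma2_hat

# Simulated data (assumed): the true log-variance is linear in x.
rng = np.random.default_rng(42)
n = 500
x = rng.uniform(0.0, 2.0, size=n)
X = np.column_stack([np.ones(n), x])
true_sigma2 = np.exp(-1.0 + 1.0 * x)
y = X @ np.array([1.0, 2.0]) + np.sqrt(true_sigma2) * rng.normal(size=n)

beta_fgls, sigma2_hat = fgls_diagonal(X, y, n_iter=2)
```

With `n_iter=1` the function returns the first FGLS estimate; larger values repeat the residual, variance, and weighting steps. A fixed iteration count is used here for brevity, whereas a convergence tolerance on the change in the coefficients would match the description of iterating to convergence more closely.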

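Under regularity conditions the FGLS estimator (or the estimator of its iterations, if we iterate a finite number of times) is asymptotically distributed as

$$\sqrt{n}\,(\hat{\boldsymbol{\beta}}_{\text{FGLS}} - \boldsymbol{\beta}) \;\xrightarrow{d}\; \mathcal{N}(0, \mathbf{V}),$$

where *n* is the sample size and

$$\mathbf{V} = \operatorname{p-lim}\left(\mathbf{X}^{\mathsf{T}} \boldsymbol{\Omega}^{-1} \mathbf{X} / n\right)^{-1};$$

here p-lim means limit in probability.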
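In practice this limit is often used to approximate standard errors by plugging an estimated covariance matrix into the expression for the limiting variance and dividing by the sample size. The sketch below does this for a diagonal covariance estimate; the function name `fgls_standard_errors` is hypothetical, and the variance estimates passed in are assumed to be consistent.

```python
import numpy as np

def fgls_standard_errors(X, sigma2_hat):
    """Approximate standard errors from the asymptotic distribution.

    sigma2_hat holds estimated error variances (the diagonal of Omega-hat).
    """
    n = X.shape[0]
    w = 1.0 / np.asarray(sigma2_hat, dtype=float)
    # Plug-in estimate of V = plim (X' Omega^{-1} X / n)^{-1}.
    V_hat = np.linalg.inv((X.T @ (X * w[:, None])) / n)
    # sqrt(n) (beta_hat - beta) ~ N(0, V)  =>  Cov(beta_hat) is roughly V / n.
    cov_beta = V_hat / n
    return np.sqrt(np.diag(cov_beta))

# Small check with unit variances, where the result reduces to (X'X)^{-1}.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
se = fgls_standard_errors(X, np.ones(3))
```

These are large-sample approximations, so the finite-sample cautions about FGLS apply to them as well.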