
In statistics, a studentized residual is the dimensionless ratio resulting from the division of a residual by an estimate of its standard deviation, both expressed in the same units. It is a form of a Student's t-statistic, with the estimate of error varying between points.

This is an important technique in the detection of outliers. It is among several named in honor of William Sealey Gosset, who wrote under the pseudonym "Student" (e.g., Student's distribution). Dividing a statistic by a sample standard deviation is called studentizing, in analogy with standardizing and normalizing.

Motivation

The key reason for studentizing is that, in regression analysis of a multivariate distribution, the variances of the residuals at different input variable values may differ, even if the variances of the errors at these different input variable values are equal. The issue is the difference between errors and residuals in statistics, particularly the behavior of residuals in regressions.

Consider the simple linear regression model

Y=\alpha _{0}+\alpha _{1}X+\varepsilon .\,

Given a random sample (X_i, Y_i), i = 1, ..., n, each pair (X_i, Y_i) satisfies

Y_{i}=\alpha _{0}+\alpha _{1}X_{i}+\varepsilon _{i},\,

where the errors $\varepsilon_i$ are independent and all have the same variance $\sigma^2$. The residuals are not the true errors but estimates of them, based on the observable data. When the method of least squares is used to estimate $\alpha_0$ and $\alpha_1$, the residuals $\widehat{\varepsilon}$, unlike the errors $\varepsilon$, cannot be independent, since they satisfy the two constraints

\sum _{i=1}^{n}{\widehat {\varepsilon }}_{i}=0

and

\sum _{i=1}^{n}{\widehat {\varepsilon }}_{i}x_{i}=0.

(Here $\varepsilon_i$ is the $i$th error, and $\widehat{\varepsilon}_i$ is the $i$th residual.)
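The two constraints can be checked numerically. The sketch below assumes NumPy and uses made-up illustrative data. It fits the simple linear model by least squares and confirms that the residuals sum to zero and are orthogonal to the x-values, up to floating-point rounding.

```python
import numpy as np

# Illustrative data (not from the article): x-values and responses.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.3])

# Least-squares fit of y = a0 + a1 * x using the design matrix [1, x].
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

# The two constraints from the normal equations.
print(resid.sum())        # ~0 up to rounding: residuals sum to zero
print((resid * x).sum())  # ~0 up to rounding: residuals are orthogonal to x
```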

The residuals, unlike the errors, do not all have the same variance: the variance decreases as the corresponding x-value gets farther from the average x-value. This is not a feature of the data itself but of the regression, which fits values at the ends of the domain more closely. It is also reflected in the influence functions of the various data points on the regression coefficients: endpoints have more influence. This can also be seen from the fact that the residuals at the endpoints depend strongly on the slope of the fitted line, while the residuals in the middle are relatively insensitive to it. The fact that the variances of the residuals differ, even though the variances of the true errors are all equal, is the principal reason studentization is needed.

It is not simply a matter of the population parameters (mean and standard deviation) being unknown – it is that regressions yield different residual distributions at different data points, unlike point estimators of univariate distributions, which share a common distribution for residuals.
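A Monte Carlo sketch can show the effect described in this section. It assumes NumPy. The design points, true coefficients, seed and number of replications are arbitrary illustrative choices. The code simulates many data sets with equal error variance at fixed x-values, fits each one, and estimates the variance of the residual at every x. The endpoints should show noticeably smaller residual variance than the middle point.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 11)          # fixed design points; mean x is 5
X = np.column_stack([np.ones_like(x), x])
alpha0, alpha1, sigma = 1.0, 2.0, 1.0   # assumed true parameters (illustrative)

n_reps = 20000
# Each row is one simulated data set with i.i.d. N(0, sigma^2) errors.
errors = rng.normal(0.0, sigma, size=(n_reps, x.size))
Y = alpha0 + alpha1 * x + errors

# Fit all data sets at once: each column of Y.T is one response vector.
coefs, *_ = np.linalg.lstsq(X, Y.T, rcond=None)   # shape (2, n_reps)
resid = Y - (X @ coefs).T                          # shape (n_reps, 11)

emp_var = resid.var(axis=0)   # empirical residual variance at each x-value
for xi, v in zip(x, emp_var):
    print(f"x = {xi:4.1f}   var(residual) = {v:.3f}")
```

All errors here have variance 1. The residual at x = 5 should nevertheless have variance near 0.91, and the residuals at x = 0 and x = 10 should have variance near 0.68.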

Background

For this simple model, the design matrix is

X=\left[{\begin{matrix}1&x_{1}\\\vdots &\vdots \\1&x_{n}\end{matrix}}\right]

and the hat matrix H is the matrix of the orthogonal projection onto the column space of the design matrix:

H=X(X^{T}X)^{-1}X^{T}.\,
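Because H is an orthogonal projection, it is symmetric and idempotent, its trace equals the number of columns of X, and it leaves the columns of X unchanged. The sketch below checks these properties for an illustrative set of x-values. It assumes NumPy and uses `np.linalg.solve` rather than forming the inverse explicitly.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])   # illustrative x-values
X = np.column_stack([np.ones_like(x), x])    # design matrix [1, x]

# H = X (X^T X)^{-1} X^T, computed by solving (X^T X) B = X^T.
H = X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(H, H.T))      # symmetric
print(np.allclose(H @ H, H))    # idempotent, so H is a projection
print(np.trace(H))              # number of columns m = 2 (up to rounding)
print(np.allclose(H @ X, X))    # columns of X are left unchanged
```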

The leverage h_ii is the ith diagonal entry in the hat matrix. The variance of the ith residual is

\operatorname {var} ({\widehat {\varepsilon }}_{i})=\sigma ^{2}(1-h_{ii}).

When the design matrix X has only two columns, a column of ones and a column of x-values (as in the example above), this is equal to

\operatorname {var} ({\widehat {\varepsilon }}_{i})=\sigma ^{2}\left(1-{\frac {1}{n}}-{\frac {(x_{i}-{\bar {x}})^{2}}{\sum _{j=1}^{n}(x_{j}-{\bar {x}})^{2}}}\right).

In the case of an arithmetic mean, the design matrix X has only one column (a vector of ones), and this is simply:

\operatorname {var} ({\widehat {\varepsilon }}_{i})=\sigma ^{2}\left(1-{\frac {1}{n}}\right).
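The closed-form leverages behind the last two variance formulas can be compared with the diagonal of the hat matrix. The sketch below assumes NumPy and uses illustrative x-values. `hat_diag` is a hypothetical helper name introduced only for this example.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])   # illustrative x-values
n = x.size

def hat_diag(X):
    """Hypothetical helper: diagonal of H = X (X^T X)^{-1} X^T (the leverages h_ii)."""
    return np.diag(X @ np.linalg.solve(X.T @ X, X.T))

# Intercept plus x: h_ii = 1/n + (x_i - xbar)^2 / sum_j (x_j - xbar)^2.
X2 = np.column_stack([np.ones(n), x])
h_formula = 1.0 / n + (x - x.mean()) ** 2 / ((x - x.mean()) ** 2).sum()
print(np.allclose(hat_diag(X2), h_formula))   # True

# A single column of ones (the arithmetic mean): every h_ii equals 1/n.
X1 = np.ones((n, 1))
print(np.allclose(hat_diag(X1), 1.0 / n))     # True
```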

Calculation

Given the definitions above, the studentized residual is then

t_{i}={{\widehat {\varepsilon }}_{i} \over {\widehat {\sigma }}{\sqrt {1-h_{ii}}}}

where h_ii is the leverage and $\widehat{\sigma}$ is an appropriate estimate of σ (see below).

In the case of a mean, this is equal to:

t_{i}={{\widehat {\varepsilon }}_{i} \over {\widehat {\sigma }}{\sqrt {(n-1)/n}}}
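The general formula and its special case for a mean can be written as a small function. The sketch below assumes NumPy. The helper name `studentize`, the sample values and the placeholder value of $\widehat{\sigma}$ are illustrative; how $\widehat{\sigma}$ should actually be estimated is discussed in the next section. Because the leverage of every point in the mean model is 1/n, the general expression reduces to the one above.

```python
import numpy as np

def studentize(resid, leverage, sigma_hat):
    """Hypothetical helper: t_i = resid_i / (sigma_hat * sqrt(1 - h_ii))."""
    return resid / (sigma_hat * np.sqrt(1.0 - leverage))

y = np.array([3.0, 5.0, 4.0, 9.0, 6.0])   # illustrative sample
n = y.size
resid = y - y.mean()                        # residuals of the mean model
h = np.full(n, 1.0 / n)                     # every leverage is 1/n for the mean
sigma_hat = 1.5                             # placeholder value for the estimate of sigma

t_general = studentize(resid, h, sigma_hat)
t_mean = resid / (sigma_hat * np.sqrt((n - 1) / n))
print(np.allclose(t_general, t_mean))       # True
```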

Internal and external studentization

The usual estimate of σ², which yields the internally studentized residual, is

{\widehat {\sigma }}^{2}={1 \over n-m}\sum _{j=1}^{n}{\widehat {\varepsilon }}_{j}^{\,2},

where m is the number of parameters in the model (2 in our example).

But if the ith case is suspected of being improbably large, then its error may not be normally distributed either. Hence, when considering whether the ith case may be an outlier, it is prudent to exclude the ith observation from the estimation of the variance and instead use

{\widehat {\sigma }}_{(i)}^{2}={1 \over n-m-1}\sum _{\begin{smallmatrix}j=1\\j\neq i\end{smallmatrix}}^{n}{\widehat {\varepsilon }}_{j}^{\,2},

which is based on all the residuals except the suspect ith residual. Here the residuals $\widehat{\varepsilon}_j$ ($j \neq i$) are those of the fit computed with the ith case excluded.

If the estimate $\widehat{\sigma}^2$ includes the ith case, then the result $t_i$ is called the internally studentized residual (also known as the standardized residual^[1]). If the estimate $\widehat{\sigma}_{(i)}^2$, which excludes the ith case, is used instead, then the result $t_{i(i)}$ is called the externally studentized residual.
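Both variants can be computed directly from their definitions. The sketch below assumes NumPy, ordinary least squares and illustrative data containing one suspicious point. `studentized_residuals` is a hypothetical helper name. The external version refits the model without case i, as described above, and divides the full-data residual by the resulting $\widehat{\sigma}_{(i)}\sqrt{1-h_{ii}}$.

```python
import numpy as np

def studentized_residuals(X, y):
    """Hypothetical helper: (internal, external) studentized residuals of an OLS fit."""
    n, m = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
    h = np.diag(H)                          # leverages h_ii
    resid = y - H @ y                       # residuals of the full fit

    # Internal: sigma^2 from all n residuals, with n - m degrees of freedom.
    sigma2 = (resid ** 2).sum() / (n - m)
    internal = resid / np.sqrt(sigma2 * (1.0 - h))

    # External: refit without case i and estimate sigma^2 from the n - 1
    # residuals of that fit, with n - m - 1 degrees of freedom.
    external = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        resid_i = y[keep] - X[keep] @ coef_i
        sigma2_i = (resid_i ** 2).sum() / (n - m - 1)
        external[i] = resid[i] / np.sqrt(sigma2_i * (1.0 - h[i]))
    return internal, external

# Illustrative data with one suspicious point at x = 5.
x = np.arange(1.0, 9.0)
y = np.array([1.2, 1.9, 3.1, 4.0, 9.5, 6.1, 6.8, 8.2])
X = np.column_stack([np.ones_like(x), x])
t_int, t_ext = studentized_residuals(X, y)
print(np.round(t_int, 3))
print(np.round(t_ext, 3))
```

The loop mirrors the definition by refitting n times. The standard identity $t_{i(i)} = t_i \sqrt{(n-m-1)/(n-m-t_i^2)}$ gives the same values without any refitting.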

Distribution

If the errors are independent and normally distributed with expected value 0 and variance σ², then the probability distribution of the ith externally studentized residual $t_{i(i)}$ is a Student's t-distribution with n − m − 1 degrees of freedom, and can range from $-\infty$ to $+\infty$.
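This distributional claim can be checked by simulation. The sketch below assumes NumPy and Gaussian errors. The design, seed and replication count are illustrative, and the true coefficients are set to 0, which does not affect the residuals. For n = 10 and m = 2 the externally studentized residual should follow a t-distribution with 7 degrees of freedom, so its variance should be near 7/5 and about 5% of values should exceed 2.365 (the 97.5% point of $t_7$) in absolute value. To avoid refitting, the code converts internal to external residuals with the standard identity $t_{i(i)} = t_i \sqrt{(\nu-1)/(\nu-t_i^2)}$, where ν = n − m.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 10)               # fixed, illustrative design points
X = np.column_stack([np.ones_like(x), x])
n, m = X.shape
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)
nu = n - m                                   # residual degrees of freedom

# 50,000 simulated data sets with N(0, 1) errors and true coefficients 0.
Y = rng.normal(size=(50000, n))
E = Y @ (np.eye(n) - H)                      # residuals (I - H is symmetric)
sigma2 = (E ** 2).sum(axis=1, keepdims=True) / nu
t_int = E / np.sqrt(sigma2 * (1.0 - h))      # internally studentized
t_ext = t_int * np.sqrt((nu - 1) / (nu - t_int ** 2))   # externally studentized

df = n - m - 1                               # 7 here
t0 = t_ext[:, 0]
print(t0.var(), df / (df - 2))               # both close to 1.4
print(np.mean(np.abs(t0) > 2.365))           # close to 0.05
```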

On the other hand, the internally studentized residuals lie in the range $0 \pm \sqrt{\nu}$, where ν = n − m is the number of residual degrees of freedom. If t_i represents the internally studentized residual, and again assuming that the errors are independent identically distributed Gaussian variables, then:^[2]

t_{i}\sim {\sqrt {\nu }}{t \over {\sqrt {t^{2}+\nu -1}}}

where t is a random variable distributed as Student's t-distribution with ν − 1 degrees of freedom. In fact, this implies that t_i² /ν follows the beta distribution B(1/2,(ν − 1)/2). The distribution above is sometimes referred to as the tau distribution;^[2] it was first derived by Thompson in 1935.^[3]
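The Beta form of this result is easy to check by simulation. The sketch below assumes NumPy and Gaussian errors, with an illustrative straight-line design of n = 8 points (so ν = 6). It confirms that every $|t_i|$ stays below $\sqrt{\nu}$ and that the sample mean and variance of $t_i^2/\nu$ match those of $B(1/2, (\nu-1)/2)$.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 8)                 # illustrative design points
X = np.column_stack([np.ones_like(x), x])
n, m = X.shape
nu = n - m                                   # residual degrees of freedom (6)
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)

Y = rng.normal(size=(100000, n))             # Gaussian errors, true coefficients 0
E = Y @ (np.eye(n) - H)                      # residuals of each simulated fit
sigma2 = (E ** 2).sum(axis=1, keepdims=True) / nu
t_int = E / np.sqrt(sigma2 * (1.0 - h))     # internally studentized residuals

u = t_int[:, 0] ** 2 / nu                    # should follow Beta(1/2, (nu - 1)/2)
p, q = 0.5, (nu - 1) / 2
print(np.abs(t_int).max() <= np.sqrt(nu))    # True: bounded by sqrt(nu)
print(u.mean(), p / (p + q))                 # both close to 1/nu
print(u.var(), p * q / ((p + q) ** 2 * (p + q + 1)))
```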

When ν = 3, the internally studentized residuals are uniformly distributed between $-\sqrt{3}$ and $+\sqrt{3}$. If there is only one residual degree of freedom, the above formula for the distribution of internally studentized residuals does not apply. In this case, the t_i are all either +1 or −1, with 50% chance for each.
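Both special cases can be checked numerically. The sketch below assumes NumPy and Gaussian errors. The designs are illustrative straight-line fits (m = 2) to n = 5 points (ν = 3) and to n = 3 points (ν = 1), and `internal_studentized` is a hypothetical helper name. A uniform distribution on $[-\sqrt{3}, \sqrt{3}]$ has quartiles at $\pm\sqrt{3}/2$ and median 0.

```python
import numpy as np

rng = np.random.default_rng(3)

def internal_studentized(X, Y):
    """Hypothetical helper: internally studentized residuals, one data set per row of Y."""
    n, m = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)
    E = Y @ (np.eye(n) - H)                  # residuals (I - H is symmetric)
    sigma2 = (E ** 2).sum(axis=1, keepdims=True) / (n - m)
    return E / np.sqrt(sigma2 * (1.0 - np.diag(H)))

# nu = 3: straight line fitted to 5 points.
X5 = np.column_stack([np.ones(5), np.arange(5.0)])
t3 = internal_studentized(X5, rng.normal(size=(100000, 5)))[:, 0]
print(np.quantile(t3, [0.25, 0.5, 0.75]), np.sqrt(3) / 2)

# nu = 1: straight line fitted to 3 points; every t_i is +1 or -1.
X3 = np.column_stack([np.ones(3), np.arange(3.0)])
t1 = internal_studentized(X3, rng.normal(size=(10, 3)))
print(np.allclose(np.abs(t1), 1.0))          # True
```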

The standard deviation of the distribution of internally studentized residuals is always 1, but this does not imply that the standard deviation of all the t_i of a particular experiment is 1. For instance, the internally studentized residuals when fitting a straight line through (0, 0) to the points (1, 4), (2, −1), (2, −1) are $\sqrt{2}$, $-\sqrt{5}/5$, $-\sqrt{5}/5$, and the standard deviation of these is not 1.
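The worked example can be reproduced directly. The sketch below assumes NumPy. A line through the origin has a single column in its design matrix (m = 1), so ν = 2. The fitted slope is 0, the residuals equal y, $\widehat{\sigma} = 3$, and the leverages are 1/9, 4/9 and 4/9.

```python
import numpy as np

# Straight line through the origin: one column x, so m = 1.
x = np.array([1.0, 2.0, 2.0])
y = np.array([4.0, -1.0, -1.0])
X = x[:, None]
n, m = X.shape

H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)                                      # [1/9, 4/9, 4/9]
resid = y - H @ y                                   # slope is 0, so resid == y
sigma_hat = np.sqrt((resid ** 2).sum() / (n - m))   # sqrt(18 / 2) = 3
t = resid / (sigma_hat * np.sqrt(1.0 - h))

print(t)              # approx [1.4142, -0.4472, -0.4472]
print(t.std(ddof=1))  # approx 1.075, not 1
```

Here the first value, $\sqrt{2}$, lies exactly on the bound $\sqrt{\nu}$ with ν = 2.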

Note that any two studentized residuals t_i and t_j ($i \neq j$) are not i.i.d. They have the same distribution, but they are not independent, because the residuals are constrained to be orthogonal to the columns of the design matrix (for a model with an intercept, this includes summing to 0).
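The dependence can be seen in the residual covariance matrix, $\sigma^2(I - H)$, whose off-diagonal entries are generally nonzero. The sketch below assumes NumPy and uses an illustrative straight-line design. It converts $I - H$ to a correlation matrix and confirms that $(I - H)X = 0$, which is the orthogonality constraint responsible for the correlation.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])    # illustrative design points
X = np.column_stack([np.ones_like(x), x])
n = x.size
H = X @ np.linalg.solve(X.T @ X, X.T)

# Cov(residuals) = sigma^2 (I - H); sigma^2 cancels in the correlations.
C = np.eye(n) - H
d = np.sqrt(np.diag(C))
corr = C / np.outer(d, d)

print(np.allclose(C @ X, 0.0))   # True: residuals are orthogonal to the columns of X
print(np.round(corr, 3))         # nonzero off-diagonal entries
```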

Software implementations

Many programs and statistics packages, such as R and Python libraries, include implementations of studentized residuals.

Language/Program	Function	Notes
R	`rstandard(model, ...)`	internally studentized. See [2]
R	`rstudent(model, ...)`	externally studentized. See [3]
