In statistics, the jackknife (jackknife cross-validation) is a cross-validation technique and, therefore, a form of resampling. It is especially useful for bias and variance estimation. The jackknife pre-dates other common resampling methods such as the bootstrap. Given a sample of size ${\displaystyle n}$, a jackknife estimator can be built by aggregating the parameter estimates from each subsample of size ${\displaystyle (n-1)}$ obtained by omitting one observation.[1]

The jackknife technique was developed by Maurice Quenouille (1924–1973) from 1949 and refined in 1956. John Tukey expanded on the technique in 1958 and proposed the name "jackknife" because, like a physical jack-knife (a compact folding knife), it is a rough-and-ready tool that can improvise a solution for a variety of problems even though specific problems may be more efficiently solved with a purpose-designed tool.[2]

The jackknife is a linear approximation of the bootstrap.[2]

## A simple example: mean estimation

The jackknife estimator of a parameter is found by systematically leaving out each observation from a dataset and calculating the parameter estimate over the remaining observations and then aggregating these calculations.

For example, if the parameter to be estimated is the population mean of random variable ${\displaystyle x}$, then for a given set of i.i.d. observations ${\displaystyle x_{1},...,x_{n}}$ the natural estimator is the sample mean:

${\displaystyle {\bar {x}}={\frac {1}{n}}\sum _{i=1}^{n}x_{i}={\frac {1}{n}}\sum _{i\in [n]}x_{i},}$

where the last sum uses another way to indicate that the index ${\displaystyle i}$ runs over the set ${\displaystyle [n]=\{1,\ldots ,n\}}$.

Then we proceed as follows: For each ${\displaystyle i\in [n]}$ we compute the mean ${\displaystyle {\bar {x}}_{(i)}}$ of the jackknife subsample consisting of all but the ${\displaystyle i}$-th data point, and this is called the ${\displaystyle i}$-th jackknife replicate:

${\displaystyle {\bar {x}}_{(i)}={\frac {1}{n-1}}\sum _{j\in [n],j\neq i}x_{j},\quad \quad i=1,\dots ,n.}$

It may help to think of these ${\displaystyle n}$ jackknife replicates ${\displaystyle {\bar {x}}_{(1)},\ldots ,{\bar {x}}_{(n)}}$ as giving an approximation of the distribution of the sample mean ${\displaystyle {\bar {x}}}$, with the approximation improving as ${\displaystyle n}$ grows. Finally, to get the jackknife estimator, we take the average of these ${\displaystyle n}$ jackknife replicates:

${\displaystyle {\bar {x}}_{\mathrm {jack} }={\frac {1}{n}}\sum _{i=1}^{n}{\bar {x}}_{(i)}.}$
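The construction above can be sketched in a few lines of code. The following is a minimal illustration using NumPy; the sample values are arbitrary and chosen only for demonstration:

```python
import numpy as np

# Arbitrary sample of size n (values chosen only for illustration)
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
n = len(x)

# i-th jackknife replicate: the mean of all observations except x[i]
replicates = np.array([np.delete(x, i).mean() for i in range(n)])

# The jackknife estimator is the average of the n replicates
x_jack = replicates.mean()

print(x_jack, x.mean())  # for the mean, the two coincide
```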

One may ask about the bias and the variance of ${\displaystyle {\bar {x}}_{\mathrm {jack} }}$. From the definition of ${\displaystyle {\bar {x}}_{\mathrm {jack} }}$ as the average of the jackknife replicates, one can try to calculate these explicitly. The bias is a trivial calculation, but the variance of ${\displaystyle {\bar {x}}_{\mathrm {jack} }}$ is more involved since the jackknife replicates are not independent.

For the special case of the mean, one can show explicitly that the jackknife estimate equals the usual estimate:

${\displaystyle {\frac {1}{n}}\sum _{i=1}^{n}{\bar {x}}_{(i)}={\bar {x}}.}$

This establishes the identity ${\displaystyle {\bar {x}}_{\mathrm {jack} }={\bar {x}}}$. Then taking expectations we get ${\displaystyle E[{\bar {x}}_{\mathrm {jack} }]=E[{\bar {x}}]=E[x]}$, so ${\displaystyle {\bar {x}}_{\mathrm {jack} }}$ is unbiased, while taking the variance we get ${\displaystyle V[{\bar {x}}_{\mathrm {jack} }]=V[{\bar {x}}]=V[x]/n}$. However, these properties do not generally hold for parameters other than the mean.

This simple example for the case of mean estimation is just to illustrate the construction of a jackknife estimator, while the real subtleties (and the usefulness) emerge for the case of estimating other parameters, such as higher moments than the mean or other functionals of the distribution.

${\displaystyle {\bar {x}}_{\mathrm {jack} }}$ could be used to construct an empirical estimate of the bias of ${\displaystyle {\bar {x}}}$, namely ${\displaystyle {\widehat {\operatorname {bias} }}({\bar {x}})_{\mathrm {jack} }=c({\bar {x}}_{\mathrm {jack} }-{\bar {x}})}$ with some suitable factor ${\displaystyle c>0}$. In this case we know that ${\displaystyle {\bar {x}}_{\mathrm {jack} }={\bar {x}}}$, so this construction does not add any meaningful knowledge, but it does give the correct estimate of the bias (which is zero).

A jackknife estimate of the variance of ${\displaystyle {\bar {x}}}$ can be calculated from the variance of the jackknife replicates ${\displaystyle {\bar {x}}_{(i)}}$:[3][4]

${\displaystyle {\widehat {\operatorname {var} }}({\bar {x}})_{\mathrm {jack} }={\frac {n-1}{n}}\sum _{i=1}^{n}({\bar {x}}_{(i)}-{\bar {x}}_{\mathrm {jack} })^{2}={\frac {1}{n(n-1)}}\sum _{i=1}^{n}(x_{i}-{\bar {x}})^{2}.}$

The left equality defines the estimator ${\displaystyle {\widehat {\operatorname {var} }}({\bar {x}})_{\mathrm {jack} }}$ and the right equality is an identity that can be verified directly. Then taking expectations we get ${\displaystyle E[{\widehat {\operatorname {var} }}({\bar {x}})_{\mathrm {jack} }]=V[x]/n=V[{\bar {x}}]}$, so this is an unbiased estimator of the variance of ${\displaystyle {\bar {x}}}$.
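Both sides of this identity can be checked numerically. A minimal sketch using NumPy, with arbitrary sample values chosen only for illustration:

```python
import numpy as np

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])  # arbitrary data
n = len(x)

# Jackknife replicates of the mean and their average
replicates = np.array([np.delete(x, i).mean() for i in range(n)])
x_jack = replicates.mean()

# Left-hand side: jackknife variance estimate of the sample mean
var_jack = (n - 1) / n * np.sum((replicates - x_jack) ** 2)

# Right-hand side: the usual unbiased estimate of Var(x-bar)
var_direct = np.sum((x - x.mean()) ** 2) / (n * (n - 1))

print(var_jack, var_direct)  # the two expressions agree
```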

## Estimating the bias of an estimator

The jackknife technique can be used to estimate (and correct) the bias of an estimator calculated over the entire sample.

Suppose ${\displaystyle \theta }$ is the target parameter of interest, which is assumed to be some functional of the distribution of ${\displaystyle x}$. Based on a finite set of observations ${\displaystyle x_{1},...,x_{n}}$, which is assumed to consist of i.i.d. copies of ${\displaystyle x}$, the estimator ${\displaystyle {\hat {\theta }}}$ is constructed:

${\displaystyle {\hat {\theta }}=f_{n}(x_{1},\ldots ,x_{n}).}$

The value of ${\displaystyle {\hat {\theta }}}$ is sample-dependent, so this value will change from one random sample to another.

By definition, the bias of ${\displaystyle {\hat {\theta }}}$ is as follows:

${\displaystyle {\text{bias}}({\hat {\theta }})=E[{\hat {\theta }}]-\theta .}$

One may wish to compute several values of ${\displaystyle {\hat {\theta }}}$ from several samples and average them to obtain an empirical approximation of ${\displaystyle E[{\hat {\theta }}]}$, but this is impossible when there are no "other samples": the entire set of available observations ${\displaystyle x_{1},...,x_{n}}$ was already used to calculate ${\displaystyle {\hat {\theta }}}$. In this kind of situation the jackknife resampling technique may be of help.

We construct the jackknife replicates:

${\displaystyle {\hat {\theta }}_{(1)}=f_{n-1}(x_{2},x_{3},\ldots ,x_{n})}$
${\displaystyle {\hat {\theta }}_{(2)}=f_{n-1}(x_{1},x_{3},\ldots ,x_{n})}$
${\displaystyle \vdots }$
${\displaystyle {\hat {\theta }}_{(n)}=f_{n-1}(x_{1},x_{2},\ldots ,x_{n-1})}$

where each replicate is a "leave-one-out" estimate based on the jackknife subsample consisting of all but one of the data points:

${\displaystyle {\hat {\theta }}_{(i)}=f_{n-1}(x_{1},\ldots ,x_{i-1},x_{i+1},\ldots ,x_{n}),\quad \quad i=1,\dots ,n.}$

Then we define their average:

${\displaystyle {\hat {\theta }}_{\mathrm {jack} }={\frac {1}{n}}\sum _{i=1}^{n}{\hat {\theta }}_{(i)}}$

The jackknife estimate of the bias of ${\displaystyle {\hat {\theta }}}$ is given by:

${\displaystyle {\widehat {\text{bias}}}({\hat {\theta }})_{\mathrm {jack} }=(n-1)({\hat {\theta }}_{\mathrm {jack} }-{\hat {\theta }})}$

and the resulting bias-corrected jackknife estimate of ${\displaystyle \theta }$ is given by:

${\displaystyle {\hat {\theta }}_{\text{jack}}^{*}={\hat {\theta }}-{\widehat {\text{bias}}}({\hat {\theta }})_{\mathrm {jack} }=n{\hat {\theta }}-(n-1){\hat {\theta }}_{\mathrm {jack} }.}$

This removes the bias in the special case that the bias is ${\displaystyle O(n^{-1})}$ and reduces it to ${\displaystyle O(n^{-2})}$ in other cases.[2]
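As a concrete illustration, consider the plug-in variance estimator ${\displaystyle {\hat {\theta }}={\frac {1}{n}}\sum _{i=1}^{n}(x_{i}-{\bar {x}})^{2}}$, whose bias is ${\displaystyle -\sigma ^{2}/n=O(n^{-1})}$; for this estimator the jackknife correction recovers the unbiased sample variance exactly. A minimal sketch in Python using NumPy, with arbitrary data chosen only for illustration:

```python
import numpy as np

def plugin_var(sample):
    """Plug-in (biased) variance estimator: divides by n."""
    return np.mean((sample - sample.mean()) ** 2)

x = np.array([2.0, 3.0, 5.0, 7.0, 11.0])  # arbitrary data
n = len(x)

theta_hat = plugin_var(x)

# Leave-one-out replicates and their average
replicates = np.array([plugin_var(np.delete(x, i)) for i in range(n)])
theta_jack = replicates.mean()

# Jackknife bias estimate and bias-corrected estimator
bias_jack = (n - 1) * (theta_jack - theta_hat)
theta_star = theta_hat - bias_jack  # equals n*theta_hat - (n-1)*theta_jack

# For the plug-in variance, the correction yields the unbiased sample variance
print(theta_star, x.var(ddof=1))
```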

## Estimating the variance of an estimator

The jackknife technique can also be used to estimate the variance of an estimator calculated over the entire sample.