In mathematics and multivariate statistics, the centering matrix[1] is a symmetric and idempotent matrix, which when multiplied with a vector has the same effect as subtracting the mean of the components of the vector from every component of that vector.

## Definition

The centering matrix of size n is defined as the n-by-n matrix

${\displaystyle C_{n}=I_{n}-{\tfrac {1}{n}}J_{n}}$

where ${\displaystyle I_{n}\,}$ is the identity matrix of size n and ${\displaystyle J_{n}}$ is an n-by-n matrix of all 1's.

For example

${\displaystyle C_{1}={\begin{bmatrix}0\end{bmatrix}}}$,
${\displaystyle C_{2}=\left[{\begin{array}{rr}1&0\\0&1\end{array}}\right]-{\frac {1}{2}}\left[{\begin{array}{rr}1&1\\1&1\end{array}}\right]=\left[{\begin{array}{rr}{\frac {1}{2}}&-{\frac {1}{2}}\\-{\frac {1}{2}}&{\frac {1}{2}}\end{array}}\right]}$,
${\displaystyle C_{3}=\left[{\begin{array}{rrr}1&0&0\\0&1&0\\0&0&1\end{array}}\right]-{\frac {1}{3}}\left[{\begin{array}{rrr}1&1&1\\1&1&1\\1&1&1\end{array}}\right]=\left[{\begin{array}{rrr}{\frac {2}{3}}&-{\frac {1}{3}}&-{\frac {1}{3}}\\-{\frac {1}{3}}&{\frac {2}{3}}&-{\frac {1}{3}}\\-{\frac {1}{3}}&-{\frac {1}{3}}&{\frac {2}{3}}\end{array}}\right]}$
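The definition translates directly into code. The following is a minimal sketch in plain Python that uses exact fractions so the entries can be compared with the examples above; the function name `centering_matrix` is an illustrative choice, not a standard library routine.

```python
from fractions import Fraction

def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows of exact fractions."""
    return [
        [Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
        for i in range(n)
    ]

C2 = centering_matrix(2)  # diagonal entries 1/2, off-diagonal entries -1/2
C3 = centering_matrix(3)  # diagonal entries 2/3, off-diagonal entries -1/3
```

Exact rational arithmetic is used here only to avoid floating-point rounding in comparisons; a numerical library would normally be preferred for large n.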

## Properties

Given a column vector ${\displaystyle \mathbf {v} \,}$ of size n, the centering property of ${\displaystyle C_{n}\,}$ can be expressed as

${\displaystyle C_{n}\,\mathbf {v} =\mathbf {v} -({\tfrac {1}{n}}J_{n,1}^{\textrm {T}}\mathbf {v} )J_{n,1}}$

where ${\displaystyle J_{n,1}}$ is a column vector of ones and ${\displaystyle {\tfrac {1}{n}}J_{n,1}^{\textrm {T}}\mathbf {v} }$ is the mean of the components of ${\displaystyle \mathbf {v} \,}$.
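As a small illustration of the centering property, the sketch below multiplies a sample vector by the centering matrix and compares the result with direct mean subtraction. The helper names `centering_matrix` and `matvec` and the sample vector are assumptions made for this example.

```python
from fractions import Fraction

def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact rational entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matvec(A, v):
    """Matrix-vector product for a list of rows A and a list v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

v = [Fraction(x) for x in (2, 4, 9)]
mean = sum(v) / len(v)                      # (1/n) J^T v, here 5

centered = matvec(centering_matrix(len(v)), v)
direct = [x - mean for x in v]              # subtract the mean componentwise

assert centered == direct                   # both give [-3, -1, 4]
```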

${\displaystyle C_{n}\,}$ is symmetric positive semi-definite.

${\displaystyle C_{n}\,}$ is idempotent, so that ${\displaystyle C_{n}^{k}=C_{n}}$, for ${\displaystyle k=1,2,\ldots }$. Once the mean has been removed, it is zero and removing it again has no effect.
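Symmetry and idempotence can be checked numerically for a particular size. The following sketch does so for n = 4 with exact fractions; the helper names are illustrative assumptions, and a check for one n is of course not a proof.

```python
from fractions import Fraction

def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact rational entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

C = centering_matrix(4)
C_squared = matmul(C, C)
C_cubed = matmul(C_squared, C)

is_symmetric = C == [list(col) for col in zip(*C)]   # C equals its transpose
is_idempotent = C_squared == C and C_cubed == C      # C^2 = C^3 = C
```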

${\displaystyle C_{n}\,}$ is singular. The effects of applying the transformation ${\displaystyle C_{n}\,\mathbf {v} }$ cannot be reversed.

${\displaystyle C_{n}\,}$ has the eigenvalue 1 of multiplicity n − 1 and eigenvalue 0 of multiplicity 1.

${\displaystyle C_{n}\,}$ has a nullspace of dimension 1, along the vector ${\displaystyle J_{n,1}}$.
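The eigenvalue and nullspace statements can be illustrated without an eigenvalue solver: the all-ones vector is mapped to zero, while any vector whose components sum to zero is left unchanged. The sketch below checks this for n = 4; the helper names and the choice of basis vectors e_1 − e_i are assumptions made for this example.

```python
from fractions import Fraction

def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact rational entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matvec(A, v):
    """Matrix-vector product for a list of rows A and a list v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

n = 4
C = centering_matrix(n)

# Eigenvalue 0: the all-ones vector spans the nullspace.
ones = [Fraction(1)] * n
image_of_ones = matvec(C, ones)

# Eigenvalue 1: e_1 - e_i (i = 2..n) are n - 1 independent vectors whose
# components sum to zero, and C leaves each of them unchanged.
zero_sum_basis = []
for i in range(1, n):
    u = [Fraction(0)] * n
    u[0], u[i] = Fraction(1), Fraction(-1)
    zero_sum_basis.append(u)

all_fixed = all(matvec(C, u) == u for u in zero_sum_basis)
```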

${\displaystyle C_{n}\,}$ is an orthogonal projection matrix. That is, ${\displaystyle C_{n}\mathbf {v} }$ is the projection of ${\displaystyle \mathbf {v} \,}$ onto the (n − 1)-dimensional subspace that is orthogonal to the nullspace spanned by ${\displaystyle J_{n,1}}$. (This is the subspace of all n-vectors whose components sum to zero.)

The trace of ${\displaystyle C_{n}}$ is ${\displaystyle n(n-1)/n=n-1}$.
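The projection and trace properties can be seen on a concrete vector: the centered part and the removed part are orthogonal, and the diagonal entries (n − 1)/n sum to n − 1. The following is a sketch for n = 4, with helper names and the sample vector chosen only for illustration.

```python
from fractions import Fraction

def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact rational entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matvec(A, v):
    """Matrix-vector product for a list of rows A and a list v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

n = 4
C = centering_matrix(n)
v = [Fraction(x) for x in (3, 1, 4, 8)]

projected = matvec(C, v)                          # part in the zero-sum subspace
residual = [a - b for a, b in zip(v, projected)]  # part along the ones vector

# Orthogonal projection: the two parts have zero inner product.
inner_product = sum(a * b for a, b in zip(projected, residual))

# Trace: n diagonal entries, each equal to (n - 1)/n.
trace = sum(C[i][i] for i in range(n))
```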

## Application

Although multiplication by the centering matrix is not a computationally efficient way of removing the mean from a vector, it is a convenient analytical tool. It can be used not only to remove the mean of a single vector, but also of multiple vectors stored in the rows or columns of an m-by-n matrix ${\displaystyle X}$.

The left multiplication by ${\displaystyle C_{m}}$ subtracts a corresponding mean value from each of the n columns, so that each column of the product ${\displaystyle C_{m}\,X}$ has a zero mean. Similarly, the multiplication by ${\displaystyle C_{n}}$ on the right subtracts a corresponding mean value from each of the m rows, and each row of the product ${\displaystyle X\,C_{n}}$ has a zero mean. The multiplication on both sides creates a doubly centred matrix ${\displaystyle C_{m}\,X\,C_{n}}$, whose row and column means are equal to zero.
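The three cases (column centering, row centering, double centering) can be sketched on a small 2-by-3 matrix. The helper names and the sample data below are assumptions made for this example; sums are checked rather than means, since a zero sum is equivalent to a zero mean.

```python
from fractions import Fraction

def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact rational entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def row_sums(A):
    return [sum(row) for row in A]

def column_sums(A):
    return [sum(col) for col in zip(*A)]

X = [[Fraction(x) for x in row] for row in ([1, 2, 3], [4, 6, 11])]
m, n = len(X), len(X[0])

col_centered = matmul(centering_matrix(m), X)   # C_m X: zero-mean columns
row_centered = matmul(X, centering_matrix(n))   # X C_n: zero-mean rows
doubly_centered = matmul(col_centered, centering_matrix(n))  # C_m X C_n
```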

In particular, the centering matrix provides a succinct way to express the scatter matrix, ${\displaystyle S=(X-\mu J_{n,1}^{\mathrm {T} })(X-\mu J_{n,1}^{\mathrm {T} })^{\mathrm {T} }}$, of a data sample ${\displaystyle X\,}$ whose n columns are the observations, where ${\displaystyle \mu ={\tfrac {1}{n}}XJ_{n,1}}$ is the sample mean. The scatter matrix can then be written more compactly as

${\displaystyle S=X\,C_{n}(X\,C_{n})^{\mathrm {T} }=X\,C_{n}\,C_{n}\,X\,^{\mathrm {T} }=X\,C_{n}\,X\,^{\mathrm {T} }.}$
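The equality of the two expressions for the scatter matrix can be checked on a small sample. In the sketch below the columns of `X` are treated as observations, matching the definition of the sample mean; the helper names and data are illustrative assumptions.

```python
from fractions import Fraction

def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact rational entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Two variables (rows), three observations (columns).
X = [[Fraction(x) for x in row] for row in ([1, 2, 6], [4, 0, 2])]
m, n = len(X), len(X[0])

# Direct form: subtract the sample mean mu from every column first.
mu = [sum(row) / n for row in X]
D = [[X[i][j] - mu[i] for j in range(n)] for i in range(m)]  # X - mu 1^T
S_direct = matmul(D, transpose(D))

# Compact form: S = X C_n X^T.
S_compact = matmul(matmul(X, centering_matrix(n)), transpose(X))

assert S_direct == S_compact
```

The compact form needs only one centering matrix because of idempotence and symmetry, which is the step shown in the chain of equalities above.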

${\displaystyle C_{n}}$ is the covariance matrix of the multinomial distribution, in the special case where the parameters of that distribution are ${\displaystyle k=n}$, and ${\displaystyle p_{1}=p_{2}=\cdots =p_{n}={\frac {1}{n}}}$.
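This can be checked against the standard covariance of multinomial counts, which for k trials and probability vector p has entries k(p_i δ_ij − p_i p_j). The sketch below builds that matrix for k = n and uniform p and compares it with the centering matrix; the function names are assumptions made for this example.

```python
from fractions import Fraction

def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact rational entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def multinomial_covariance(k, p):
    """Covariance of multinomial counts: k * (diag(p) - p p^T)."""
    size = len(p)
    return [[k * ((p[i] if i == j else 0) - p[i] * p[j]) for j in range(size)]
            for i in range(size)]

n = 4
uniform = [Fraction(1, n)] * n
cov = multinomial_covariance(n, uniform)

matches_centering = cov == centering_matrix(n)
```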

## References

1. John I. Marden, Analyzing and Modeling Rank Data, Chapman & Hall, 1995, ISBN 0-412-99521-2, page 59.