The overall shape of a protein can be parameterized as a sequence of points on the unit sphere. Shown are two views of the spherical histogram of such points for a large collection of protein structures. The statistical treatment of such data is in the realm of directional statistics.
The fact that 0 degrees and 360 degrees are identical angles, so that for example 180 degrees is not a sensible mean of 2 degrees and 358 degrees, provides one illustration that special statistical methods are required for the analysis of some types of data (in this case, angular data). Other examples of data that may be regarded as directional include statistics involving temporal periods (e.g. time of day, week, month, year, etc.), compass directions, dihedral angles in molecules, orientations, rotations and so on.
The von Mises distribution is a circular distribution which, like any other circular distribution, may be thought of as a wrapping of a certain linear probability distribution around the circle. The underlying linear probability distribution for the von Mises distribution is mathematically intractable; however, for statistical purposes, there is no need to deal with the underlying linear distribution. The usefulness of the von Mises distribution is twofold: it is the most mathematically tractable of all circular distributions, allowing simpler statistical analysis, and it is a close approximation to the wrapped normal distribution, which, analogously to the linear normal distribution, is important because it is the limiting case for the sum of a large number of small angular deviations. In fact, the von Mises distribution is often known as the "circular normal" distribution because of its ease of use and its close relationship to the wrapped normal distribution (Fisher, 1993).
The Bingham distribution is a distribution over axes in N dimensions, or equivalently, over points on the (N − 1)-dimensional sphere with the antipodes identified. For example, if N = 2, the axes are undirected lines through the origin in the plane. In this case, each axis cuts the unit circle in the plane (which is the one-dimensional sphere) at two points that are each other's antipodes. For N = 4, the Bingham distribution is a distribution over the space of unit quaternions (versors). Since a versor corresponds to a rotation matrix, the Bingham distribution for N = 4 can be used to construct probability distributions over the space of rotations, just like the Matrix-von Mises–Fisher distribution.
The raw vector (or trigonometric) moments of a circular distribution are defined as
where is any interval of length , is the PDF of the circular distribution, and . Since the integral is unity, and the integration interval is finite, it follows that the moments of any circular distribution are always finite and well defined.
Sample moments are analogously defined:
The population resultant vector, length, and mean angle are defined in analogy with the corresponding sample parameters.
In addition, the lengths of the higher moments are defined as:
while the angular parts of the higher moments are just . The lengths of all moments will lie between 0 and 1.
The most common measure of location is the circular mean. The population circular mean is simply the first moment of the distribution while the sample mean is the first moment of the sample. The sample mean will serve as an unbiased estimator of the population mean.
When data is concentrated, the median and mode may be defined by analogy to the linear case, but for more dispersed or multi-modal data, these concepts are not useful.
The circular variance. For the sample the circular variance is defined as:
and for the population
Both will have values between 0 and 1.
The circular standard deviation
with values between 0 and infinity. This definition of the standard deviation (rather than the square root of the variance) is useful because for a wrapped normal distribution, it is an estimator of the standard deviation of the underlying normal distribution. It will therefore allow the circular distribution to be standardized as in the linear case, for small values of the standard deviation. This also applies to the von Mises distribution which closely approximates the wrapped normal distribution. Note that for small , we have .
The circular dispersion
with values between 0 and infinity. This measure of spread is found useful in the statistical analysis of variance.
Distribution of the mean
Given a set of N measurements the mean value of z is defined as:
which may be expressed as
or, alternatively as:
The distribution of the mean angle () for a circular pdf P(θ) will be given by:
where is over any interval of length and the integral is subject to the constraint that and are constant, or, alternatively, that and are constant.
The calculation of the distribution of the mean for most circular distributions is not analytically possible, and in order to carry out an analysis of variance, numerical or mathematical approximations are needed.