A function with the form of the density function of the Cauchy distribution was studied geometrically by Fermat in 1659, and later became known as the witch of Agnesi, after Agnesi included it as an example in her 1748 calculus textbook. Despite its name, the first explicit analysis of the properties of the Cauchy distribution was published by the French mathematician Poisson in 1824, with Cauchy only becoming associated with it during an academic controversy in 1853. Poisson noted that if the mean of observations following such a distribution were taken, the mean error did not converge to any finite number. As such, Laplace's use of the central limit theorem with such a distribution was inappropriate, as it assumed a finite mean and variance. Despite this, Poisson did not regard the issue as important, in contrast to Bienaymé, who was to engage Cauchy in a long dispute over the matter.
Like any important probability distribution, or any important concept in mathematics, there are multiple ways to construct the Cauchy distribution family. Here are the most important constructions.
If you stand in front of a line and kick a ball with a direction uniformly at random towards the line, then the distribution of the point where the ball hits the line is a Cauchy distribution.
More formally, consider a point at $(x_0, \gamma)$ in the x-y plane, and select a line passing through the point, with its direction chosen uniformly at random. The intersection of the line with the x-axis follows the Cauchy distribution with location $x_0$ and scale $\gamma$.
This definition gives a simple way to sample from the standard Cauchy distribution. Let $u$ be a sample from a uniform distribution on $[0, 1]$; then we can generate a sample $x$ from the standard Cauchy distribution using
$$x = \tan\left(\pi\left(u - \tfrac{1}{2}\right)\right).$$
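A minimal Python sketch of this inverse-transform sampler follows. The function name `sample_cauchy` and its `seed` argument are illustrative choices, not part of any library, and the shift-and-scale step $x_0 + \gamma x$ is assumed from the location/scale construction described above.

```python
import math
import random
import statistics


def sample_cauchy(n, x0=0.0, gamma=1.0, seed=None):
    """Draw n Cauchy(x0, gamma) variates via x = x0 + gamma * tan(pi * (u - 1/2))."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        u = rng.random()  # uniform on [0, 1)
        samples.append(x0 + gamma * math.tan(math.pi * (u - 0.5)))
    return samples


xs = sample_cauchy(100_000, seed=1)
q1, med, q3 = statistics.quantiles(xs, n=4)
# For the standard Cauchy distribution the median is 0 and the quartiles are -1 and 1.
print(f"median ~ {med:.3f}, quartiles ~ ({q1:.3f}, {q3:.3f})")
```

Quartiles rather than the sample mean are used for the sanity check because, as discussed later, the sample mean of Cauchy draws does not settle down.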
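The Cauchy distribution with location $x_0$ and scale $\gamma > 0$ has the probability density function (PDF)
$$f(x; x_0, \gamma) = \frac{1}{\pi\gamma\left[1 + \left(\frac{x - x_0}{\gamma}\right)^2\right]} = \frac{1}{\pi}\left[\frac{\gamma}{(x - x_0)^2 + \gamma^2}\right].$$
The maximum value or amplitude of the Cauchy PDF is $\frac{1}{\pi\gamma}$, located at $x = x_0$.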
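It is sometimes convenient to express the PDF in terms of the complex parameter $\psi = x_0 + i\gamma$:
$$f(x; \psi) = \frac{1}{\pi}\operatorname{Im}\left(\frac{1}{x - \psi}\right) = \frac{1}{\pi}\operatorname{Re}\left(\frac{-i}{x - \psi}\right).$$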
The special case when $x_0 = 0$ and $\gamma = 1$ is called the standard Cauchy distribution, with the probability density function
$$f(x; 0, 1) = \frac{1}{\pi\left(1 + x^2\right)}.$$
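The real form $\frac{1}{\pi\gamma[1 + ((x - x_0)/\gamma)^2]}$ and the complex-parameter form $\frac{1}{\pi}\operatorname{Im}\frac{1}{x - \psi}$ with $\psi = x_0 + i\gamma$ can be cross-checked numerically. The sketch below is a minimal illustration; the helper names `cauchy_pdf` and `cauchy_pdf_complex` are hypothetical.

```python
import math


def cauchy_pdf(x, x0=0.0, gamma=1.0):
    """Real form: 1 / (pi * gamma * (1 + ((x - x0) / gamma) ** 2))."""
    z = (x - x0) / gamma
    return 1.0 / (math.pi * gamma * (1.0 + z * z))


def cauchy_pdf_complex(x, psi):
    """Complex-parameter form with psi = x0 + i*gamma: (1/pi) * Im(1 / (x - psi))."""
    return (1.0 / (x - psi)).imag / math.pi


x0, gamma = 2.0, 0.5
psi = complex(x0, gamma)
for x in (-3.0, 0.0, 1.7, 2.0, 10.0):
    # Both forms should agree up to floating-point rounding.
    assert math.isclose(cauchy_pdf(x, x0, gamma), cauchy_pdf_complex(x, psi), rel_tol=1e-12)

print(cauchy_pdf(x0, x0, gamma), 1 / (math.pi * gamma))  # peak value, attained at x = x0
print(cauchy_pdf(0.0), 1 / math.pi)                      # standard Cauchy density at 0
```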
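In physics, a three-parameter Lorentzian function is often used:
$$f(x; x_0, \gamma, I) = \frac{I}{1 + \left(\frac{x - x_0}{\gamma}\right)^2} = I\left[\frac{\gamma^2}{(x - x_0)^2 + \gamma^2}\right],$$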
where $I$ is the height of the peak. The three-parameter Lorentzian function indicated is not, in general, a probability density function, since it does not integrate to 1, except in the special case where $I = \frac{1}{\pi\gamma}$.
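The normalization condition follows from a one-line calculation with the substitution $u = (x - x_0)/\gamma$:

```latex
\int_{-\infty}^{\infty} \frac{I}{1 + \left(\frac{x - x_0}{\gamma}\right)^2}\,dx
  = I\gamma \int_{-\infty}^{\infty} \frac{du}{1 + u^2}
  = I\gamma \Big[\arctan u\Big]_{-\infty}^{\infty}
  = I\pi\gamma ,
```

so the Lorentzian has unit area exactly when $I = 1/(\pi\gamma)$.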
If $X_1, X_2, \ldots, X_n$ are IID samples from the standard Cauchy distribution, then their sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is also standard Cauchy distributed. In particular, the average does not converge to any constant (the distribution has no mean), and so the standard Cauchy distribution does not follow the law of large numbers.
This can be proved by repeated integration with the PDF, or more conveniently, by using the characteristic function of the standard Cauchy distribution (see below):
$$\varphi_X(t) = \operatorname{E}\left[e^{iXt}\right] = e^{-|t|}.$$
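With this, and by independence, we have $\varphi_{\sum_i X_i}(t) = \left(e^{-|t|}\right)^n = e^{-n|t|}$, so $\varphi_{\bar{X}}(t) = \varphi_{\sum_i X_i}(t/n) = e^{-|t|}$, and $\bar{X}$ has a standard Cauchy distribution.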
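A small Monte Carlo experiment makes the result concrete. Assuming standard Cauchy draws generated by inverse transform, we expect half the interquartile range of the sample mean (which equals the scale for a Cauchy distribution) to stay near 1 for every $n$, instead of shrinking like $1/\sqrt{n}$. The helper names and the seed are arbitrary illustrative choices.

```python
import math
import random
import statistics

rng = random.Random(7)


def standard_cauchy():
    """One standard Cauchy draw by inverse transform."""
    return math.tan(math.pi * (rng.random() - 0.5))


def half_iqr(values):
    """Half the interquartile range; equals the scale for a Cauchy distribution."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / 2


reps = 2_000
spread = {}
for n in (1, 10, 500):
    # reps independent sample means, each over n draws
    means = [statistics.fmean(standard_cauchy() for _ in range(n)) for _ in range(reps)]
    spread[n] = half_iqr(means)
    print(f"n={n:>4}: half-IQR of the sample mean ~ {spread[n]:.3f}")
```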
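More generally, if $X_1, X_2, \ldots, X_n$ are independent and Cauchy distributed with location parameters $a_1, \ldots, a_n$ and scales $\gamma_1, \ldots, \gamma_n$, and $c_1, \ldots, c_n$ are real numbers, then $\sum_i c_i X_i$ is Cauchy distributed with location $\sum_i c_i a_i$ and scale $\sum_i |c_i| \gamma_i$. We see that there is no law of large numbers for any weighted sum of independent Cauchy distributions.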
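The location and scale of a weighted sum can be computed directly and then compared with a simulation, using the sample median as a location estimate and half the interquartile range as a scale estimate. The function `weighted_sum_params` and the particular coefficients are hypothetical examples chosen for illustration.

```python
import math
import random
import statistics


def weighted_sum_params(coeffs, locations, scales):
    """Location sum(c_i * a_i) and scale sum(|c_i| * gamma_i) of sum(c_i * X_i)."""
    loc = sum(c * a for c, a in zip(coeffs, locations))
    scale = sum(abs(c) * g for c, g in zip(coeffs, scales))
    return loc, scale


coeffs, locations, scales = [2.0, -1.0, 0.5], [1.0, 3.0, -4.0], [0.5, 2.0, 1.0]
loc, scale = weighted_sum_params(coeffs, locations, scales)  # loc = 2 - 3 - 2 = -3, scale = 1 + 2 + 0.5 = 3.5

rng = random.Random(3)
draws = [
    # each term is a Cauchy(a, g) draw by inverse transform, weighted by c
    sum(c * (a + g * math.tan(math.pi * (rng.random() - 0.5))) for c, a, g in zip(coeffs, locations, scales))
    for _ in range(50_000)
]
q1, med, q3 = statistics.quantiles(draws, n=4)
print(loc, scale, round(med, 2), round((q3 - q1) / 2, 2))
```

Note that the scale adds up with absolute values, so a negative coefficient still widens the distribution.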
This shows that the condition of finite variance in the central limit theorem cannot be dropped. It is also an example of a more generalized version of the central limit theorem that is characteristic of all stable distributions, of which the Cauchy distribution is a special case.
Central limit theorem
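If $X_1, X_2, \ldots$ are IID samples with PDF $\rho$ such that $\lim_{c \to \infty}\frac{1}{c}\int_{-c}^{c} x^2 \rho(x)\,dx = \frac{2\gamma}{\pi}$ is finite, but nonzero, then $\frac{1}{n}\sum_{i=1}^{n} X_i$ converges in distribution to a Cauchy distribution with scale $\gamma$.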
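As a hedged numerical illustration, take the symmetric density $\rho(x) = \frac{1}{2x^2}$ for $|x| > 1$ (zero otherwise), an example chosen here rather than taken from the text. Then $\frac{1}{c}\int_{-c}^{c} x^2\rho(x)\,dx = \frac{c-1}{c} \to 1$, so with the theorem's normalization $\lim = \frac{2\gamma}{\pi}$ we expect a limiting Cauchy distribution with scale $\gamma = \pi/2 \approx 1.571$. The sketch compares that prediction with the median and half-IQR of simulated sample means; `symmetric_pareto` is a hypothetical helper name.

```python
import math
import random
import statistics

rng = random.Random(11)


def symmetric_pareto():
    """Draw from rho(x) = 1 / (2 x^2) on |x| > 1: a random sign times 1 / U, U uniform on (0, 1]."""
    u = 1.0 - rng.random()  # in (0, 1], so 1 / u never divides by zero
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign / u


n, reps = 400, 3_000
means = [statistics.fmean(symmetric_pareto() for _ in range(n)) for _ in range(reps)]
q1, med, q3 = statistics.quantiles(means, n=4)
predicted_scale = math.pi / 2  # from (1/c) * integral -> 1 = 2 * gamma / pi
print(f"median ~ {med:.3f} (expect 0), half-IQR ~ {(q3 - q1) / 2:.3f} (expect {predicted_scale:.3f})")
```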
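Let $X$ denote a Cauchy distributed random variable. The characteristic function of the Cauchy distribution is given by
$$\varphi_X(t) = \operatorname{E}\left[e^{iXt}\right] = \int_{-\infty}^{\infty} f(x; x_0, \gamma)\,e^{ixt}\,dx = e^{i x_0 t - \gamma|t|},$$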
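which is just the Fourier transform of the probability density. The original probability density may be expressed in terms of the characteristic function, essentially by using the inverse Fourier transform:
$$f(x; x_0, \gamma) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \varphi_X(t; x_0, \gamma)\,e^{-ixt}\,dt.$$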
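Because $e^{itX}$ is bounded, its expectation can be estimated by simple Monte Carlo even though $X$ itself has no mean. The sketch below compares such estimates with $e^{i x_0 t - \gamma|t|}$ for a few values of $t$; `empirical_cf`, the parameter values, and the sample size are illustrative assumptions.

```python
import cmath
import math
import random


def empirical_cf(samples, t):
    """Monte Carlo estimate of E[exp(i t X)] from a list of draws."""
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)


x0, gamma = 1.5, 0.7
rng = random.Random(5)
samples = [x0 + gamma * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(200_000)]

max_error = 0.0
for t in (-2.0, -0.5, 0.0, 0.5, 2.0):
    exact = cmath.exp(1j * x0 * t - gamma * abs(t))  # e^{i x0 t - gamma |t|}
    estimate = empirical_cf(samples, t)
    max_error = max(max_error, abs(estimate - exact))
    print(f"t={t:+.1f}  estimate={estimate:.4f}  exact={exact:.4f}")
print("largest deviation:", max_error)
```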
The $n$th moment of a distribution, when it exists, equals $i^{-n}$ times the $n$th derivative of the characteristic function evaluated at $t = 0$. Observe that the characteristic function is not differentiable at the origin: this corresponds to the fact that the Cauchy distribution does not have well-defined moments higher than the zeroth moment.
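For the standard case $x_0 = 0$, $\gamma = 1$, the failure of differentiability can be checked from the one-sided difference quotients of $\varphi(t) = e^{-|t|}$ at the origin:

```latex
\lim_{h \to 0^{+}} \frac{\varphi(h) - \varphi(0)}{h} = \lim_{h \to 0^{+}} \frac{e^{-h} - 1}{h} = -1,
\qquad
\lim_{h \to 0^{-}} \frac{\varphi(h) - \varphi(0)}{h} = \lim_{h \to 0^{-}} \frac{e^{h} - 1}{h} = 1 .
```

The one-sided derivatives differ, so $\varphi'(0)$ does not exist. For general parameters, $e^{i x_0 t - \gamma|t|}$ has right derivative $i x_0 - \gamma$ and left derivative $i x_0 + \gamma$ at $t = 0$, which differ whenever $\gamma > 0$.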
Fitted to the same data, the Cauchy density function can have a higher peak than the normal density, with tails that lie below the observed frequencies.
An example is shown in two figures.
The figure to the left shows the Cauchy probability density function fitted to an observed histogram. The peak of the function is higher than the peak of the histogram while the tails are lower than those of the histogram.
The figure to the right shows the normal probability density function fitted to the same observed histogram. The peak of the function is lower than the peak of the histogram.
This illustrates the above statement.
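The entropy of the Cauchy distribution is given by:
$$H(\gamma) = -\int_{-\infty}^{\infty} f(x; x_0, \gamma)\log f(x; x_0, \gamma)\,dx = \log(4\pi\gamma).$$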
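The closed form can be checked numerically. Substituting $x = x_0 + \gamma\tan\theta$ gives $f(x)\,dx = d\theta/\pi$ and $f(x) = \cos^2\theta/(\pi\gamma)$, so the entropy becomes an integral over the bounded interval $(-\pi/2, \pi/2)$ that a midpoint rule handles well. The function names and the number of grid points are illustrative assumptions; entropy is measured in nats.

```python
import math


def cauchy_entropy_closed_form(gamma):
    """H = log(4 * pi * gamma), in nats."""
    return math.log(4 * math.pi * gamma)


def cauchy_entropy_numeric(gamma, m=200_000):
    """Midpoint-rule value of -integral f log f after substituting x = x0 + gamma * tan(theta).

    With this substitution f(x) dx = dtheta / pi and f(x) = cos(theta)^2 / (pi * gamma),
    so H = -(1/pi) * integral over (-pi/2, pi/2) of log(cos(theta)^2 / (pi * gamma)) dtheta.
    The location x0 drops out because entropy is shift-invariant.
    """
    h = math.pi / m
    total = 0.0
    for k in range(m):
        theta = -math.pi / 2 + (k + 0.5) * h  # midpoints never hit cos(theta) = 0
        total += math.log(math.cos(theta) ** 2 / (math.pi * gamma))
    return -total * h / math.pi


for gamma in (0.5, 1.0, 3.0):
    print(gamma, cauchy_entropy_numeric(gamma), cauchy_entropy_closed_form(gamma))
```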
The derivative of the quantile function, the quantile density function, for the Cauchy distribution is:
$$Q'(p; \gamma) = \gamma\,\pi\,\sec^2\left[\pi\left(p - \tfrac{1}{2}\right)\right].$$
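The quantile function itself is $Q(p; x_0, \gamma) = x_0 + \gamma\tan\left[\pi\left(p - \tfrac{1}{2}\right)\right]$, the same map used for inverse-transform sampling. A central finite difference of $Q$ should match the closed-form quantile density; the helper names and step size below are illustrative.

```python
import math


def cauchy_quantile(p, x0=0.0, gamma=1.0):
    """Inverse CDF Q(p) = x0 + gamma * tan(pi * (p - 1/2)) for 0 < p < 1."""
    return x0 + gamma * math.tan(math.pi * (p - 0.5))


def cauchy_quantile_density(p, gamma=1.0):
    """Quantile density Q'(p) = gamma * pi * sec^2(pi * (p - 1/2)); it does not depend on x0."""
    return gamma * math.pi / math.cos(math.pi * (p - 0.5)) ** 2


x0, gamma, h = 3.0, 2.0, 1e-6
checks = []
for p in (0.1, 0.25, 0.5, 0.8, 0.95):
    # central finite difference of Q versus the closed-form derivative
    numeric = (cauchy_quantile(p + h, x0, gamma) - cauchy_quantile(p - h, x0, gamma)) / (2 * h)
    exact = cauchy_quantile_density(p, gamma)
    checks.append(math.isclose(numeric, exact, rel_tol=1e-5))
    print(f"p={p:.2f}  finite difference={numeric:.6f}  Q'(p)={exact:.6f}")
```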
The Cauchy distribution is usually used as an illustrative counterexample in elementary probability courses, as a distribution with no well-defined (or "indefinite") moments.
If we take IID samples $X_1, X_2, \ldots$ from the standard Cauchy distribution, then the sequence of their sample means is $S_n = \frac{1}{n}\sum_{i=1}^{n} X_i$, which also has the standard Cauchy distribution. Consequently, no matter how many terms we take, the sample average does not converge.
Similarly, the sample variance also does not converge.
A typical trajectory of $S_1, S_2, \ldots$ looks like long periods of slow convergence to zero, punctuated by large jumps away from zero, but never getting too far away. A typical trajectory of $\frac{1}{n}\sum_{i=1}^{n} X_i^2$ looks similar, but the jumps accumulate faster than the decay, diverging to infinity. These two kinds of trajectories are plotted in the figure.
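Sample moments of order lower than 1, taken with their sign (for example $\frac{1}{n}\sum_{i=1}^{n}\operatorname{sgn}(X_i)\,|X_i|^{p}$ with $0 < p < 1$), converge to zero, since these moments are finite and vanish by symmetry. Sample moments of order higher than 2 diverge to infinity even faster than the sample variance.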
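The three behaviours can be tracked along a single simulated path: the running mean $S_n$, the running mean of squares, and a signed running moment of order $1/4$, i.e. $\frac{1}{n}\sum_i \operatorname{sgn}(X_i)|X_i|^{1/4}$ (the order $1/4$ is an arbitrary choice below 1). This is a sketch with an arbitrary seed; exact printed values depend on it.

```python
import math
import random

rng = random.Random(2024)
running_sum = running_sq = running_signed = 0.0
checkpoints = {}
for n in range(1, 100_001):
    x = math.tan(math.pi * (rng.random() - 0.5))  # standard Cauchy draw
    running_sum += x
    running_sq += x * x
    running_signed += math.copysign(abs(x) ** 0.25, x)  # signed moment of order 1/4
    if n in (10, 100, 1_000, 10_000, 100_000):
        checkpoints[n] = (running_sum / n, running_sq / n, running_signed / n)

for n, (mean, mean_sq, signed_quarter) in checkpoints.items():
    print(f"n={n:>6}  mean={mean:10.3f}  mean of squares={mean_sq:12.3e}  signed 1/4-moment={signed_quarter:7.4f}")
```

Typically the first column keeps jumping, the second grows roughly in proportion to $n$, and the third shrinks toward zero.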
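The mean of a distribution with density $f$ would be the two-sided improper integral
$$\int_{-\infty}^{\infty} x f(x)\,dx. \tag{1}$$
We may evaluate this two-sided improper integral by computing the sum of two one-sided improper integrals. That is,
$$\int_{-\infty}^{a} x f(x)\,dx + \int_{a}^{\infty} x f(x)\,dx \tag{2}$$
for an arbitrary real number $a$.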
For the integral to exist (even as an infinite value), at least one of the terms in this sum should be finite, or both should be infinite and have the same sign. But in the case of the Cauchy distribution, both terms in sum (2) are infinite and have opposite signs: since $x f(x)$ behaves like $\frac{\gamma}{\pi x}$ for large $|x|$, the right-hand integral is $+\infty$ and the left-hand integral is $-\infty$. Hence (1) is undefined, and thus so is the mean.
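For the standard Cauchy density the antiderivative of $x f(x) = \frac{x}{\pi(1 + x^2)}$ is $\frac{\log(1 + x^2)}{2\pi}$, so truncated versions of the integral can be evaluated exactly. The sketch shows that the one-sided part grows without bound, and that letting the limits go to infinity symmetrically gives 0 while a lopsided choice $[-c, 2c]$ gives $\log 2/\pi$: the "answer" depends on how the limits are taken, which is why the mean is undefined. The helper name is hypothetical.

```python
import math


def partial_mean_integral(lower, upper):
    """Integral of x * f(x) over [lower, upper] for the standard Cauchy density f.

    Uses the antiderivative of x / (pi * (1 + x^2)), which is log(1 + x^2) / (2 * pi).
    """
    def antiderivative(x):
        return math.log1p(x * x) / (2 * math.pi)

    return antiderivative(upper) - antiderivative(lower)


for c in (1e2, 1e4, 1e6):
    one_sided = partial_mean_integral(0.0, c)    # grows like log(c) / pi without bound
    symmetric = partial_mean_integral(-c, c)     # exactly 0 by symmetry
    lopsided = partial_mean_integral(-c, 2 * c)  # tends to log(2) / pi instead
    print(f"c={c:.0e}  [0, c]: {one_sided:.4f}  [-c, c]: {symmetric:.4f}  [-c, 2c]: {lopsided:.4f}")
```

The symmetric value 0 is the Cauchy principal value; it is not the mean, because the integral in (1) requires the limits to go to infinity independently.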
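The Cauchy distribution does not have finite moments of any order greater than or equal to one. Some of the higher raw moments do exist and have a value of infinity, for example, the raw second moment:
$$\operatorname{E}[X^2] \propto \int_{-\infty}^{\infty}\frac{x^2}{1 + x^2}\,dx = \int_{-\infty}^{\infty}\left(1 - \frac{1}{1 + x^2}\right)dx = \int_{-\infty}^{\infty}dx - \int_{-\infty}^{\infty}\frac{dx}{1 + x^2} = \int_{-\infty}^{\infty}dx - \pi = \infty.$$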
By re-arranging the formula, one can see that the second moment is essentially the infinite integral of a constant (here 1). Higher even-powered raw moments will also evaluate to infinity. Odd-powered raw moments, however, are undefined, which is distinctly different from existing with the value of infinity. The odd-powered raw moments are undefined because their values are essentially equivalent to $\infty - \infty$, since the two halves of the integral both diverge and have opposite signs. The first raw moment is the mean, which, being odd, does not exist. (See also the discussion above about this.) This in turn means that all of the central moments and standardized moments are undefined, since they are all based on the mean. The variance, which is the second central moment, is likewise non-existent (despite the fact that the raw second moment exists with the value infinity).
The results for higher moments follow from Hölder's inequality, which implies that higher moments (or halves of moments) diverge if lower ones do.
Moments of truncated distributions
Consider the truncated distribution defined by restricting the standard Cauchy distribution to the interval $[-10^{100}, 10^{100}]$. Such a truncated distribution has all moments (and the central limit theorem applies for i.i.d. observations from it); yet for almost all practical purposes it behaves like a Cauchy distribution.
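Two numbers quantify this. The probability removed by truncating at $\pm c$ is $P(|X| > c) = \frac{2}{\pi}\arctan(1/c)$, and the variance of the truncated distribution is $\frac{(2c - 2\arctan c)/\pi}{(2/\pi)\arctan c}$, since the mean is 0 by symmetry. The sketch evaluates both for $c = 10^{100}$; the function names are illustrative.

```python
import math


def truncated_cauchy_variance(c):
    """Variance of a standard Cauchy variable conditioned on |X| <= c.

    The mean is 0 by symmetry, and the integral of x^2 / (pi * (1 + x^2)) over [-c, c]
    is (2c - 2 arctan c) / pi; dividing by the retained mass (2 / pi) arctan c normalizes it.
    """
    retained_mass = (2 / math.pi) * math.atan(c)
    second_moment = (2 * c - 2 * math.atan(c)) / math.pi
    return second_moment / retained_mass


def discarded_mass(c):
    """P(|X| > c) for a standard Cauchy X, written via atan(1 / c) to avoid 1 - (almost 1)."""
    return (2 / math.pi) * math.atan(1 / c)


c = 1e100
print(f"discarded probability ~ {discarded_mass(c):.3e}")
print(f"variance after truncation ~ {truncated_cauchy_variance(c):.3e}")
```

The variance is finite but of order $10^{99}$, so the standard deviation of a mean of $n$ such draws is roughly $\sqrt{6.4 \times 10^{99}/n}$, and the central limit theorem only becomes visible for astronomically large $n$.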
Estimation of parameters
Because the parameters of the Cauchy distribution do not correspond to a mean and variance, attempting to estimate the parameters of the Cauchy distribution by using a sample mean and a sample variance will not succeed. For example, if an i.i.d. sample of size $n$ is taken from a Cauchy distribution, one may calculate the sample mean as:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$
Although the sample values $x_i$ will be concentrated about the central value $x_0$, the sample mean will become increasingly variable as more observations are taken, because of the increased probability of encountering sample points with a large absolute value. In fact, the distribution of the sample mean will be equal to the distribution of the observations themselves; i.e., the sample mean of a large sample is no better (or worse) an estimator of $x_0$ than any single observation from the sample. Similarly, calculating the sample variance will result in values that grow larger as more observations are taken.
Therefore, more robust means of estimating the central value $x_0$ and the scaling parameter $\gamma$ are needed. One simple method is to take the median value of the sample as an estimator of $x_0$ and half the sample interquartile range as an estimator of $\gamma$. Other, more precise and robust methods have been developed. For example, the truncated mean of the middle 24% of the sample order statistics produces an estimate for $x_0$ that is more efficient than using either the sample median or the full sample mean. However, because of the fat tails of the Cauchy distribution, the efficiency of the estimator decreases if more than 24% of the sample is used.
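A minimal sketch of the three simple estimators: the sample median for $x_0$, half the interquartile range for $\gamma$, and the mean of the middle 24% of the order statistics for $x_0$ (dropping the lowest and highest 38%). The rounding rule for how many order statistics to drop and the name `robust_cauchy_estimates` are our own assumptions.

```python
import math
import random
import statistics


def robust_cauchy_estimates(xs):
    """Median and half-IQR estimates of (x0, gamma), plus the middle-24% truncated mean of x0."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three observations")
    s = sorted(xs)
    q1, med, q3 = statistics.quantiles(s, n=4)
    k = round(0.38 * n)  # drop the lowest and highest 38% of order statistics
    trimmed = statistics.fmean(s[k:n - k])
    return med, (q3 - q1) / 2, trimmed


rng = random.Random(42)
x0_true, gamma_true = 3.0, 2.0
xs = [x0_true + gamma_true * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(20_001)]
med, half_iqr, trimmed = robust_cauchy_estimates(xs)
print("full sample mean:", statistics.fmean(xs))  # unreliable, see above
print("median:", med, "half-IQR:", half_iqr, "middle-24% mean:", trimmed)
```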
Maximum likelihood can also be used to estimate the parameters $x_0$ and $\gamma$. However, this is complicated by the fact that it requires finding the roots of a high-degree polynomial, and there can be multiple roots that represent local maxima. Also, while the maximum likelihood estimator is asymptotically efficient, it is relatively inefficient for small samples. The log-likelihood function for the Cauchy distribution for sample size $n$ is:
$$\hat{\ell}(x_1, \dots, x_n \mid x_0, \gamma) = -n\log(\gamma\pi) - \sum_{i=1}^{n}\log\left(1 + \left(\frac{x_i - x_0}{\gamma}\right)^2\right).$$
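Maximizing the log-likelihood function with respect to $x_0$ and $\gamma$ by taking the first derivative produces the following system of equations:
$$\frac{\partial \ell}{\partial x_0} = \sum_{i=1}^{n}\frac{2(x_i - x_0)}{\gamma^2 + (x_i - x_0)^2} = 0,$$
$$\frac{\partial \ell}{\partial \gamma} = \sum_{i=1}^{n}\frac{2(x_i - x_0)^2}{\gamma\left(\gamma^2 + (x_i - x_0)^2\right)} - \frac{n}{\gamma} = 0.$$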
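Note that $\sum_{i=1}^{n}\frac{(x_i - x_0)^2}{\gamma^2 + (x_i - x_0)^2}$ is a monotone (decreasing) function of $\gamma$, and that the second equation is equivalent to setting this sum equal to $n/2$. Since each term exceeds $1/2$ when $\gamma < |x_i - x_0|$ and falls below $1/2$ when $\gamma > |x_i - x_0|$, the solution $\gamma$ must satisfy
$$\min_i |x_i - x_0| \le \gamma \le \max_i |x_i - x_0|.$$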
Solving just for $x_0$ requires solving a polynomial of degree $2n - 1$, and solving just for $\gamma$ requires solving a polynomial of degree $2n$. Therefore, whether solving for one parameter or for both parameters simultaneously, a numerical solution on a computer is typically required. The benefit of maximum likelihood estimation is asymptotic efficiency; estimating $x_0$ using the sample median is only about 81% as asymptotically efficient as estimating $x_0$ by maximum likelihood. The truncated sample mean using the middle 24% order statistics is about 88% as asymptotically efficient an estimator of $x_0$ as the maximum likelihood estimate. When Newton's method is used to find the solution for the maximum likelihood estimate, the mean of the middle 24% order statistics can be used as an initial value for $x_0$.
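The text describes Newton's method; the sketch below instead uses a different, swapped-in technique, the fixed-point (EM) iteration that treats the Cauchy distribution as a Student t-distribution with one degree of freedom. Each step reweights observations by $u_i = \frac{2\gamma^2}{\gamma^2 + (x_i - x_0)^2}$, and any fixed point satisfies both likelihood equations above. Following the text, it starts from the middle-24% truncated mean and half the interquartile range. All function names, tolerances, and the simulated data are illustrative assumptions.

```python
import math
import random
import statistics


def cauchy_log_likelihood(xs, x0, gamma):
    """-n log(gamma pi) - sum log(1 + ((x_i - x0) / gamma)^2)."""
    return -len(xs) * math.log(gamma * math.pi) - sum(math.log1p(((x - x0) / gamma) ** 2) for x in xs)


def cauchy_scores(xs, x0, gamma):
    """Partial derivatives of the log-likelihood with respect to x0 and gamma."""
    d_x0 = sum(2 * (x - x0) / (gamma ** 2 + (x - x0) ** 2) for x in xs)
    d_gamma = sum(2 * (x - x0) ** 2 / (gamma * (gamma ** 2 + (x - x0) ** 2)) for x in xs) - len(xs) / gamma
    return d_x0, d_gamma


def cauchy_mle(xs, tol=1e-10, max_iter=1_000):
    """Fixed-point (EM) iteration for the Cauchy MLE, started from robust estimates."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three observations")
    s = sorted(xs)
    k = round(0.38 * n)
    x0 = statistics.fmean(s[k:n - k])  # middle-24% truncated mean
    q1, _, q3 = statistics.quantiles(s, n=4)
    gamma = (q3 - q1) / 2              # half the interquartile range
    for _ in range(max_iter):
        # E-step: weights u_i shrink for observations far from the current x0.
        u = [2 * gamma ** 2 / (gamma ** 2 + (x - x0) ** 2) for x in xs]
        # M-step: weighted mean for x0, weighted mean square for gamma^2.
        new_x0 = sum(ui * x for ui, x in zip(u, xs)) / sum(u)
        new_gamma = math.sqrt(sum(ui * (x - new_x0) ** 2 for ui, x in zip(u, xs)) / n)
        converged = abs(new_x0 - x0) < tol and abs(new_gamma - gamma) < tol
        x0, gamma = new_x0, new_gamma
        if converged:
            break
    return x0, gamma


rng = random.Random(123)
x0_true, gamma_true = 1.0, 0.5
xs = [x0_true + gamma_true * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(5_000)]
x0_hat, gamma_hat = cauchy_mle(xs)
print(x0_hat, gamma_hat, cauchy_scores(xs, x0_hat, gamma_hat))
```

The EM iteration never decreases the likelihood, which makes it a forgiving choice; Newton's method usually converges in fewer steps but can overshoot when started far from the solution.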
The scale parameter can also be estimated using the median of absolute values, since for a location-0 Cauchy variable $X \sim \mathrm{Cauchy}(0, \gamma)$, $\operatorname{median}(|X|) = \gamma$.
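The identity holds because $P(|X| \le \gamma) = \frac{2}{\pi}\arctan(1) = \frac{1}{2}$. A quick simulated check, assuming the location is known to be 0 (the scale value and seed are arbitrary):

```python
import math
import random
import statistics

rng = random.Random(9)
gamma_true = 0.8
# Location-0 Cauchy sample with scale gamma_true
xs = [gamma_true * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(20_001)]
gamma_hat = statistics.median(abs(x) for x in xs)  # median(|X|) = gamma when x0 = 0
print(f"estimated scale ~ {gamma_hat:.3f} (true {gamma_true})")
```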
Multivariate Cauchy distribution
A random vector $X = (X_1, \ldots, X_k)^T$ is said to have the multivariate Cauchy distribution if every linear combination of its components $Y = a_1 X_1 + \cdots + a_k X_k$ has a Cauchy distribution. That is, for any constant vector $a \in \mathbb{R}^k$, the random variable $Y = a^T X$ should have a univariate Cauchy distribution. The characteristic function of a multivariate Cauchy distribution is given by:
$$\varphi_X(t) = e^{i x_0(t) - \gamma(t)},$$
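where $x_0(t)$ and $\gamma(t)$ are real functions with $x_0(t)$ a homogeneous function of degree one and $\gamma(t)$ a positive homogeneous function of degree one. More formally:
$$x_0(at) = a\,x_0(t), \qquad \gamma(at) = |a|\,\gamma(t)$$
for all $t \in \mathbb{R}^k$ and $a \in \mathbb{R}$.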
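An example of a bivariate Cauchy distribution can be given by:
$$f(x, y; x_0, y_0, \gamma) = \frac{1}{2\pi}\left[\frac{\gamma}{\left((x - x_0)^2 + (y - y_0)^2 + \gamma^2\right)^{3/2}}\right].$$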
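This formula can also be written for a complex variable. The probability density function of the complex Cauchy distribution is then:
$$f(z; z_0, \gamma) = \frac{1}{2\pi}\left[\frac{\gamma}{\left(|z - z_0|^2 + \gamma^2\right)^{3/2}}\right].$$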
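Just as the standard Cauchy distribution is the Student t-distribution with one degree of freedom, the multidimensional Cauchy density is the multivariate Student distribution with one degree of freedom. The density of a $k$-dimensional Student distribution with one degree of freedom is:
$$f(x; \mu, \Sigma, k) = \frac{\Gamma\left(\frac{1 + k}{2}\right)}{\Gamma\left(\frac{1}{2}\right)\pi^{k/2}\left|\Sigma\right|^{1/2}\left[1 + (x - \mu)^T\Sigma^{-1}(x - \mu)\right]^{\frac{1 + k}{2}}}.$$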
The properties of multidimensional Cauchy distribution are then special cases of the multivariate Student distribution.
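The sketch below ties these formulas together under the assumption of an isotropic scale matrix $\Sigma = \gamma^2 I$. It checks numerically that the bivariate density equals the $k = 2$ Student density with one degree of freedom, samples the distribution through the standard Student representation $\mu + Z/|W|$ (with $Z \sim N(0, \gamma^2 I)$ and $W \sim N(0, 1)$ independent), and confirms that a projection $a^T X$ behaves like a univariate Cauchy variable with location $a^T\mu$ and scale $\gamma\lVert a\rVert$. All function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def bivariate_cauchy_pdf(x, y, x0, y0, gamma):
    """gamma / (2 pi ((x - x0)^2 + (y - y0)^2 + gamma^2)^(3/2))."""
    r2 = (x - x0) ** 2 + (y - y0) ** 2
    return gamma / (2 * math.pi * (r2 + gamma ** 2) ** 1.5)


def student1_pdf_isotropic(point, mu, gamma):
    """k-dimensional Student density with one degree of freedom and Sigma = gamma^2 * I."""
    k = len(point)
    quad = sum((p - m) ** 2 for p, m in zip(point, mu)) / gamma ** 2  # (x - mu)^T Sigma^{-1} (x - mu)
    sqrt_det = gamma ** k                                             # |Sigma|^{1/2}
    numerator = math.gamma((1 + k) / 2)
    denominator = math.gamma(0.5) * math.pi ** (k / 2) * sqrt_det * (1 + quad) ** ((1 + k) / 2)
    return numerator / denominator


def sample_bivariate_cauchy(x0, y0, gamma, rng):
    """mu + Z / |W| with Z ~ N(0, gamma^2 I) and W ~ N(0, 1), independent."""
    w = 0.0
    while w == 0.0:  # guard against the (practically impossible) draw W = 0
        w = abs(rng.gauss(0.0, 1.0))
    return x0 + gamma * rng.gauss(0.0, 1.0) / w, y0 + gamma * rng.gauss(0.0, 1.0) / w


x0, y0, gamma = 1.0, -2.0, 0.5
print(bivariate_cauchy_pdf(0.3, -1.1, x0, y0, gamma), student1_pdf_isotropic((0.3, -1.1), (x0, y0), gamma))

# Project onto a = (3, 4): expect location 3*1 + 4*(-2) = -5 and scale |a| * gamma = 5 * 0.5 = 2.5.
rng = random.Random(17)
a = (3.0, 4.0)
projections = []
for _ in range(20_000):
    x, y = sample_bivariate_cauchy(x0, y0, gamma, rng)
    projections.append(a[0] * x + a[1] * y)
q1, med, q3 = statistics.quantiles(projections, n=4)
print(f"median ~ {med:.3f}, half-IQR ~ {(q3 - q1) / 2:.3f}")
```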
Applications of the Cauchy distribution or its transformation can be found in fields working with exponential growth. A 1958 paper by White derived the test statistic for estimators $\hat{\beta}$ of $\beta$ in the equation $x_{t+1} = \beta x_t + \varepsilon_{t+1}$ with $\beta > 1$, where the maximum likelihood estimator is found using ordinary least squares, and showed that the sampling distribution of the statistic is the Cauchy distribution.
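A rough Monte Carlo sketch of this setting is given below. It assumes a starting value $x_0 = 0$, standard normal errors, and the normalized statistic $\beta^n(\hat{\beta} - \beta)/(\beta^2 - 1)$, whose limit is the standard Cauchy distribution under these assumptions; this particular normalization is our assumption about the form of the statistic, and $\beta = 1.5$, $n = 40$ are arbitrary illustrative values. The check compares the median and half-IQR of the simulated statistics with the standard Cauchy values 0 and 1.

```python
import math
import random
import statistics


def white_statistic(beta, n, rng):
    """beta^n * (beta_hat - beta) / (beta^2 - 1) for one simulated explosive AR(1) path.

    The path is x_{t+1} = beta * x_t + eps_{t+1} with x_0 = 0 and N(0, 1) errors; beta_hat is the
    OLS slope of x_{t+1} on x_t. We accumulate sum(x_t * eps_{t+1}) directly because
    beta_hat - beta = sum(x_t * eps_{t+1}) / sum(x_t^2), which avoids a cancellation-prone subtraction.
    """
    x, s_xe, s_xx = 0.0, 0.0, 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        s_xe += x * eps
        s_xx += x * x
        x = beta * x + eps
    return beta ** n * (s_xe / s_xx) / (beta ** 2 - 1)


rng = random.Random(1958)
stats = [white_statistic(1.5, 40, rng) for _ in range(4_000)]
q1, med, q3 = statistics.quantiles(stats, n=4)
print(f"median ~ {med:.3f} (standard Cauchy: 0), half-IQR ~ {(q3 - q1) / 2:.3f} (standard Cauchy: 1)")
```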
The Cauchy distribution is often the distribution of observations for objects that are spinning. The classic reference for this is Gull's lighthouse problem; as noted in an earlier section, the distribution also appears as the Breit–Wigner distribution in particle physics.
Cane, Gwenda J. (1974). "Linear Estimation of Parameters of the Cauchy Distribution Based on Sample Quantiles". Journal of the American Statistical Association. 69 (345): 243–245. doi:10.1080/01621459.1974.10480163. JSTOR 2285535.
Rothenberg, Thomas J.; Fisher, Franklin M.; Tilanus, C. B. (1964). "A note on estimation from a Cauchy sample". Journal of the American Statistical Association. 59 (306): 460–463. doi:10.1080/01621459.1964.10482170.