The horizontal axis is the index k, the number of occurrences. λ is the expected rate of occurrences. The vertical axis is the probability of k occurrences given λ. The function is defined only at integer values of k; the connecting lines are only guides for the eye.
Cumulative distribution function
The horizontal axis is the index k, the number of occurrences. The CDF is discontinuous at the integers of k and flat everywhere else because a variable that is Poisson distributed takes on only integer values.
In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event.^{[1]} It is named after French mathematician Siméon Denis Poisson (/ˈpwɑːsɒn/; French pronunciation: [pwasɔ̃]). The Poisson distribution can also be used for the number of events in other specified interval types such as distance, area, or volume.
It plays an important role for discrete-stable distributions.
For instance, a call center receives an average of 180 calls per hour, 24 hours a day. The calls are independent; receiving one does not change the probability of when the next one will arrive. The number of calls received during any minute has a Poisson probability distribution with mean 3: the most likely numbers are 2 and 3 but 1 and 4 are also likely and there is a small probability of it being as low as zero and a very small probability it could be 10.
Another example is the number of decay events that occur from a radioactive source during a defined observation period.
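The call-center figures above can be checked numerically. The following is a minimal sketch, assuming the standard Poisson probability mass function λ^k e^{−λ}/k!; the helper name `poisson_pmf` is illustrative and not part of the source.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean count is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

calls_per_hour = 180
lam_per_minute = calls_per_hour / 60  # mean of 3 calls per minute

# Probabilities of 0 to 10 calls in one minute.
for k in range(11):
    print(f"P({k} calls in a minute) = {poisson_pmf(k, lam_per_minute):.4f}")
```

With a mean of 3, the values for k = 2 and k = 3 coincide and are the largest, which matches the description of the most likely counts.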
History
The distribution was first introduced by Siméon Denis Poisson (1781–1840) and published together with his probability theory in his work Recherches sur la probabilité des jugements en matière criminelle et en matière civile (1837).^{[2]}^{: 205-207 } The work theorized about the number of wrongful convictions in a given country by focusing on certain random variables N that count, among other things, the number of discrete occurrences (sometimes called "events" or "arrivals") that take place during a time-interval of given length. The result had already been given in 1711 by Abraham de Moivre in De Mensura Sortis seu; de Probabilitate Eventuum in Ludis a Casu Fortuito Pendentibus.^{[3]}^{: 219 }^{[4]}^{: 14-15 }^{[5]}^{: 193 }^{[6]}^{: 157 } This makes it an example of Stigler's law and it has prompted some authors to argue that the Poisson distribution should bear the name of de Moivre.^{[7]}^{[8]}
In 1860, Simon Newcomb fitted the Poisson distribution to the number of stars found in a unit of space.^{[9]}
A further practical application of this distribution was made by Ladislaus Bortkiewicz in 1898 when he was given the task of investigating the number of soldiers in the Prussian army killed accidentally by horse kicks;^{[10]}^{: 23-25 } this experiment introduced the Poisson distribution to the field of reliability engineering.
The Poisson distribution can be applied to systems with a large number of possible events, each of which is rare. The number of such events that occur during a fixed time interval is, under the right circumstances, a random number with a Poisson distribution.
The equation can be adapted if, instead of the average number of events $\lambda ,$ we are given the average rate $r$ at which events occur. Then $\lambda =rt,$ and:^{[13]}
$P(k{\text{ events in interval }}t)={\frac {(rt)^{k}e^{-rt}}{k!}}.$
Example
The Poisson distribution may be useful to model events such as:
the number of meteorites greater than 1-meter diameter that strike Earth in a year;
the number of laser photons hitting a detector in a particular time interval; and
the number of students achieving a low and high mark in an exam.
Assumptions and validity
The Poisson distribution is an appropriate model if the following assumptions are true:^{[14]}
k is the number of times an event occurs in an interval and k can take values 0, 1, 2, ... .
The occurrence of one event does not affect the probability that a second event will occur. That is, events occur independently.
The average rate at which events occur is independent of any occurrences. For simplicity, this is usually assumed to be constant, but may in practice vary with time.
Two events cannot occur at exactly the same instant; instead, at each very small sub-interval, either exactly one event occurs, or no event occurs.
If these conditions are true, then k is a Poisson random variable, and the distribution of k is a Poisson distribution.
The Poisson distribution is also the limit of a binomial distribution, for which the probability of success for each trial equals λ divided by the number of trials, as the number of trials approaches infinity (see Related distributions).
Examples of probability for Poisson distributions
On a particular river, overflow floods occur once every 100 years on average. Calculate the probability of k = 0, 1, 2, 3, 4, 5, or 6 overflow floods in a 100 year interval, assuming the Poisson model is appropriate.
Because the average event rate is one overflow flood per 100 years, λ = 1
$P(k{\text{ overflow floods in 100 years}})={\frac {\lambda ^{k}e^{-\lambda }}{k!}}={\frac {1^{k}e^{-1}}{k!}}$
$P(k=0{\text{ overflow floods in 100 years}})={\frac {1^{0}e^{-1}}{0!}}={\frac {e^{-1}}{1}}\approx 0.368$
$P(k=1{\text{ overflow flood in 100 years}})={\frac {1^{1}e^{-1}}{1!}}={\frac {e^{-1}}{1}}\approx 0.368$
$P(k=2{\text{ overflow floods in 100 years}})={\frac {1^{2}e^{-1}}{2!}}={\frac {e^{-1}}{2}}\approx 0.184$
| k | P(k overflow floods in 100 years) |
|---|---|
| 0 | 0.368 |
| 1 | 0.368 |
| 2 | 0.184 |
| 3 | 0.061 |
| 4 | 0.015 |
| 5 | 0.003 |
| 6 | 0.0005 |

The probability for 0 to 6 overflow floods in a 100 year period.
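The flood probabilities can be reproduced with a few lines of code. This is a sketch under the same assumption as the example (λ = 1 per 100 years); the variable names are illustrative.

```python
import math

lam = 1.0  # one overflow flood per 100 years on average

# P(k floods in 100 years) for k = 0..6
flood_probs = [lam ** k * math.exp(-lam) / math.factorial(k) for k in range(7)]

for k, p in enumerate(flood_probs):
    print(k, round(p, 4))

# Probability of at least one flood in the interval
at_least_one = 1.0 - flood_probs[0]
print("P(at least one flood) =", round(at_least_one, 3))
```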
María Dolores Ugarte and colleagues report that the average number of goals in a World Cup soccer match is approximately 2.5 and the Poisson model is appropriate.^{[15]}
Because the average event rate is 2.5 goals per match, λ = 2.5 .
$P(k{\text{ goals in a match}})={\frac {2.5^{k}e^{-2.5}}{k!}}$
$P(k=0{\text{ goals in a match}})={\frac {2.5^{0}e^{-2.5}}{0!}}={\frac {e^{-2.5}}{1}}\approx 0.082$
$P(k=1{\text{ goal in a match}})={\frac {2.5^{1}e^{-2.5}}{1!}}={\frac {2.5e^{-2.5}}{1}}\approx 0.205$
$P(k=2{\text{ goals in a match}})={\frac {2.5^{2}e^{-2.5}}{2!}}={\frac {6.25e^{-2.5}}{2}}\approx 0.257$
| k | P(k goals in a World Cup soccer match) |
|---|---|
| 0 | 0.082 |
| 1 | 0.205 |
| 2 | 0.257 |
| 3 | 0.214 |
| 4 | 0.134 |
| 5 | 0.067 |
| 6 | 0.028 |
| 7 | 0.010 |

The probability for 0 to 7 goals in a match.
Once in an interval events: The special case of λ = 1 and k = 0
Suppose that astronomers estimate that large meteorites (above a certain size) hit the earth on average once every 100 years (λ = 1 event per 100 years), and that the number of meteorite hits follows a Poisson distribution. What is the probability of k = 0 meteorite hits in the next 100 years?
$P(k={\text{0 meteorites hit in next 100 years}})={\frac {1^{0}e^{-1}}{0!}}={\frac {1}{e}}\approx 0.37.$
Under these assumptions, the probability that no large meteorites hit the earth in the next 100 years is roughly 0.37. The remaining 1 − 0.37 = 0.63 is the probability of 1, 2, 3, or more large meteorite hits in the next 100 years.
In an example above, an overflow flood occurred once every 100 years (λ = 1). The probability of no overflow floods in 100 years was roughly 0.37, by the same calculation.
In general, if an event occurs on average once per interval (λ = 1), and the events follow a Poisson distribution, then P(0 events in next interval) = 0.37. In addition, P(exactly one event in next interval) = 0.37, as shown in the table for overflow floods.
Examples that violate the Poisson assumptions
The number of students who arrive at the student union per minute will likely not follow a Poisson distribution, because the rate is not constant (low rate during class time, high rate between class times) and the arrivals of individual students are not independent (students tend to come in groups). The non-constant arrival rate may be modeled as a mixed Poisson distribution, and the arrival of groups rather than individual students as a compound Poisson process.
The number of magnitude 5 earthquakes per year in a country may not follow a Poisson distribution, if one large earthquake increases the probability of aftershocks of similar magnitude.
Examples in which at least one event is guaranteed are not Poisson distributed; but may be modeled using a zero-truncated Poisson distribution.
Count distributions in which the number of intervals with zero events is higher than predicted by a Poisson model may be modeled using a zero-inflated model.
Properties
Descriptive statistics
The expected value and variance of a Poisson-distributed random variable are both equal to λ.
The mode of a Poisson-distributed random variable with non-integer λ is equal to $\lfloor \lambda \rfloor ,$ which is the largest integer less than or equal to λ. This is also written as floor(λ). When λ is a positive integer, the modes are λ and λ − 1.
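The statement about the mode can be checked by brute force. The sketch below assumes a finite search range (`k_max`, an illustrative parameter) that comfortably covers the mass of the distribution, and evaluates the probabilities in log space to avoid overflow.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def modes(lam, k_max=200):
    """All k in 0..k_max whose probability equals the maximum (up to rounding)."""
    probs = [poisson_pmf(k, lam) for k in range(k_max + 1)]
    best = max(probs)
    return [k for k, p in enumerate(probs) if math.isclose(p, best, rel_tol=1e-9)]

print(modes(2.5))  # non-integer rate: a single mode at floor(2.5)
print(modes(4.0))  # integer rate: two modes, 3 and 4
```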
All of the cumulants of the Poisson distribution are equal to the expected value λ. The nth factorial moment of the Poisson distribution is λ^{n}.
The expected value of a Poisson process is sometimes decomposed into the product of intensity and exposure (or more generally expressed as the integral of an "intensity function" over time or space, sometimes described as "exposure").^{[16]}
Median
Bounds for the median ($\nu$) of the distribution are known and are sharp:^{[17]}
If $X_{i}\sim \operatorname {Pois} (\lambda _{i})$ for $i=1,\dotsc ,n$ are independent, then ${\textstyle \sum _{i=1}^{n}X_{i}\sim \operatorname {Pois} \left(\sum _{i=1}^{n}\lambda _{i}\right).}$^{[20]}^{: 65 } A converse is Raikov's theorem, which says that if the sum of two independent random variables is Poisson-distributed, then so are each of those two independent random variables.^{[21]}^{[22]}
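The additivity of independent Poisson variables can be illustrated by convolving two mass functions directly. This is a numerical check for two variables, not a proof; the helper names and the rates 1.5 and 2.0 are illustrative.

```python
import math

def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

def pmf_of_sum(n, lam1, lam2):
    """P(X1 + X2 = n) by direct convolution of the two mass functions."""
    return sum(poisson_pmf(j, lam1) * poisson_pmf(n - j, lam2) for j in range(n + 1))

lam1, lam2 = 1.5, 2.0
for n in range(6):
    # The two printed columns should agree up to rounding.
    print(n, pmf_of_sum(n, lam1, lam2), poisson_pmf(n, lam1 + lam2))
```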
Maximum Entropy
It is a maximum-entropy distribution among the set of generalized binomial distributions $B_{n}(\lambda )$ with mean $\lambda$ and $n\rightarrow \infty$,^{[23]} where a generalized binomial distribution is defined as a distribution of the sum of N independent but not identically distributed Bernoulli variables.
Other properties
The Poisson distributions are infinitely divisible probability distributions.^{[24]}^{: 233 }^{[6]}^{: 164 }
The directed Kullback–Leibler divergence of $P=\operatorname {Pois} (\lambda _{0})$ from $P_{0}=\operatorname {Pois} (\lambda )$ is given by
If $\lambda \geq 1$ is an integer, then $Y\sim \operatorname {Pois} (\lambda )$ satisfies $\Pr(Y\geq E[Y])\geq {\frac {1}{2}}$ and $\Pr(Y\leq E[Y])\geq {\frac {1}{2}}.$^{[25]}
Bounds for the tail probabilities of a Poisson random variable $X\sim \operatorname {Pois} (\lambda )$ can be derived using a Chernoff bound argument.^{[26]}^{: 97-98 }
$P(X\geq x)\leq {\frac {(e\lambda )^{x}e^{-\lambda }}{x^{x}}},{\text{ for }}x>\lambda ,$
$P(X\leq x)\leq {\frac {(e\lambda )^{x}e^{-\lambda }}{x^{x}}},{\text{ for }}x<\lambda .$
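The two Chernoff-type bounds can be compared against tail probabilities computed by direct summation. The sketch below assumes integer x ≥ 1 and truncates the upper-tail sum after a fixed number of terms (`terms`, an illustrative parameter), which is harmless for moderate λ.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def upper_tail(x, lam, terms=500):
    """P(X >= x), summed over a long but finite range of k."""
    return sum(poisson_pmf(k, lam) for k in range(x, x + terms))

def lower_tail(x, lam):
    """P(X <= x)."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

def chernoff_bound(x, lam):
    """(e*lam)^x * e^(-lam) / x^x, evaluated in log space; requires x >= 1."""
    return math.exp(x * (1.0 + math.log(lam)) - lam - x * math.log(x))

lam = 5.0
for x in (8, 12, 16):
    print(x, upper_tail(x, lam), chernoff_bound(x, lam))
for x in (1, 2, 3):
    print(x, lower_tail(x, lam), chernoff_bound(x, lam))
```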
The upper tail probability can be tightened (by a factor of at least two) as follows: ^{[27]}
where $\operatorname {D} _{\text{KL}}(Q\parallel P)$ is the Kullback–Leibler divergence of $Q=\operatorname {Pois} (x)$ from $P=\operatorname {Pois} (\lambda )$.
Inequalities that relate the distribution function of a Poisson random variable $X\sim \operatorname {Pois} (\lambda )$ to the standard normal distribution function $\Phi (x)$ are as follows:^{[28]}
where $\operatorname {D} _{\text{KL}}(Q_{-}\parallel P)$ is the Kullback–Leibler divergence of $Q_{-}=\operatorname {Pois} (k)$ from $P=\operatorname {Pois} (\lambda )$ and $\operatorname {D} _{\text{KL}}(Q_{+}\parallel P)$ is the Kullback–Leibler divergence of $Q_{+}=\operatorname {Pois} (k+1)$ from $P$.
Poisson races
Let $X\sim \operatorname {Pois} (\lambda )$ and $Y\sim \operatorname {Pois} (\mu )$ be independent random variables, with $\lambda <\mu ,$ then we have that
The upper bound is proved using a standard Chernoff bound.
The lower bound can be proved by noting that $P(X-Y\geq 0\mid X+Y=i)$ is the probability that ${\textstyle Z\geq {\frac {i}{2}},}$ where ${\textstyle Z\sim \operatorname {Bin} \left(i,{\frac {\lambda }{\lambda +\mu }}\right),}$ which is bounded below by ${\textstyle {\frac {1}{(i+1)^{2}}}e^{-iD\left(0.5\|{\frac {\lambda }{\lambda +\mu }}\right)},}$ where $D$ is relative entropy (see the entry on bounds on tails of binomial distributions for details). Further noting that $X+Y\sim \operatorname {Pois} (\lambda +\mu ),$ and computing a lower bound on the unconditional probability gives the result. More details can be found in the appendix of Kamath et al.^{[29]}
Related distributions
As a Binomial distribution with infinitesimal time-steps
The Poisson distribution can be derived as a limiting case to the binomial distribution as the number of trials goes to infinity and the expected number of successes remains fixed — see law of rare events below. Therefore, it can be used as an approximation of the binomial distribution if n is sufficiently large and p is sufficiently small. The Poisson distribution is a good approximation of the binomial distribution if n is at least 20 and p is smaller than or equal to 0.05, and an excellent approximation if n ≥ 100 and n p ≤ 10.^{[30]}
If $X_{1}\sim \mathrm {Pois} (\lambda _{1})\,$ and $X_{2}\sim \mathrm {Pois} (\lambda _{2})\,$ are independent, then the difference $Y=X_{1}-X_{2}$ follows a Skellam distribution.
If $X_{1}\sim \mathrm {Pois} (\lambda _{1})\,$ and $X_{2}\sim \mathrm {Pois} (\lambda _{2})\,$ are independent, then the distribution of $X_{1}$ conditional on $X_{1}+X_{2}$ is a binomial distribution. Specifically, if $X_{1}+X_{2}=k,$ then $X_{1}|X_{1}+X_{2}=k\sim \mathrm {Binom} (k,\lambda _{1}/(\lambda _{1}+\lambda _{2})).$ More generally, if X_{1}, X_{2}, ..., X_{n} are independent Poisson random variables with parameters λ_{1}, λ_{2}, ..., λ_{n} then
given $\sum _{j=1}^{n}X_{j}=k,$ it follows that $X_{i}{\Big |}\sum _{j=1}^{n}X_{j}=k\sim \mathrm {Binom} \left(k,{\frac {\lambda _{i}}{\sum _{j=1}^{n}\lambda _{j}}}\right).$ In fact, $\{X_{i}\}\sim \mathrm {Multinom} \left(k,\left\{{\frac {\lambda _{i}}{\sum _{j=1}^{n}\lambda _{j}}}\right\}\right).$
If $X\sim \mathrm {Pois} (\lambda )\,$ and the distribution of $Y$ conditional on X = k is a binomial distribution, $Y\mid (X=k)\sim \mathrm {Binom} (k,p),$ then the distribution of Y follows a Poisson distribution $Y\sim \mathrm {Pois} (\lambda \cdot p).$ In fact, if, conditional on $\{X=k\},$ $\{Y_{i}\}$ follows a multinomial distribution, $\{Y_{i}\}\mid (X=k)\sim \mathrm {Multinom} \left(k,p_{i}\right),$ then each $Y_{i}$ follows an independent Poisson distribution $Y_{i}\sim \mathrm {Pois} (\lambda \cdot p_{i}),\rho (Y_{i},Y_{j})=0.$
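The thinning property (a binomial selection from a Poisson count is again Poisson with mean λp) can be verified by marginalizing over X numerically. The sketch truncates the sum at `k_max`, an illustrative cutoff that is far into the tail for the rates used here.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def binom_pmf(j, k, p):
    return math.comb(k, j) * p ** j * (1 - p) ** (k - j)

def thinned_pmf(j, lam, p, k_max=200):
    """P(Y = j) where X ~ Pois(lam) and Y | X = k ~ Binom(k, p)."""
    return sum(poisson_pmf(k, lam) * binom_pmf(j, k, p) for k in range(j, k_max + 1))

lam, p = 4.0, 0.3
for j in range(6):
    # The marginal of Y should match Pois(lam * p) up to rounding.
    print(j, thinned_pmf(j, lam, p), poisson_pmf(j, lam * p))
```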
The Poisson distribution is a special case of the discrete compound Poisson distribution (or stuttering Poisson distribution) with only one parameter.^{[31]}^{[32]} The discrete compound Poisson distribution can be deduced from the limiting distribution of univariate multinomial distribution. It is also a special case of a compound Poisson distribution.
For sufficiently large values of λ (say λ > 1000), the normal distribution with mean λ and variance λ (standard deviation ${\sqrt {\lambda }}$) is an excellent approximation to the Poisson distribution. If λ is greater than about 10, then the normal distribution is a good approximation if an appropriate continuity correction is performed, i.e., if P(X ≤ x), where x is a non-negative integer, is replaced by P(X ≤ x + 0.5).
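The effect of the continuity correction can be seen by comparing the exact Poisson distribution function with the normal approximation, with and without the 0.5 shift. This is a sketch with an illustrative rate of λ = 20; the function names are not from the source.

```python
import math

def poisson_cdf(x, lam):
    """Exact P(X <= x) by summing the mass function in log space."""
    return sum(math.exp(k * math.log(lam) - lam - math.lgamma(k + 1)) for k in range(x + 1))

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_approx(x, lam, continuity=True):
    """Normal approximation to P(X <= x) with mean lam and variance lam."""
    shift = 0.5 if continuity else 0.0
    return normal_cdf((x + shift - lam) / math.sqrt(lam))

lam = 20.0
for x in (15, 20, 25):
    print(x, poisson_cdf(x, lam), normal_approx(x, lam), normal_approx(x, lam, continuity=False))
```

For these inputs the corrected approximation is noticeably closer to the exact value than the uncorrected one.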
Under this transformation, the convergence to normality (as $\lambda$ increases) is far faster than the untransformed variable.^{[citation needed]} Other, slightly more complicated, variance stabilizing transformations are available,^{[6]}^{: 168 } one of which is the Anscombe transform.^{[34]} See Data transformation (statistics) for more general uses of transformations.
If for every t > 0 the number of arrivals in the time interval [0, t] follows the Poisson distribution with mean λt, then the inter-arrival times are independent and identically distributed exponential random variables having mean 1/λ.^{[35]}^{: 317–319 }
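The link between Poisson counts and exponential inter-arrival times can be illustrated by simulation in the converse direction: generating exponential gaps with mean 1/λ and counting arrivals in [0, t]. The sketch below uses illustrative values (rate 3, t = 2) and a fixed seed; the sample mean and variance of the counts should both be close to λt.

```python
import random

def count_arrivals(rate, t, rng):
    """Count arrivals in [0, t] when gaps between arrivals are Exp(rate)."""
    clock, count = 0.0, 0
    while True:
        clock += rng.expovariate(rate)  # exponential gap with mean 1/rate
        if clock > t:
            return count
        count += 1

rng = random.Random(0)
rate, t = 3.0, 2.0
counts = [count_arrivals(rate, t, rng) for _ in range(20000)]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
print(mean, var)  # both should be near rate * t = 6
```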
This means^{[26]}^{: 101-102 }, among other things, that for any nonnegative function $f(x_{1},x_{2},\dots ,x_{n}),$
if $(Y_{1},Y_{2},\dots ,Y_{n})\sim \operatorname {Mult} (m,\mathbf {p} )$ is multinomially distributed, then
A simple way to generate a bivariate Poisson distribution $X_{1},X_{2}$ is to take three independent Poisson distributions $Y_{1},Y_{2},Y_{3}$ with means $\lambda _{1},\lambda _{2},\lambda _{3}$ and then set $X_{1}=Y_{1}+Y_{3},X_{2}=Y_{2}+Y_{3}.$ The probability function of the bivariate Poisson distribution is
The free Poisson distribution^{[38]} with jump size $\alpha$ and rate $\lambda$ arises in free probability theory as the limit of repeated free convolution
In other words, let $X_{N}$ be random variables so that $X_{N}$ has value $\alpha$ with probability ${\textstyle {\frac {\lambda }{N}}}$ and value 0 with the remaining probability. Assume also that the family $X_{1},X_{2},\ldots$ are freely independent. Then the limit as $N\to \infty$ of the law of $X_{1}+\cdots +X_{N}$ is given by the free Poisson law with parameters $\lambda ,\alpha .$
This definition is analogous to one of the ways in which the classical Poisson distribution is obtained from a (classical) Poisson process.
The measure associated to the free Poisson law is given by^{[39]}
We give values of some important transforms of the free Poisson law; the computation can be found, e.g., in the book Lectures on the Combinatorics of Free Probability by A. Nica and R. Speicher.^{[40]}
The R-transform of the free Poisson law is given by
Poisson's probability mass function $f(k;\lambda )$ can be expressed in a form similar to the product distribution of a Weibull distribution and a variant form of the stable count distribution.
The variable $(k+1)$ can be regarded as inverse of Lévy's stability parameter in the stable count distribution:
where ${\mathfrak {N}}_{\alpha }(\nu )$ is a standard stable count distribution of shape $\alpha =1/\left(k+1\right),$ and $W_{k+1}(x)$ is a standard Weibull distribution of shape $k+1.$
Given a sample of n measured values $k_{i}\in \{0,1,\dots \},$ for i = 1, ..., n, we wish to estimate the value of the parameter λ of the Poisson population from which the sample was drawn. The maximum likelihood estimate is the sample mean,^{[41]}
${\widehat {\lambda }}_{\mathrm {MLE} }={\frac {1}{n}}\sum _{i=1}^{n}k_{i}.$
Since each observation has expectation λ so does the sample mean. Therefore, the maximum likelihood estimate is an unbiased estimator of λ. It is also an efficient estimator since its variance achieves the Cramér–Rao lower bound (CRLB).^{[42]} Hence it is minimum-variance unbiased. Also it can be proven that the sum (and hence the sample mean as it is a one-to-one function of the sum) is a complete and sufficient statistic for λ.
To prove sufficiency we may use the factorization theorem. Consider partitioning the probability mass function of the joint Poisson distribution for the sample into two parts: one that depends solely on the sample $\mathbf {x}$, called $h(\mathbf {x} )$, and one that depends on the parameter $\lambda$ and the sample $\mathbf {x}$ only through the function $T(\mathbf {x} ).$ Then $T(\mathbf {x} )$ is a sufficient statistic for $\lambda .$
The first term $h(\mathbf {x} )$ depends only on $\mathbf {x}$. The second term $g(T(\mathbf {x} )|\lambda )$ depends on the sample only through ${\textstyle T(\mathbf {x} )=\sum _{i=1}^{n}x_{i}.}$ Thus, $T(\mathbf {x} )$ is sufficient.
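To find the parameter λ that maximizes the probability function for the Poisson population, we can use the logarithm of the likelihood function:
$\ell (\lambda )=\ln \prod _{i=1}^{n}f(k_{i};\lambda )=\sum _{i=1}^{n}\ln \left({\frac {e^{-\lambda }\lambda ^{k_{i}}}{k_{i}!}}\right)=-n\lambda +\left(\sum _{i=1}^{n}k_{i}\right)\ln \lambda -\sum _{i=1}^{n}\ln(k_{i}!).$
Taking the derivative of $\ell$ with respect to λ and setting it equal to zero gives
${\frac {\mathrm {d} \ell }{\mathrm {d} \lambda }}=-n+{\frac {1}{\lambda }}\sum _{i=1}^{n}k_{i}=0.$
So λ is the average of the k_{i} values. Obtaining the sign of the second derivative of $\ell$ at the stationary point will determine what kind of extreme value λ is:
${\frac {\mathrm {d} ^{2}\ell }{\mathrm {d} \lambda ^{2}}}=-\lambda ^{-2}\sum _{i=1}^{n}k_{i}.$
Evaluating the second derivative at the stationary point gives
${\frac {\mathrm {d} ^{2}\ell }{\mathrm {d} \lambda ^{2}}}=-{\frac {n^{2}}{\sum _{i=1}^{n}k_{i}}},$
which is the negative of n times the reciprocal of the average of the k_{i}. This expression is negative when the average is positive. If this is satisfied, then the stationary point maximizes the probability function.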
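The conclusion that the sample mean maximizes the likelihood can be checked numerically. The sketch below uses a small hypothetical sample (not data from the source) and compares the log-likelihood at the sample mean with its values at neighbouring rates.

```python
import math

def log_likelihood(lam, sample):
    """Poisson log-likelihood of lam for a list of observed counts."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in sample)

sample = [2, 3, 0, 4, 1, 2, 5, 3]    # hypothetical observed counts
lam_hat = sum(sample) / len(sample)  # sample mean

for lam in (lam_hat - 0.5, lam_hat, lam_hat + 0.5):
    print(lam, log_likelihood(lam, sample))
```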
For completeness, a family of distributions is said to be complete if and only if $E(g(T))=0$ implies that $P_{\lambda }(g(T)=0)=1$ for all $\lambda .$ If the individual $X_{i}$ are iid $\mathrm {Po} (\lambda ),$ then ${\textstyle T(\mathbf {x} )=\sum _{i=1}^{n}X_{i}\sim \mathrm {Po} (n\lambda ).}$ Knowing the distribution we want to investigate, it is easy to see that the statistic is complete.
For this equality to hold, $g(t)$ must be 0. This follows from the fact that none of the other terms will be 0 for all $t$ in the sum and for all possible values of $\lambda .$ Hence, $E(g(T))=0$ for all $\lambda$ implies that $P_{\lambda }(g(T)=0)=1,$ and the statistic has been shown to be complete.
Confidence interval
The confidence interval for the mean of a Poisson distribution can be expressed using the relationship between the cumulative distribution functions of the Poisson and chi-squared distributions. The chi-squared distribution is itself closely related to the gamma distribution, and this leads to an alternative expression. Given an observation k from a Poisson distribution with mean μ, a confidence interval for μ with confidence level 1 – α is
${\tfrac {1}{2}}\chi ^{2}(\alpha /2;2k)\leq \mu \leq {\tfrac {1}{2}}\chi ^{2}(1-\alpha /2;2k+2),$
or equivalently,
$F^{-1}(\alpha /2;k,1)\leq \mu \leq F^{-1}(1-\alpha /2;k+1,1),$
where $\chi ^{2}(p;n)$ is the quantile function (corresponding to a lower tail area p) of the chi-squared distribution with n degrees of freedom and $F^{-1}(p;n,1)$ is the quantile function of a gamma distribution with shape parameter n and scale parameter 1.^{[6]}^{: 176-178 }^{[43]} This interval is 'exact' in the sense that its coverage probability is never less than the nominal 1 – α.
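When chi-squared or gamma quantile functions are not at hand, the same exact interval can be approximated by inverting the Poisson distribution function directly: the lower limit is the mean at which P(X ≥ k) equals α/2, and the upper limit is the mean at which P(X ≤ k) equals α/2. The following is a sketch using plain bisection; the function names and the search ceiling are assumptions made for illustration.

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Pois(mu), with mu > 0."""
    if k < 0:
        return 0.0
    return sum(math.exp(j * math.log(mu) - mu - math.lgamma(j + 1)) for j in range(k + 1))

def solve(f, lo, hi, iters=200):
    """Bisection for a root of f on [lo, hi], assuming f changes sign there."""
    sign_lo = f(lo) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exact_interval(k, alpha=0.05):
    """Two-sided interval for the Poisson mean given a single observed count k."""
    ceiling = k + 10.0 * math.sqrt(k + 1.0) + 20.0  # generous upper search limit
    if k == 0:
        lower = 0.0
    else:
        lower = solve(lambda mu: 1.0 - poisson_cdf(k - 1, mu) - alpha / 2, 1e-12, ceiling)
    upper = solve(lambda mu: poisson_cdf(k, mu) - alpha / 2, 1e-12, ceiling)
    return lower, upper

print(exact_interval(10))
print(exact_interval(0))
```

For k = 0 the upper limit reduces to −ln(α/2), which gives a quick sanity check on the root finder.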
When quantiles of the gamma distribution are not available, an accurate approximation to this exact interval has been proposed (based on the Wilson–Hilferty transformation):^{[44]}
For application of these formulae in the same context as above (given a sample of n measured values k_{i} each drawn from a Poisson distribution with mean λ), one would set
$k=\sum _{i=1}^{n}k_{i},$
calculate an interval for μ = n λ , and then derive the interval for λ.
It can be shown that the gamma distribution is the only prior that induces linearity of the conditional mean. Moreover, a converse result exists which states that if the conditional mean is close to a linear function in the $L_{2}$ distance, then the prior distribution of λ must be close to the gamma distribution in Lévy distance.^{[46]}
The posterior mean E[λ] approaches the maximum likelihood estimate ${\widehat {\lambda }}_{\mathrm {MLE} }$ in the limit as $\alpha \to 0,\beta \to 0,$ which follows immediately from the general expression of the mean of the gamma distribution.
Suppose $X_{1},X_{2},\dots ,X_{p}$ is a set of independent random variables from a set of $p$ Poisson distributions, each with a parameter $\lambda _{i},$ $i=1,\dots ,p,$ and we would like to estimate these parameters. Then, Clevenson and Zidek show that under the normalized squared error loss ${\textstyle L(\lambda ,{\hat {\lambda }})=\sum _{i=1}^{p}\lambda _{i}^{-1}({\hat {\lambda }}_{i}-\lambda _{i})^{2},}$ when $p>1,$ then, as in Stein's example for the normal means, the MLE ${\hat {\lambda }}_{i}=X_{i}$ is inadmissible.^{[48]}
In this case, a family of minimax estimators is given for any $0<c\leq 2(p-1)$ and $b\geq (p-2+p^{-1})$ as^{[49]}
Biology example: the number of mutations on a strand of DNA per unit length.
Management example: customers arriving at a counter or call centre.
Finance and insurance example: number of losses or claims occurring in a given period of time.
Earthquake seismology example: an asymptotic Poisson model of seismic risk for large earthquakes.^{[52]}
Radioactivity example: number of decays in a given time interval in a radioactive sample.
Optics example: the number of photons emitted in a single laser pulse. This is a major vulnerability of most quantum key distribution protocols, known as photon number splitting (PNS).
The Poisson distribution arises in connection with Poisson processes. It applies to various phenomena of discrete properties (that is, those that may happen 0, 1, 2, 3, … times during a given period of time or in a given area) whenever the probability of the phenomenon happening is constant in time or space. Examples of events that may be modelled as a Poisson distribution include:
The number of soldiers killed by horse-kicks each year in each corps in the Prussian cavalry. This example was used in a book by Ladislaus Bortkiewicz (1868–1931).^{[10]}^{: 23-25 }
The number of yeast cells used when brewing Guinness beer. This example was used by William Sealy Gosset (1876–1937).^{[53]}^{[54]}
The number of phone calls arriving at a call centre within a minute. This example was described by A.K. Erlang (1878–1929).^{[55]}
Internet traffic.
The number of goals in sports involving two competing teams.^{[56]}
The number of deaths per year in a given age group.
The number of jumps in a stock price in a given time interval.
Under an assumption of homogeneity, the number of times a web server is accessed per minute.
The number of mutations in a given stretch of DNA after a certain amount of radiation.
The rate of an event is related to the probability of an event occurring in some small subinterval (of time, space or otherwise). In the case of the Poisson distribution, one assumes that there exists a small enough subinterval for which the probability of an event occurring twice is "negligible". With this assumption one can derive the Poisson distribution from the Binomial one, given only the information of expected number of total events in the whole interval.
Let the expected total number of events in the whole interval be denoted by $\lambda .$ Divide the whole interval into $n$ subintervals $I_{1},\dots ,I_{n}$ of equal size, such that $n>\lambda$ (since we are interested in only very small portions of the interval this assumption is meaningful). This means that the expected number of events in each of the n subintervals is equal to $\lambda /n.$
Now we assume that the occurrence of an event in the whole interval can be seen as a sequence of n Bernoulli trials, where the $i$-th Bernoulli trial corresponds to checking whether an event happens in the subinterval $I_{i}$ with probability $\lambda /n.$ The expected number of total events in $n$ such trials would be $\lambda ,$ the expected number of total events in the whole interval. Hence for each subdivision of the interval we have approximated the occurrence of the event as a Bernoulli process of the form ${\textrm {B}}(n,\lambda /n).$ As we have noted before we want to consider only very small subintervals. Therefore, we take the limit as $n$ goes to infinity.
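The limiting argument can be illustrated numerically: as the number of subintervals n grows, the binomial probability B(n, λ/n) of k events approaches the Poisson probability. The sketch below uses illustrative values λ = 3 and k = 2.

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam, k = 3.0, 2
for n in (10, 100, 1000, 10000):
    # Each subinterval is a Bernoulli trial with success probability lam / n.
    print(n, binom_pmf(k, n, lam / n), poisson_pmf(k, lam))
```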
In several of the above examples, such as the number of mutations in a given sequence of DNA, the events being counted are actually the outcomes of discrete trials, and would more precisely be modelled using the binomial distribution, that is
$X\sim {\textrm {B}}(n,p).$
In such cases n is very large and p is very small (and so the expectation n p is of intermediate magnitude). Then the distribution may be approximated by the less cumbersome Poisson distribution
$X\sim {\textrm {Pois}}(np).$
This approximation is sometimes known as the law of rare events,^{[61]}^{: 5 } since each of the n individual Bernoulli events rarely occurs.
The name "law of rare events" may be misleading because the total count of success events in a Poisson process need not be rare if the parameter n p is not small. For example, the number of telephone calls to a busy switchboard in one hour follows a Poisson distribution with the events appearing frequent to the operator, but they are rare from the point of view of the average member of the population who is very unlikely to make a call to that switchboard in that hour.
The variance of the binomial distribution is 1 − p times that of the Poisson distribution, so almost equal when p is very small.
The word law is sometimes used as a synonym of probability distribution, and convergence in law means convergence in distribution. Accordingly, the Poisson distribution is sometimes called the "law of small numbers" because it is the probability distribution of the number of occurrences of an event that happens rarely but has very many opportunities to happen. The Law of Small Numbers is a book by Ladislaus Bortkiewicz about the Poisson distribution, published in 1898.^{[10]}^{[62]}
The Poisson distribution arises as the number of points of a Poisson point process located in some finite region. More specifically, if D is some region space, for example Euclidean space R^{d}, for which |D|, the area, volume or, more generally, the Lebesgue measure of the region is finite, and if N(D) denotes the number of points in D, then
Poisson regression and negative binomial regression
Poisson regression and negative binomial regression are useful for analyses where the dependent (response) variable is the count (0, 1, 2, ... ) of the number of events or occurrences in an interval.
Other applications in science
In a Poisson process, the number of observed occurrences fluctuates about its mean λ with a standard deviation $\sigma _{k}={\sqrt {\lambda }}.$ These fluctuations are denoted as Poisson noise or (particularly in electronics) as shot noise.
The correlation of the mean and standard deviation in counting independent discrete occurrences is useful scientifically. By monitoring how the fluctuations vary with the mean signal, one can estimate the contribution of a single occurrence, even if that contribution is too small to be detected directly. For example, the charge e on an electron can be estimated by correlating the magnitude of an electric current with its shot noise. If N electrons pass a point in a given time t on the average, the mean current is $I=eN/t$; since the current fluctuations should be of the order $\sigma _{I}=e{\sqrt {N}}/t$ (i.e., the standard deviation of the Poisson process), the charge $e$ can be estimated from the ratio $t\sigma _{I}^{2}/I.$^{[citation needed]}
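The shot-noise estimate of the electron charge can be tried on synthetic data. The sketch below is an idealized simulation, not a description of a real measurement: it assumes Poisson-distributed electron counts with a hypothetical mean of 50 per nanosecond window, uses the known charge only to generate the data, and then recovers it from the ratio tσ_I²/I.

```python
import math
import random

def sample_poisson(lam, rng):
    """Multiplication-of-uniforms sampler; adequate for modest lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(1)
true_charge = 1.602e-19       # coulombs; used only to generate synthetic currents
mean_count, t = 50.0, 1.0e-9  # hypothetical: 50 electrons per 1 ns window

currents = [true_charge * sample_poisson(mean_count, rng) / t for _ in range(20000)]
mean_i = sum(currents) / len(currents)
var_i = sum((i - mean_i) ** 2 for i in currents) / len(currents)

estimated_charge = t * var_i / mean_i  # ratio t * sigma_I^2 / I
print(estimated_charge)
```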
An everyday example is the graininess that appears as photographs are enlarged; the graininess is due to Poisson fluctuations in the number of reduced silver grains, not to the individual grains themselves. By correlating the graininess with the degree of enlargement, one can estimate the contribution of an individual grain (which is otherwise too small to be seen unaided).^{[citation needed]} Many other molecular applications of Poisson noise have been developed, e.g., estimating the number density of receptor molecules in a cell membrane.
In causal set theory the discrete elements of spacetime follow a Poisson distribution in the volume.
Computational methods
The Poisson distribution poses two different tasks for dedicated software libraries: evaluating the distribution $P(k;\lambda )$, and drawing random numbers according to that distribution.
Evaluating the Poisson distribution
Computing $P(k;\lambda )$ for given $k$ and $\lambda$ is a trivial task that can be accomplished by using the standard definition of $P(k;\lambda )$ in terms of exponential, power, and factorial functions. However, the conventional definition of the Poisson distribution contains two terms that can easily overflow on computers: λ^{k} and k!. The ratio of λ^{k} to k! can also produce a rounding error that is very large compared to e^{−λ}, and therefore give an erroneous result. For numerical stability the Poisson probability mass function should therefore be evaluated as
$P(k;\lambda )=\exp \left[k\ln \lambda -\lambda -\ln \Gamma (k+1)\right],$
which is mathematically equivalent but numerically stable. The natural logarithm of the Gamma function can be obtained using the lgamma function in the C standard library (C99 version) or R, the gammaln function in MATLAB or SciPy, or the log_gamma function in Fortran 2008 and later.
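A minimal sketch of the log-space evaluation, exp[k ln λ − λ − ln Γ(k+1)], using `math.lgamma` from the Python standard library. The function names are illustrative, and the naive version is included only to show where the direct formula breaks down.

```python
import math

def poisson_pmf_stable(k, lam):
    """exp(k*ln(lam) - lam - lgamma(k + 1)); assumes lam > 0 and integer k >= 0."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def poisson_pmf_naive(k, lam):
    """Direct formula; the power term overflows for large k and lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

print(poisson_pmf_stable(1000, 1000.0))  # about 0.0126
try:
    poisson_pmf_naive(1000, 1000.0)
except OverflowError as exc:
    print("naive formula failed:", exc)
```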
Some computing languages provide built-in functions to evaluate the Poisson distribution, namely
Excel: function POISSON(x, mean, cumulative), with a flag to specify the cumulative distribution;
Mathematica: univariate Poisson distribution as PoissonDistribution[$\lambda$],^{[63]} bivariate Poisson distribution as MultivariatePoissonDistribution[$\theta _{12},$ {$\theta _{1}-\theta _{12},$ $\theta _{2}-\theta _{12}$}].^{[64]}
A simple algorithm to generate random Poisson-distributed numbers (pseudo-random number sampling) has been given by Knuth:^{[65]}^{: 137-138 }
algorithm poisson random number (Knuth):
    init:
        Let L ← e^{−λ}, k ← 0 and p ← 1.
    do:
        k ← k + 1.
        Generate uniform random number u in [0,1] and let p ← p × u.
    while p > L.
    return k − 1.
The complexity is linear in the returned value k, which is λ on average. Many faster algorithms exist; some are given in Ahrens & Dieter, see § References below.
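Knuth's multiplication method can be sketched in Python as follows. This is an illustrative transcription of the pseudocode, not a tuned implementation; the function name `poisson_knuth` and the choice to pass in a `random.Random` instance are assumptions made for the example.

```python
import math
import random


def poisson_knuth(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate by multiplying uniforms (Knuth)."""
    limit = math.exp(-lam)  # L in the pseudocode; underflows to 0.0 for large lam
    k = 0
    p = 1.0
    while True:
        k += 1
        p *= rng.random()  # uniform in [0, 1)
        if p <= limit:     # the do-while loop continues only while p > L
            return k - 1
```

A call such as `poisson_knuth(4.0, random.Random(1))` returns a single non-negative integer. Because `math.exp(-lam)` underflows to zero for λ above roughly 745 in double precision, this sketch is only suitable for moderate λ.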
For large values of λ, the value of L = e^{−λ} may be so small that it is hard to represent. This can be solved by a change to the algorithm which uses an additional parameter STEP such that e^{−STEP} does not underflow:^{[citation needed]}
algorithm poisson random number (Junhao, based on Knuth):
    init:
        Let λLeft ← λ, k ← 0 and p ← 1.
    do:
        k ← k + 1.
        Generate uniform random number u in (0,1) and let p ← p × u.
        while p < 1 and λLeft > 0:
            if λLeft > STEP:
                p ← p × e^{STEP}
                λLeft ← λLeft − STEP
            else:
                p ← p × e^{λLeft}
                λLeft ← 0
    while p > 1.
    return k − 1.
The choice of STEP depends on the threshold of overflow. For double precision floating point format the threshold is near e^{700}, so 500 should be a safe STEP.
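The STEP variant can be sketched in Python in the same way. The function name `poisson_knuth_step` and the default `step=500.0` (taken from the suggestion above) are assumptions for illustration. The redraw of `u` when it equals zero is added here only to honour the open interval (0,1).

```python
import math
import random


def poisson_knuth_step(lam: float, rng: random.Random, step: float = 500.0) -> int:
    """Knuth-style Poisson sampler that applies e^lam in chunks of e^step."""
    lam_left = lam
    k = 0
    p = 1.0
    while True:
        k += 1
        u = rng.random()
        while u == 0.0:  # keep u strictly inside (0, 1)
            u = rng.random()
        p *= u
        # Rescale p upward while some of lam remains to be applied.
        while p < 1.0 and lam_left > 0.0:
            if lam_left > step:
                p *= math.exp(step)
                lam_left -= step
            else:
                p *= math.exp(lam_left)
                lam_left = 0.0
        if p <= 1.0:  # the do-while loop continues only while p > 1
            return k - 1
```

With λ = 1000, where `math.exp(-1000.0)` evaluates to 0.0 in double precision, this version still returns values scattered around 1000. Its running time remains linear in λ.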
Other solutions for large values of λ include rejection sampling and using Gaussian approximation.
Inverse transform sampling is simple and efficient for small values of λ, and requires only one uniform random number u per sample. Cumulative probabilities are examined in turn until one exceeds u.
algorithm Poisson generator based upon the inversion by sequential search:^{[66]}^{: 505 }
    init:
        Let x ← 0, p ← e^{−λ}, s ← p.
        Generate uniform random number u in [0,1].
    while u > s do:
        x ← x + 1.
        p ← p × λ / x.
        s ← s + p.
    return x.
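A minimal Python sketch of inversion by sequential search is given below. The function name `poisson_inversion` is hypothetical. The extra `p > 0.0` condition is a safeguard added for this example and is not part of the pseudocode: it stops the search if rounding leaves the accumulated sum just below a `u` very close to 1.

```python
import math
import random


def poisson_inversion(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate by walking the CDF until it exceeds u."""
    x = 0
    p = math.exp(-lam)  # P(0; lam)
    s = p               # running cumulative probability
    u = rng.random()    # the single uniform number used per sample
    while u > s and p > 0.0:
        x += 1
        p *= lam / x    # recurrence P(x) = P(x - 1) * lam / x
        s += p
    return x
```

Like Knuth's method, this sketch starts from e^{−λ} and is therefore intended for small λ only.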
de Moivre, Abraham (1721). "Of the Laws of Chance". In Motte, Benjamin (ed.). The Philosophical Transactions from the Year MDCC (where Mr. Lowthorp Ends) to the Year MDCCXX. Abridg'd, and Dispos'd Under General Heads (in Latin). Vol. I. London, Great Britain: R. Wilkin, R. Robinson, S. Ballard, W. and J. Innys, and J. Osborn. pp. 190–219.
Stigler, Stephen M. (1982). "Poisson on the Poisson Distribution". Statistics & Probability Letters. 1 (1): 33–35. doi:10.1016/0167-7152(82)90010-4.
Hald, Anders; de Moivre, Abraham; McClintock, Bruce (1984). "A. de Moivre: 'De Mensura Sortis' or 'On the Measurement of Chance'". International Statistical Review / Revue Internationale de Statistique. 52 (3): 229–262. doi:10.2307/1403045. JSTOR 1403045.
von Bortkiewitsch, Ladislaus (1898). Das Gesetz der kleinen Zahlen [The law of small numbers] (in German). Leipzig, Germany: B.G. Teubner. pp. 1, 23–25.
On page 1, Bortkiewicz presents the Poisson distribution.
On pages 23–25, Bortkiewitsch presents his analysis of "4. Beispiel: Die durch Schlag eines Pferdes im preußischen Heere Getöteten." [4. Example: Those killed in the Prussian army by a horse's kick.]
Yates, Roy D.; Goodman, David J. (2014). Probability and Stochastic Processes: A Friendly Introduction for Electrical and Computer Engineers (2nd ed.). Hoboken, NJ: Wiley. ISBN 978-0-471-45259-1.
Ahle, Thomas D. (2022). "Sharp and simple bounds for the raw moments of the Binomial and Poisson distributions". Statistics & Probability Letters. 182: 109306. arXiv:2103.17027. doi:10.1016/j.spl.2021.109306.
Lehmann, Erich Leo (1986). Testing Statistical Hypotheses (2nd ed.). New York, NY, US: Springer Verlag. ISBN 978-0-387-94919-2.
Raikov, Dmitry (1937). "On the decomposition of Poisson laws". Comptes Rendus de l'Académie des Sciences de l'URSS. 14: 9–11.
Harremoes, P. (July 2001). "Binomial and Poisson distributions as maximum entropy distributions". IEEE Transactions on Information Theory. 47 (5): 2039–2041. doi:10.1109/18.930936. S2CID 16171405.
Laha, Radha G.; Rohatgi, Vijay K. (1979). Probability Theory. New York, NY, US: John Wiley & Sons. ISBN 978-0-471-03262-5.
Mitzenmacher, Michael; Upfal, Eli (2017). Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis (2nd ed.). Cambridge, UK. Exercise 5.14. ISBN 978-1-107-15488-9. OCLC 960841613.
Kamath, Govinda M.; Şaşoğlu, Eren; Tse, David (14–19 June 2015). Optimal haplotype assembly from high-throughput mate-pair reads. 2015 IEEE International Symposium on Information Theory (ISIT). Hong Kong, China. pp. 914–918. arXiv:1502.01975. doi:10.1109/ISIT.2015.7282588. S2CID 128634.
Zhang, Huiming; Liu, Yunxiao; Li, Bo (2014). "Notes on discrete compound Poisson model with applications to risk theory". Insurance: Mathematics and Economics. 59: 325–336. doi:10.1016/j.insmatheco.2014.09.012.
Zhang, Huiming; Li, Bo (2016). "Characterizations of discrete compound Poisson distributions". Communications in Statistics - Theory and Methods. 45 (22): 6789–6802. doi:10.1080/03610926.2014.901375. S2CID 125475756.
Loukas, Sotirios; Kemp, C. David (1986). "The Index of Dispersion Test for the Bivariate Poisson Distribution". Biometrics. 42 (4): 941–948. doi:10.2307/2530708. JSTOR 2530708.
Voiculescu, D.; Dykema, K.; Nica, A. (1992). Free Random Variables. CRM Monograph Series. Providence, RI: American Mathematical Society.
Gelman, Andrew; Carlin, John B.; Stern, Hal S.; Rubin, Donald B. (2003). Bayesian Data Analysis (2nd ed.). Boca Raton, FL, US: Chapman & Hall/CRC. ISBN 1-58488-388-X.
Clevenson, M. Lawrence; Zidek, James V. (1975). "Simultaneous estimation of the means of independent Poisson laws". Journal of the American Statistical Association. 70 (351): 698–705. doi:10.1080/01621459.1975.10482497. JSTOR 2285958.
Flory, Paul J. (1940). "Molecular Size Distribution in Ethylene Oxide Polymers". Journal of the American Chemical Society. 62 (6): 1561–1565. doi:10.1021/ja01863a066.
Lomnitz, Cinna (1994). Fundamentals of Earthquake Prediction. New York, NY: John Wiley & Sons. ISBN 0-471-57419-8. OCLC 647404423.
Erlang, Agner K. (1909). "Sandsynlighedsregning og Telefonsamtaler" [Probability Calculation and Telephone Conversations]. Nyt Tidsskrift for Matematik (in Danish). 20 (B): 33–39. JSTOR 24528622.