
Geometric
Probability mass function
Cumulative distribution function
Parameters	$0<p\leq 1$ success probability (real)	$0<p\leq 1$ success probability (real)
Support	k trials where $k\in \{1,2,3,\dots \}$	k failures where $k\in \{0,1,2,3,\dots \}$
PMF	$(1-p)^{k-1}p$	$(1-p)^{k}p$
CDF	$1-(1-p)^{\lfloor x\rfloor }$ for $x\geq 1$, $0$ for $x<1$	$1-(1-p)^{\lfloor x\rfloor +1}$ for $x\geq 0$, $0$ for $x<0$
Mean	${\frac {1}{p}}$	${\frac {1-p}{p}}$
Median	$\left\lceil {\frac {-1}{\log _{2}(1-p)}}\right\rceil$ (not unique if $-1/\log _{2}(1-p)$ is an integer)	$\left\lceil {\frac {-1}{\log _{2}(1-p)}}\right\rceil -1$ (not unique if $-1/\log _{2}(1-p)$ is an integer)
Mode	$1$	$0$
Variance	${\frac {1-p}{p^{2}}}$	${\frac {1-p}{p^{2}}}$
Skewness	${\frac {2-p}{\sqrt {1-p}}}$	${\frac {2-p}{\sqrt {1-p}}}$
Excess kurtosis	$6+{\frac {p^{2}}{1-p}}$	$6+{\frac {p^{2}}{1-p}}$
Entropy	${\tfrac {-(1-p)\log(1-p)-p\log p}{p}}$	${\tfrac {-(1-p)\log(1-p)-p\log p}{p}}$
MGF	${\frac {pe^{t}}{1-(1-p)e^{t}}},$ for $t<-\ln(1-p)$	${\frac {p}{1-(1-p)e^{t}}},$ for $t<-\ln(1-p)$
CF	${\frac {pe^{it}}{1-(1-p)e^{it}}}$	${\frac {p}{1-(1-p)e^{it}}}$
PGF	${\frac {pz}{1-(1-p)z}}$	${\frac {p}{1-(1-p)z}}$

In probability theory and statistics, the geometric distribution is either one of two discrete probability distributions:

The probability distribution of the number $X$ of Bernoulli trials needed to get one success, supported on the set $\{1,2,3,\ldots \}$;
The probability distribution of the number $Y=X-1$ of failures before the first success, supported on the set $\{0,1,2,\ldots \}$.

Which of these is called the geometric distribution is a matter of convention and convenience.

These two different geometric distributions should not be confused with each other. Often, the name shifted geometric distribution is adopted for the former one (distribution of $X$ ); however, to avoid ambiguity, it is considered wise to indicate which is intended, by mentioning the support explicitly.

The geometric distribution gives the probability that the first occurrence of success requires $k$ independent trials, each with success probability $p$ . If the probability of success on each trial is $p$ , then the probability that the $k$ -th trial is the first success is

\Pr(X=k)=(1-p)^{k-1}p

for $k=1,2,3,4,\dots$

The above form of the geometric distribution is used for modeling the number of trials up to and including the first success. By contrast, the following form of the geometric distribution is used for modeling the number of failures until the first success:

\Pr(Y=k)=\Pr(X=k+1)=(1-p)^{k}p

for $k=0,1,2,3,\dots$

In either case, the sequence of probabilities is a geometric sequence.
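
As a minimal sketch, the two probability mass functions can be written directly from the formulas above. The function names `pmf_trials` and `pmf_failures` are illustrative and do not come from any library; exact rational arithmetic is used so that the shift relation and the constant ratio $1-p$ between consecutive probabilities can be checked without rounding.

```python
from fractions import Fraction

def pmf_trials(k, p):
    """Pr(X = k): the first success occurs on trial k, for k = 1, 2, 3, ..."""
    if k < 1:
        return Fraction(0)
    return (1 - p) ** (k - 1) * p

def pmf_failures(k, p):
    """Pr(Y = k): k failures occur before the first success, for k = 0, 1, 2, ..."""
    if k < 0:
        return Fraction(0)
    return (1 - p) ** k * p

p = Fraction(1, 6)  # illustrative success probability

# Shift relation Pr(Y = k) = Pr(X = k + 1)
shift_ok = all(pmf_failures(k, p) == pmf_trials(k + 1, p) for k in range(50))

# Consecutive probabilities differ by the constant ratio 1 - p (a geometric sequence)
ratios = {pmf_trials(k + 1, p) / pmf_trials(k, p) for k in range(1, 50)}
```

Here `ratios` collapses to a single value, which is what makes the sequence geometric.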

Definition

The geometric distribution is the discrete probability distribution that describes when the first success in an infinite sequence of independent and identically distributed Bernoulli trials occurs. Its probability mass function is

P(X=k)=(1-p)^{k-1}p

where $k=1,2,3,\dotsc$ is the number of trials and $p$ is the probability of success in each trial.^[1]^: 260–261

Alternatively, some texts define the distribution over $k=0,1,2,\dotsc$ and call the former the zero-truncated geometric distribution. This changes the probability mass function to:^[2]^: 66

P(Y=k)=(1-p)^{k}p

An example of a geometric distribution arises from rolling a six-sided die until a "1" appears. Each roll is independent with a $1/6$ chance of success. The number of rolls needed follows a geometric distribution with $p=1/6$.

Properties

Moments and cumulants

The expected value and variance of a geometrically distributed random variable $X$ defined over $1,2,3,\dotsc$ are^[1]^: 261

\operatorname {E} (X)={\frac {1}{p}},\qquad \operatorname {var} (X)={\frac {1-p}{p^{2}}}.

When a geometrically distributed random variable $Y$ is defined over $0,1,2,\dotsc$, the expected value changes to

\operatorname {E} (Y)={\frac {1-p}{p}},

while the variance stays the same.^[3]

For example, when rolling a six-sided die until landing on a "1", the average number of rolls needed is ${\frac {1}{1/6}}=6$ and the average number of failures is ${\frac {1-1/6}{1/6}}=5$.
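
The die example can be checked by simulation. The following is only a sketch: the helper name `rolls_until_one`, the seed, and the sample size are arbitrary choices made for illustration.

```python
import random

def rolls_until_one(rng):
    """Roll a fair six-sided die until a 1 appears; return the number of rolls."""
    rolls = 0
    while True:
        rolls += 1
        if rng.randint(1, 6) == 1:
            return rolls

rng = random.Random(12345)  # fixed (arbitrary) seed so the run is repeatable
n = 100_000
samples = [rolls_until_one(rng) for _ in range(n)]

mean_rolls = sum(samples) / n                    # should be close to 1/p = 6
mean_failures = sum(s - 1 for s in samples) / n  # should be close to (1-p)/p = 5
```

With this many samples the two averages should land near 6 and 5 respectively, and they always differ by exactly one because every sample contains exactly one success.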

The moments for the number of failures before the first success are given by

{\begin{aligned}\mathrm {E} (Y^{n})&{}=\sum _{k=0}^{\infty }(1-p)^{k}p\cdot k^{n}\\&{}=p\operatorname {Li} _{-n}(1-p)&({\text{for }}n\neq 0)\end{aligned}}

where $\operatorname {Li} _{-n}(1-p)$ is the polylogarithm function.

The cumulants $\kappa _{n}$ of the probability distribution of Y satisfy the recursion

\kappa _{n+1}=\mu (\mu +1){\frac {d\kappa _{n}}{d\mu }},

where $\mu ={\frac {1-p}{p}}$ is the expected value of a geometrically distributed random variable defined over $0,1,2,\dotsc$.
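
As a worked check of the recursion, one can start from the first cumulant $\kappa _{1}=\mu$ (the mean) and differentiate with respect to $\mu$. Substituting $\mu =(1-p)/p$, so that $\mu +1=1/p$ and $2\mu +1=(2-p)/p$, recovers the variance and skewness given in the summary table.

```latex
\begin{aligned}
\kappa_1 &= \mu, \\
\kappa_2 &= \mu(\mu+1)\,\frac{d\kappa_1}{d\mu} = \mu(\mu+1) = \frac{1-p}{p^{2}}, \\
\kappa_3 &= \mu(\mu+1)\,\frac{d\kappa_2}{d\mu} = \mu(\mu+1)(2\mu+1) = \frac{(1-p)(2-p)}{p^{3}}, \\
\frac{\kappa_3}{\kappa_2^{3/2}} &= \frac{2-p}{\sqrt{1-p}}.
\end{aligned}
```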

Proof of expected value

Consider the expected value $\mathrm {E} (X)$ of X as above, i.e. the average number of trials until a success. On the first trial, we either succeed with probability $p$, or we fail with probability $1-p$. If we fail, the remaining mean number of trials until a success is identical to the original mean, because all trials are independent. From this we get the formula:

$\mathrm {E} (X)=p\cdot 1+(1-p)\cdot (1+\mathrm {E} (X)),$

which, if solved for $\mathrm {E} (X)$ , gives:

$\mathrm {E} (X)={\frac {1}{p}}.$

The expected value of $Y$ can be found from the linearity of expectation, $\mathrm {E} (Y)=\mathrm {E} (X-1)=\mathrm {E} (X)-1={\frac {1}{p}}-1={\frac {1-p}{p}}$. It can also be shown in the following way:

${\begin{aligned}\mathrm {E} (Y)&{}=\sum _{k=0}^{\infty }(1-p)^{k}p\cdot k\\&{}=p\sum _{k=0}^{\infty }(1-p)^{k}k\\&{}=p(1-p)\sum _{k=0}^{\infty }(1-p)^{k-1}\cdot k\\&{}=p(1-p)\left[{\frac {d}{dp}}\left(-\sum _{k=0}^{\infty }(1-p)^{k}\right)\right]\\&{}=p(1-p){\frac {d}{dp}}\left(-{\frac {1}{p}}\right)={\frac {1-p}{p}}.\end{aligned}}$

The interchange of summation and differentiation is justified by the fact that convergent power series converge uniformly on compact subsets of the set of points where they converge.

General properties

The probability-generating functions of X and Y are, respectively,

{\begin{aligned}G_{X}(s)&={\frac {s\,p}{1-s\,(1-p)}},\\[10pt]G_{Y}(s)&={\frac {p}{1-s\,(1-p)}},\quad |s|<(1-p)^{-1}.\end{aligned}}
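
A short worked check: the first derivative of a probability-generating function at $s=1$ equals the expected value, so differentiating the two expressions above should reproduce the means $1/p$ and $(1-p)/p$.

```latex
\begin{aligned}
G_X'(s) &= \frac{p}{\bigl(1-s\,(1-p)\bigr)^{2}}, & G_X'(1) &= \frac{p}{p^{2}} = \frac{1}{p} = \operatorname{E}(X), \\
G_Y'(s) &= \frac{p\,(1-p)}{\bigl(1-s\,(1-p)\bigr)^{2}}, & G_Y'(1) &= \frac{p\,(1-p)}{p^{2}} = \frac{1-p}{p} = \operatorname{E}(Y).
\end{aligned}
```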

Like its continuous analogue (the exponential distribution), the geometric distribution is memoryless. In other words, previous failures do not affect future trials. Expressed in terms of conditional probability, $\Pr(X>m+n\mid X>n)=\Pr(X>m)$ where $m$ and $n$ are natural numbers. The corresponding equality with ≥ in place of > holds for the variant $Y$ supported on $\{0,1,2,\dots \}$.^[2]^: 71
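
The memoryless property can be verified exactly for a range of $m$ and $n$. This sketch assumes the variant $X$ supported on $\{1,2,3,\dots \}$, for which $\Pr(X>k)=(1-p)^{k}$; the helper name `survival` and the value of `p` are illustrative.

```python
from fractions import Fraction

def survival(k, p):
    """Pr(X > k) for X supported on {1, 2, 3, ...}: the first k trials all fail."""
    return (1 - p) ** k

p = Fraction(3, 10)  # arbitrary illustrative value

# Pr(X > m + n | X > n) = Pr(X > m + n) / Pr(X > n), since {X > m + n} implies {X > n}
memoryless = all(
    survival(m + n, p) / survival(n, p) == survival(m, p)
    for m in range(20)
    for n in range(20)
)
```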

Among all discrete probability distributions supported on {1, 2, 3, ... } with given expected value μ, the geometric distribution X with parameter p = 1/μ is the one with the largest entropy.^[4]
The geometric distribution of the number Y of failures before the first success is infinitely divisible, i.e., for any positive integer n, there exist independent identically distributed random variables Y₁, ..., Y_n whose sum has the same distribution that Y has. These will not be geometrically distributed unless n = 1; they follow a negative binomial distribution.
The decimal digits of the geometrically distributed random variable Y are a sequence of independent (and not identically distributed) random variables.^{[citation needed]} For example, the hundreds digit D has this probability distribution:

\Pr(D=d)={q^{100d} \over 1+q^{100}+q^{200}+\cdots +q^{900}},

where q = 1 − p, and similarly for the other digits, and, more generally, similarly for numeral systems with other bases than 10. When the base is 2, this shows that a geometrically distributed random variable can be written as a sum of independent random variables whose probability distributions are indecomposable.

Golomb coding is the optimal prefix code for the geometric distribution.^[5]

Related distributions

The sum of $r$ independent geometric random variables with parameter $p$ is a negative binomial random variable with parameters $r$ and $p$ .^[6] The geometric distribution is a special case of the negative binomial distribution, with $r=1$ .

The geometric distribution is a special case of discrete compound Poisson distribution.
The minimum of $n$ independent geometric random variables with parameters $p_{1},\dotsc ,p_{n}$ is also geometrically distributed, with parameter $1-\prod _{i=1}^{n}(1-p_{i})$.^[7]
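
The statement about the minimum can be illustrated by simulation. The sketch below assumes independent variables counted as trials (support $\{1,2,3,\dots \}$); the helper `sample_trials`, the parameter list `ps`, the seed, and the sample size are all illustrative choices.

```python
import random
from math import prod

def sample_trials(p, rng):
    """Number of Bernoulli(p) trials up to and including the first success."""
    k = 1
    while rng.random() >= p:  # rng.random() < p counts as a success
        k += 1
    return k

rng = random.Random(2024)  # fixed (arbitrary) seed so the run is repeatable
ps = [0.1, 0.2, 0.3]       # illustrative parameters p_1, ..., p_n
p_min = 1 - prod(1 - p for p in ps)  # claimed parameter of the minimum

n = 100_000
mins = [min(sample_trials(p, rng) for p in ps) for _ in range(n)]

empirical_mean = sum(mins) / n      # should be close to 1 / p_min
frac_equal_one = mins.count(1) / n  # should be close to p_min
```

For these parameters `p_min` is 0.496, so the empirical mean should be near 2.016 and roughly half of the minima should equal 1.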

Suppose 0 < r < 1, and for k = 1, 2, 3, ... the random variable X_k has a Poisson distribution with expected value r^k/k. Then

\sum _{k=1}^{\infty }k\,X_{k}

has a geometric distribution taking values in the set {0, 1, 2, ...}, with expected value r/(1 − r).^{[citation needed]}

The exponential distribution is the continuous analogue of the geometric distribution. Applying the floor function to the exponential distribution with parameter $\lambda$ creates a geometric distribution with parameter $p=1-e^{-\lambda }$ defined over $0,1,2,\dotsc$.^[2]^: 74 This can be used to generate geometrically distributed pseudorandom numbers by first generating exponentially distributed pseudorandom numbers from a uniform pseudorandom number generator: then $\lfloor \ln(U)/\ln(1-p)\rfloor$ is geometrically distributed with parameter $p$, if $U$ is uniformly distributed in [0,1].
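
The sampling recipe $\lfloor \ln(U)/\ln(1-p)\rfloor$ might be sketched as follows. The function name `sample_failures` is hypothetical, and two details are assumptions added here rather than part of the formula: $U$ is drawn from $(0,1]$ to avoid $\ln(0)$, and $p=1$ is handled separately because $\ln(1-p)$ is then undefined.

```python
import math
import random

def sample_failures(p, rng):
    """Draw Y on {0, 1, 2, ...} as floor(ln(U) / ln(1 - p))."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    if p == 1:
        return 0  # success is certain on the first trial
    u = 1.0 - rng.random()  # random() lies in [0, 1), so u lies in (0, 1]
    return math.floor(math.log(u) / math.log(1 - p))

rng = random.Random(7)  # fixed (arbitrary) seed so the run is repeatable
p = 0.25
n = 200_000
draws = [sample_failures(p, rng) for _ in range(n)]

sample_mean = sum(draws) / n    # should be close to (1 - p) / p = 3
frac_zero = draws.count(0) / n  # should be close to Pr(Y = 0) = p
```

Adding 1 to each draw gives a sample from the variant supported on $\{1,2,3,\dots \}$.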

If p = 1/n and X is geometrically distributed with parameter p, then the distribution of X/n approaches an exponential distribution with expected value 1 as n → ∞, since

{\begin{aligned}\Pr(X/n>a)=\Pr(X>na)&=(1-p)^{na}=\left(1-{\frac {1}{n}}\right)^{na}=\left[\left(1-{\frac {1}{n}}\right)^{n}\right]^{a}\\&\to [e^{-1}]^{a}=e^{-a}{\text{ as }}n\to \infty .\end{aligned}}

More generally, if p = λ/n, where λ is a parameter, then as n → ∞ the distribution of X/n approaches an exponential distribution with rate λ:

\lim _{n\to \infty }\Pr(X>nx)=\lim _{n\to \infty }(1-\lambda /n)^{nx}=e^{-\lambda x}

Therefore the distribution function of X/n converges to $1-e^{-\lambda x}$, which is that of an exponential random variable.
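
The convergence can be observed numerically by evaluating the expression $(1-\lambda /n)^{nx}$ for increasing $n$ and comparing it with $e^{-\lambda x}$. This is only an illustration: the function name `scaled_survival` and the values of `lam` and `x` are arbitrary.

```python
import math

def scaled_survival(n, lam, x):
    """(1 - lam/n)**(n*x), the expression whose limit is taken above."""
    return (1 - lam / n) ** (n * x)

lam, x = 2.0, 0.75  # illustrative rate and evaluation point
limit = math.exp(-lam * x)

# Absolute error against the exponential survival function for growing n
errors = [abs(scaled_survival(n, lam, x) - limit) for n in (10, 100, 1_000, 10_000)]
```

The entries of `errors` shrink by roughly a factor of ten each time $n$ grows tenfold, consistent with an error of order $1/n$.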

Statistical inference

Parameter estimation

For both variants of the geometric distribution, the parameter p can be estimated by equating the expected value with the sample mean. This is the method of moments, which in this case happens to yield maximum likelihood estimates of p.^[8]^[9]

Specifically, for the first variant let k = k₁, ..., k_n be a sample where k_i ≥ 1 for i = 1, ..., n. Then p can be estimated as

{\widehat {p}}=\left({\frac {1}{n}}\sum _{i=1}^{n}k_{i}\right)^{-1}={\frac {n}{\sum _{i=1}^{n}k_{i}}}.\!

In Bayesian inference, the Beta distribution is the conjugate prior distribution for the parameter p. If this parameter is given a Beta(α, β) prior, then the posterior distribution is

p\sim \mathrm {Beta} \left(\alpha +n,\ \beta +\sum _{i=1}^{n}(k_{i}-1)\right).\!

The posterior mean E[p] approaches the maximum likelihood estimate ${\widehat {p}}$ as α and β approach zero.

In the alternative case, let k₁, ..., k_n be a sample where k_i ≥ 0 for i = 1, ..., n. Then p can be estimated as

{\widehat {p}}=\left(1+{\frac {1}{n}}\sum _{i=1}^{n}k_{i}\right)^{-1}={\frac {n}{\sum _{i=1}^{n}k_{i}+n}}.\!

The posterior distribution of p given a Beta(α, β) prior is^[10]

p\sim \mathrm {Beta} \left(\alpha +n,\ \beta +\sum _{i=1}^{n}k_{i}\right).\!

Again the posterior mean E[p] approaches the maximum likelihood estimate ${\widehat {p}}$ as α and β approach zero.
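
The estimators and the conjugate update for both variants can be sketched in a few lines. The function names and the sample data below are made up for illustration; the posterior mean uses the standard formula $\alpha /(\alpha +\beta )$ for a Beta(α, β) distribution.

```python
def mle_trials(ks):
    """MLE of p when each k_i >= 1 counts trials up to and including the first success."""
    return len(ks) / sum(ks)

def mle_failures(ks):
    """MLE of p when each k_i >= 0 counts failures before the first success."""
    return len(ks) / (sum(ks) + len(ks))

def beta_posterior(alpha, beta, failures):
    """Posterior Beta parameters: (alpha + n, beta + total number of failures)."""
    return alpha + len(failures), beta + sum(failures)

trials = [3, 1, 7, 2, 5, 1, 4]      # made-up sample of trial counts
failures = [k - 1 for k in trials]  # the same data expressed as failure counts

p_hat = mle_trials(trials)          # 7 / 23
a_post, b_post = beta_posterior(1.0, 1.0, failures)  # uniform Beta(1, 1) prior
posterior_mean = a_post / (a_post + b_post)

# With a nearly flat prior (alpha, beta close to 0) the posterior mean approaches the MLE
a0, b0 = beta_posterior(1e-9, 1e-9, failures)
near_mle = a0 / (a0 + b0)
```

Both parameterisations give the same estimate for the same underlying data, because $\sum (k_{i}-1)+n=\sum k_{i}$.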

For either maximum likelihood estimate ${\widehat {p}}$, the bias is, to first order in $1/n$, equal to

b\equiv \operatorname {E} {\bigg [}\;({\hat {p}}_{\mathrm {mle} }-p)\;{\bigg ]}={\frac {p\,(1-p)}{n}}

which yields the bias-corrected maximum likelihood estimator

{\hat {p\,}}_{\text{mle}}^{*}={\hat {p\,}}_{\text{mle}}-{\hat {b\,}}
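
A possible implementation of the correction is sketched below. It rests on an assumption the text does not spell out: that $\hat b$ is obtained by evaluating the bias formula at $\hat p_{\text{mle}}$ (a plug-in estimate). The function name, the `counts_trials` flag, and the sample data are hypothetical.

```python
def bias_corrected_mle(ks, counts_trials=True):
    """Subtract the plug-in bias estimate p_hat * (1 - p_hat) / n from the MLE.

    counts_trials=True : each k_i >= 1 is a number of trials.
    counts_trials=False: each k_i >= 0 is a number of failures.
    """
    n = len(ks)
    total_trials = sum(ks) if counts_trials else sum(ks) + n
    p_hat = n / total_trials
    b_hat = p_hat * (1 - p_hat) / n  # assumption: bias formula evaluated at p_hat
    return p_hat - b_hat

sample = [3, 1, 7, 2, 5, 1, 4]  # made-up trial counts
p_star = bias_corrected_mle(sample)
p_star_from_failures = bias_corrected_mle([k - 1 for k in sample], counts_trials=False)
```

Because the bias is positive, the corrected estimate is always slightly smaller than the uncorrected one, and the gap shrinks as the sample size grows.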

Computational methods

In the programming language R, the function dgeom(k, prob) calculates the probability of k failures before a success with a success probability prob for each trial.

In Microsoft Excel, the function NEGBINOMDIST(number_f, number_s, probability_s) calculates the probability of number_f failures before the number_s-th success, with a success probability probability_s for each trial. Setting number_s to 1 gives the geometric distribution.^[11]
