Family of probability distributions often used to model tails or extreme values
Generalized Pareto distribution
Figure: probability density function and cumulative distribution function of the GPD for different parameter values.
In statistics, the generalized Pareto distribution (GPD) is a family of continuous probability distributions. It is often used to model the tails of another distribution. It is specified by three parameters: location $\mu$, scale $\sigma$, and shape $\xi$. Sometimes it is specified by only scale and shape, and sometimes only by its shape parameter. Some references give the shape parameter as $\kappa = -\xi$.
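The standard cumulative distribution function (cdf) of the GPD is
$$F_{\xi}(z) = \begin{cases} 1 - (1 + \xi z)^{-1/\xi} & \text{for } \xi \neq 0, \\ 1 - e^{-z} & \text{for } \xi = 0, \end{cases}$$
where the support is $z \geq 0$ for $\xi \geq 0$ and $0 \leq z \leq -1/\xi$ for $\xi < 0$. The related location-scale family of distributions is obtained by replacing the argument $z$ by $\frac{x-\mu}{\sigma}$ and adjusting the support accordingly.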
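The cumulative distribution function of $X \sim \mathrm{GPD}(\mu, \sigma, \xi)$ ($\mu \in \mathbb{R}$, $\sigma > 0$, and $\xi \in \mathbb{R}$) is
$$F_{(\mu,\sigma,\xi)}(x) = \begin{cases} 1 - \left(1 + \dfrac{\xi (x-\mu)}{\sigma}\right)^{-1/\xi} & \text{for } \xi \neq 0, \\ 1 - \exp\left(-\dfrac{x-\mu}{\sigma}\right) & \text{for } \xi = 0, \end{cases}$$
where the support of $X$ is $x \geq \mu$ when $\xi \geq 0$, and $\mu \leq x \leq \mu - \sigma/\xi$ when $\xi < 0$.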
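The probability density function (pdf) of $X \sim \mathrm{GPD}(\mu, \sigma, \xi)$ is
$$f_{(\mu,\sigma,\xi)}(x) = \frac{1}{\sigma}\left(1 + \frac{\xi (x-\mu)}{\sigma}\right)^{-\left(\frac{1}{\xi} + 1\right)} \quad \text{for } \xi \neq 0,$$
with the limiting form $\frac{1}{\sigma}\exp\left(-\frac{x-\mu}{\sigma}\right)$ for $\xi = 0$; again, for $x \geq \mu$ when $\xi \geq 0$, and $\mu \leq x \leq \mu - \sigma/\xi$ when $\xi < 0$.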
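As an illustration, the cdf and pdf can be evaluated directly from the closed forms $F(x) = 1 - (1 + \xi z)^{-1/\xi}$ and $f(x) = \frac{1}{\sigma}(1 + \xi z)^{-1/\xi - 1}$ with $z = (x-\mu)/\sigma$, using the exponential forms at $\xi = 0$. The following is a minimal sketch using only the Python standard library; the function names `gpd_cdf` and `gpd_pdf` are hypothetical and chosen for this example, and $\sigma > 0$ is assumed rather than checked.

```python
import math


def gpd_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """CDF of GPD(mu, sigma, xi); assumes sigma > 0."""
    z = (x - mu) / sigma
    if z <= 0.0:
        return 0.0                      # at or below the lower endpoint mu
    if xi == 0.0:
        return 1.0 - math.exp(-z)       # exponential case
    if xi < 0.0 and z >= -1.0 / xi:
        return 1.0                      # at or above the upper endpoint mu - sigma/xi
    return 1.0 - (1.0 + xi * z) ** (-1.0 / xi)


def gpd_pdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """PDF of GPD(mu, sigma, xi); assumes sigma > 0."""
    z = (x - mu) / sigma
    if z < 0.0:
        return 0.0
    if xi == 0.0:
        return math.exp(-z) / sigma
    if xi < 0.0 and z >= -1.0 / xi:
        # Convention for this sketch: return 0 at and beyond the upper endpoint
        # (a single point has probability zero, so the distribution is unaffected).
        return 0.0
    return (1.0 + xi * z) ** (-1.0 / xi - 1.0) / sigma
```

For example, `gpd_cdf(1.0, xi=0.5)` evaluates $1 - 1.5^{-2}$, and a central finite difference of `gpd_cdf` around a point in the support should agree closely with `gpd_pdf` at that point.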
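The pdf is a solution of the following differential equation:
$$f'(x)\,\bigl(\sigma + \xi (x - \mu)\bigr) + (\xi + 1)\, f(x) = 0, \qquad f(0) = \frac{1}{\sigma}\left(1 - \frac{\mu \xi}{\sigma}\right)^{-\frac{1}{\xi} - 1}.$$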
Generating generalized Pareto random variables
Generating GPD random variables
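If $U$ is uniformly distributed on $(0, 1]$, then
$$X = \mu + \frac{\sigma\,(U^{-\xi} - 1)}{\xi} \sim \mathrm{GPD}(\mu, \sigma, \xi \neq 0)$$
and
$$X = \mu - \sigma \ln(U) \sim \mathrm{GPD}(\mu, \sigma, \xi = 0).$$
Both formulas are obtained by inversion of the cdf.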
In the MATLAB Statistics Toolbox, the "gprnd" command generates generalized Pareto random numbers.
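Outside MATLAB, inversion sampling is short to write by hand. The sketch below assumes the standard inversion formulas $X = \mu + \sigma (U^{-\xi} - 1)/\xi$ for $\xi \neq 0$ and $X = \mu - \sigma \ln U$ for $\xi = 0$, with $U$ uniform on $(0, 1]$. The function name `gpd_sample` is hypothetical, and only the Python standard library is used.

```python
import math
import random


def gpd_sample(mu, sigma, xi, rng):
    """Draw one GPD(mu, sigma, xi) variate by inverting the cdf.

    rng is any object with a .random() method returning a float in [0, 1),
    for example random.Random(seed).
    """
    u = 1.0 - rng.random()              # shift [0, 1) to (0, 1] so log(u) is finite
    if xi == 0.0:
        return mu - sigma * math.log(u)
    return mu + sigma * (u ** (-xi) - 1.0) / xi


rng = random.Random(0)
draws = [gpd_sample(0.0, 1.0, 0.25, rng) for _ in range(5)]
```

A quick sanity check is to compare the empirical median of many draws with the theoretical median $\mu + \sigma (2^{\xi} - 1)/\xi$, which follows from setting $U = 1/2$.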
GPD as an Exponential-Gamma Mixture
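A GPD random variable can also be expressed as an exponential random variable with a Gamma-distributed rate parameter. If
$$X \mid \Lambda \sim \mathrm{Exp}(\Lambda) \quad \text{and} \quad \Lambda \sim \mathrm{Gamma}(\alpha, \beta),$$
where $\beta$ is the rate parameter of the Gamma distribution, then
$$X \sim \mathrm{GPD}(\xi = 1/\alpha,\ \sigma = \beta/\alpha).$$
Note, however, that since the parameters of the Gamma distribution must be greater than zero, this representation carries the additional restriction that $\xi$ must be positive.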
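The mixture representation can be checked by simulation. The sketch below assumes the parametrization $\Lambda \sim \mathrm{Gamma}(\alpha, \beta)$ with $\beta$ a rate parameter, so that $X \sim \mathrm{GPD}(\mu = 0, \sigma = \beta/\alpha, \xi = 1/\alpha)$ and $P(X > x) = (1 + x/\beta)^{-\alpha}$. The helper names are hypothetical. Note that Python's `random.gammavariate` takes a scale parameter, so the rate is inverted.

```python
import random


def gpd_via_gamma_mixture(alpha, beta, rng):
    """Draw X | Lambda ~ Exp(Lambda), Lambda ~ Gamma(shape=alpha, rate=beta)."""
    lam = rng.gammavariate(alpha, 1.0 / beta)   # second argument is scale = 1 / rate
    return rng.expovariate(lam)                 # exponential with rate lam


def gpd_survival(x, sigma, xi):
    """P(X > x) for GPD(mu=0, sigma, xi) with xi > 0 and x >= 0."""
    return (1.0 + xi * x / sigma) ** (-1.0 / xi)


rng = random.Random(42)
alpha, beta = 3.0, 2.0
n = 100_000
draws = [gpd_via_gamma_mixture(alpha, beta, rng) for _ in range(n)]

# Empirical and theoretical tail probability at x = 1.
empirical = sum(1 for x in draws if x > 1.0) / n
theoretical = gpd_survival(1.0, sigma=beta / alpha, xi=1.0 / alpha)
```

With these illustrative values the theoretical tail probability is $(1 + 1/2)^{-3} = 8/27$, and the empirical fraction should be close to it for large $n$.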
Exponentiated generalized Pareto distribution
The exponentiated generalized Pareto distribution (exGPD)
Figure: the pdf of the $\mathrm{exGPD}(\sigma, \xi)$ (exponentiated generalized Pareto distribution) for different values of $\sigma$ and $\xi$.
If $X \sim \mathrm{GPD}(\mu = 0, \sigma, \xi)$, then $Y = \log(X)$ is distributed according to the exponentiated generalized Pareto distribution, denoted by $Y \sim \mathrm{exGPD}(\sigma, \xi)$.
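The probability density function (pdf) of $Y \sim \mathrm{exGPD}(\sigma, \xi)$ ($\sigma > 0$) is
$$g_{(\sigma,\xi)}(y) = \begin{cases} \dfrac{e^{y}}{\sigma}\left(1 + \dfrac{\xi e^{y}}{\sigma}\right)^{-1/\xi - 1} & \text{for } \xi \neq 0, \\ \dfrac{1}{\sigma}\, e^{\,y - e^{y}/\sigma} & \text{for } \xi = 0, \end{cases}$$
where the support is $-\infty < y < \infty$ for $\xi \geq 0$, and $-\infty < y \leq \log(-\sigma/\xi)$ for $\xi < 0$.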
For all $\xi$, $\log \sigma$ becomes the location parameter. See the figure for the pdf when the shape $\xi$ is positive.
The exGPD has finite moments of all orders for all $\sigma > 0$ and $-\infty < \xi < \infty$.
Figure: the variance of the $\mathrm{exGPD}(\sigma, \xi)$ as a function of $\xi$. Note that the variance only depends on $\xi$. The red dotted line represents the variance evaluated at $\xi = 0$, that is, $\psi'(1) = \pi^2/6$.
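The moment-generating function of $Y \sim \mathrm{exGPD}(\sigma, \xi)$ is
$$M_Y(s) = E[e^{sY}] = \begin{cases} -\dfrac{1}{\xi}\left(-\dfrac{\sigma}{\xi}\right)^{s} B\!\left(s+1,\, -\dfrac{1}{\xi}\right) & \text{for } s \in (-1, \infty),\ \xi < 0, \\ \dfrac{1}{\xi}\left(\dfrac{\sigma}{\xi}\right)^{s} B\!\left(s+1,\, \dfrac{1}{\xi} - s\right) & \text{for } s \in (-1, 1/\xi),\ \xi > 0, \\ \sigma^{s}\, \Gamma(1+s) & \text{for } s \in (-1, \infty),\ \xi = 0, \end{cases}$$
where $B(a, b)$ and $\Gamma(a)$ denote the beta function and gamma function, respectively.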
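The expected value of $Y \sim \mathrm{exGPD}(\sigma, \xi)$ depends on both the scale $\sigma$ and the shape $\xi$, with $\xi$ entering through the digamma function $\psi$:
$$E[Y] = \begin{cases} \log\left(-\dfrac{\sigma}{\xi}\right) + \psi(1) - \psi\!\left(-\dfrac{1}{\xi} + 1\right) & \text{for } \xi < 0, \\ \log\left(\dfrac{\sigma}{\xi}\right) + \psi(1) - \psi\!\left(\dfrac{1}{\xi}\right) & \text{for } \xi > 0, \\ \log \sigma + \psi(1) & \text{for } \xi = 0. \end{cases}$$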
Note that for a fixed value of $\xi \in (-\infty, \infty)$, $\log \sigma$ acts as the location parameter under the exponentiated generalized Pareto distribution.
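The variance of $Y \sim \mathrm{exGPD}(\sigma, \xi)$ depends on the shape parameter $\xi$ only, through the polygamma function of order 1 (also called the trigamma function) $\psi'$:
$$\operatorname{Var}[Y] = \begin{cases} \psi'(1) - \psi'\!\left(-\dfrac{1}{\xi} + 1\right) & \text{for } \xi < 0, \\ \psi'(1) + \psi'\!\left(\dfrac{1}{\xi}\right) & \text{for } \xi > 0, \\ \psi'(1) & \text{for } \xi = 0. \end{cases}$$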
See the figure for the variance as a function of $\xi$. Note that $\psi'(1) = \pi^2/6 \approx 1.644934$.
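The $\xi = 0$ case is easy to check numerically, because then $X$ is exponential with scale $\sigma$ and $Y = \log X$ has mean $\log \sigma + \psi(1) = \log \sigma - \gamma$ (with $\gamma$ the Euler–Mascheroni constant) and variance $\psi'(1) = \pi^2/6$, whatever the value of $\sigma$. The sketch below is a Monte Carlo check under those assumptions. The function name `exgpd_sample_xi0` and the chosen $\sigma$ are illustrative only.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329    # Euler-Mascheroni constant, equal to -psi(1)


def exgpd_sample_xi0(sigma, rng):
    """Draw Y = log(X) with X ~ GPD(mu=0, sigma, xi=0), i.e. X exponential with scale sigma."""
    x = 0.0
    while x <= 0.0:                              # guard against the (very rare) draw x == 0
        x = rng.expovariate(1.0 / sigma)         # expovariate takes a rate = 1 / scale
    return math.log(x)


rng = random.Random(1)
sigma = 3.0
n = 200_000
ys = [exgpd_sample_xi0(sigma, rng) for _ in range(n)]

sample_mean = sum(ys) / n
sample_var = sum((y - sample_mean) ** 2 for y in ys) / n

expected_mean = math.log(sigma) - EULER_GAMMA    # log(sigma) + psi(1)
expected_var = math.pi ** 2 / 6                  # psi'(1), independent of sigma
```

Repeating the run with a different `sigma` should shift `sample_mean` by the change in $\log \sigma$ while leaving `sample_var` essentially unchanged, which is the location role of $\log \sigma$ described above.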
Note that the roles of the scale parameter $\sigma$ and the shape parameter $\xi$ under $Y \sim \mathrm{exGPD}(\sigma, \xi)$ are separately interpretable, which may lead to more robust and efficient estimation of $\xi$ than working with $X \sim \mathrm{GPD}(\sigma, \xi)$. Under $X \sim \mathrm{GPD}(\mu = 0, \sigma, \xi)$ the roles of the two parameters are entangled with each other (at least up to the second central moment); see the formula for the variance $\operatorname{Var}(X)$, in which both parameters appear.
The Hill's estimator
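Assume that $X_{1:n} = (X_1, \dots, X_n)$ are $n$ observations (not necessarily i.i.d.) from an unknown heavy-tailed distribution $F$ whose tail distribution is regularly varying with tail index $1/\xi$ (hence, the corresponding shape parameter is $\xi$). To be specific, the tail distribution is described as
$$\bar{F}(x) = 1 - F(x) = L(x) \cdot x^{-1/\xi}, \quad \text{for some } \xi > 0, \text{ where } L \text{ is a slowly varying function.}$$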
It is of particular interest in extreme value theory to estimate the shape parameter $\xi$, especially when $\xi$ is positive (the so-called heavy-tailed case).
Let $F_u$ be their conditional excess distribution function over a threshold $u$. The Pickands–Balkema–de Haan theorem (Pickands, 1975; Balkema and de Haan, 1974) states that for a large class of underlying distribution functions $F$, and large $u$, $F_u$ is well approximated by the generalized Pareto distribution (GPD). This motivated Peak Over Threshold (POT) methods for estimating $\xi$: the GPD plays the key role in the POT approach.
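A renowned estimator using the POT methodology is the Hill estimator. Its technical formulation is as follows. For $1 \leq i \leq n$, write $X_{(i)}$ for the $i$-th largest value of $X_1, \dots, X_n$. Then, with this notation, the Hill estimator (see page 190 of Reference 5 by Embrechts et al.) based on the $k$ upper order statistics is defined as
$$\widehat{\xi}_k^{\text{Hill}} = \widehat{\xi}_k^{\text{Hill}}(X_{1:n}) = \frac{1}{k-1} \sum_{j=1}^{k-1} \log\left(\frac{X_{(j)}}{X_{(k)}}\right), \quad \text{for } 2 \leq k \leq n.$$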
In practice, the Hill estimator is used as follows. First, calculate the estimator $\widehat{\xi}_k^{\text{Hill}}$ at each integer $k \in \{2, \dots, n\}$, and then plot the ordered pairs $\{(k, \widehat{\xi}_k^{\text{Hill}})\}_{k=2}^{n}$. Then, select from the set of Hill estimators $\{\widehat{\xi}_k^{\text{Hill}}\}_{k=2}^{n}$ those which are roughly constant with respect to $k$: these stable values are regarded as reasonable estimates for the shape parameter $\xi$. If $X_1, \dots, X_n$ are i.i.d., then the Hill estimator is a consistent estimator for the shape parameter $\xi$.
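The procedure can be sketched in a few lines. The code below assumes the form of the Hill estimator $\widehat{\xi}_k = \frac{1}{k-1}\sum_{j=1}^{k-1} \log\bigl(X_{(j)}/X_{(k)}\bigr)$ for $2 \leq k \leq n$, with $X_{(1)} \geq X_{(2)} \geq \dots$ the observations in decreasing order, and it assumes strictly positive data. The function names `hill_estimator` and `hill_curve` are hypothetical, and the Pareto test data are purely illustrative.

```python
import math
import random


def hill_estimator(xs, k):
    """Hill estimate of the shape parameter from the k largest observations."""
    if not 2 <= k <= len(xs):
        raise ValueError("k must satisfy 2 <= k <= len(xs)")
    top = sorted(xs, reverse=True)[:k]      # X_(1) >= X_(2) >= ... >= X_(k)
    x_k = top[-1]                           # the k-th largest value
    return sum(math.log(x / x_k) for x in top[:-1]) / (k - 1)


def hill_curve(xs, ks):
    """Ordered pairs (k, estimate) for a Hill plot."""
    return [(k, hill_estimator(xs, k)) for k in ks]


# Illustrative data: exact Pareto with tail index 2, so the shape parameter is 0.5.
rng = random.Random(7)
data = [rng.paretovariate(2.0) for _ in range(20_000)]
curve = hill_curve(data, [500, 1000, 2000, 4000])
```

For exact Pareto data the estimates in `curve` should hover around 0.5 for every $k$. With real data the plot typically stabilizes only over a limited range of $k$, which is the region one reads the estimate from.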
Note that the Hill estimator makes use of the log-transformation of the observations $X_{1:n} = (X_1, \dots, X_n)$. (The Pickands estimator also employs the log-transformation, but in a slightly different way