Generalized Pareto distribution

From Wikipedia, the free encyclopedia

In statistics, the generalized Pareto distribution (GPD) is a family of continuous probability distributions. It is often used to model the tails of another distribution. It is specified by three parameters: location μ, scale σ, and shape ξ.[1][2] Sometimes it is specified by only scale and shape,[3] and sometimes by its shape parameter alone. Some references give the shape parameter as κ = −ξ.[4]

With shape ξ>0 and location μ=σ/ξ, the GPD is equivalent to the Pareto distribution with scale xm=σ/ξ and shape α=1/ξ.
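The equivalence can be checked in one line from the GPD survival function 1 − F(x) = (1 + ξ(x − μ)/σ)^(−1/ξ) for ξ ≠ 0, as given in the Definition section. The following is a short worked derivation, where x_m and α are the usual Pareto scale and shape:

```latex
\mu = \frac{\sigma}{\xi}
\;\Longrightarrow\;
1 + \frac{\xi (x - \mu)}{\sigma} = 1 + \frac{\xi x}{\sigma} - 1 = \frac{\xi x}{\sigma},
\qquad
1 - F(x) = \left(\frac{\xi x}{\sigma}\right)^{-1/\xi}
         = \left(\frac{x_m}{x}\right)^{\alpha},
\quad x \ge x_m = \frac{\sigma}{\xi},\ \alpha = \frac{1}{\xi}.
```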

Definition

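The cumulative distribution function of X ∼ GPD(μ, σ, ξ) (μ ∈ ℝ, σ > 0, and ξ ∈ ℝ) is

$$F_{(\mu,\sigma,\xi)}(x) = \begin{cases} 1 - \left(1 + \dfrac{\xi(x-\mu)}{\sigma}\right)^{-1/\xi} & \text{for } \xi \neq 0, \\ 1 - \exp\left(-\dfrac{x-\mu}{\sigma}\right) & \text{for } \xi = 0, \end{cases}$$

where the support of X is x ≥ μ when ξ ≥ 0, and μ ≤ x ≤ μ − σ/ξ when ξ < 0.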

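The probability density function (pdf) of X ∼ GPD(μ, σ, ξ) is

$$f_{(\mu,\sigma,\xi)}(x) = \frac{1}{\sigma}\left(1 + \frac{\xi(x-\mu)}{\sigma}\right)^{\left(-\frac{1}{\xi} - 1\right)},$$

again, for x ≥ μ when ξ ≥ 0, and μ ≤ x ≤ μ − σ/ξ when ξ < 0.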
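The cdf and pdf can be evaluated directly from these formulas. The following is a minimal sketch in plain Python; the function names `gpd_cdf` and `gpd_pdf` are illustrative, and the ξ = 0 branch uses the exponential limit given by the cdf definition. Returning 0 for the density at or beyond the upper endpoint when ξ < 0 is a convention chosen here, not something the formulas prescribe.

```python
import math

def gpd_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """CDF of GPD(mu, sigma, xi) evaluated at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if z <= 0:
        return 0.0                      # at or below the lower end of the support
    if xi == 0:
        return 1.0 - math.exp(-z)       # exponential case
    t = 1.0 + xi * z
    if t <= 0:
        return 1.0                      # xi < 0 and x >= mu - sigma/xi
    return 1.0 - t ** (-1.0 / xi)

def gpd_pdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """PDF of GPD(mu, sigma, xi) evaluated at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if z < 0:
        return 0.0                      # below the support
    if xi == 0:
        return math.exp(-z) / sigma
    t = 1.0 + xi * z
    if t <= 0:
        return 0.0                      # convention at/after the upper endpoint
    return t ** (-1.0 / xi - 1.0) / sigma
```

For example, `gpd_cdf(1.0, 0.0, 1.0, 1.0)` evaluates 1 − (1 + 1)^(−1), and `gpd_cdf(2.0, 0.0, 1.0, -0.5)` sits exactly at the upper endpoint μ − σ/ξ = 2.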

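The pdf is a solution of the following differential equation:

$$\left\{\begin{array}{l} f'(x)\,(-\mu\xi + \sigma + \xi x) + (\xi + 1)\,f(x) = 0, \\ f(0) = \dfrac{1}{\sigma}\left(1 - \dfrac{\mu\xi}{\sigma}\right)^{-\frac{1}{\xi} - 1} \end{array}\right\}$$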

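The standard cumulative distribution function (cdf) of the GPD is defined using z = (x − μ)/σ [5]

$$F_{\xi}(z) = \begin{cases} 1 - (1 + \xi z)^{-1/\xi} & \text{for } \xi \neq 0, \\ 1 - e^{-z} & \text{for } \xi = 0, \end{cases}$$

where the support is z ≥ 0 for ξ ≥ 0 and 0 ≤ z ≤ −1/ξ for ξ < 0. The corresponding probability density function (pdf) is

$$f_{\xi}(z) = \begin{cases} (1 + \xi z)^{-\frac{\xi + 1}{\xi}} & \text{for } \xi \neq 0, \\ e^{-z} & \text{for } \xi = 0. \end{cases}$$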

Special cases

Generating generalized Pareto random variables

Generating GPD random variables

If U is uniformly distributed on (0, 1], then

$$X = \mu + \frac{\sigma\,(U^{-\xi} - 1)}{\xi} \sim GPD(\mu, \sigma, \xi \neq 0)$$

and

$$X = \mu - \sigma \ln(U) \sim GPD(\mu, \sigma, \xi = 0).$$

Both formulas are obtained by inversion of the cdf.

The Pareto package in R and the gprnd command in the Matlab Statistics Toolbox can be used to generate generalized Pareto random numbers.
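The inversion formulas translate directly into a sampler. The sketch below uses only the Python standard library; the name `gpd_sample` and its `rng` argument are illustrative choices, not part of any package mentioned above. Since `random.random()` returns values in [0, 1), the code uses 1 − U so that the uniform variate lies in (0, 1] and the logarithm is always defined.

```python
import math
import random

def gpd_sample(mu, sigma, xi, rng=random):
    """Draw one GPD(mu, sigma, xi) variate by inverting the cdf."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    u = 1.0 - rng.random()              # uniform on (0, 1]
    if xi == 0:
        return mu - sigma * math.log(u)
    return mu + sigma * (u ** (-xi) - 1.0) / xi

if __name__ == "__main__":
    rng = random.Random(42)             # seeded generator for reproducibility
    draws = [gpd_sample(0.0, 1.0, 0.25, rng) for _ in range(5)]
    print(draws)
```

Passing a seeded `random.Random` instance keeps simulations reproducible without touching the module-level generator.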

GPD as an Exponential-Gamma Mixture

A GPD random variable can also be expressed as an exponential random variable with a Gamma-distributed rate parameter. If

 X | Λ ∼ Exp(Λ)

and

 Λ ∼ Gamma(α, β)  (shape α, rate β),

then

 X ∼ GPD(ξ = 1/α, σ = β/α).

Note, however, that since the parameters of the Gamma distribution must be greater than zero, we obtain the additional restriction that ξ must be positive.

In addition to this mixture (or compound) expression, the generalized Pareto distribution can also be expressed as a simple ratio. Concretely, for Y ∼ Exponential(1) and Z ∼ Gamma(1/ξ, 1), we have μ + σY/(ξZ) ∼ GPD(μ, σ, ξ). This is a consequence of the mixture after setting β = α and taking into account that the rate parameters of the exponential and gamma distributions are simply inverse multiplicative constants.
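Both representations are easy to simulate. The sketch below is an illustration using the standard library only; the function names are hypothetical. Note that `random.gammavariate(alpha, beta)` takes a scale parameter as its second argument, so a Gamma rate β is passed as 1/β, while `random.expovariate` takes a rate.

```python
import random

def gpd_via_mixture(alpha, beta, rng):
    """X | Lambda ~ Exp(Lambda), Lambda ~ Gamma(shape=alpha, rate=beta).

    The result follows GPD(mu=0, sigma=beta/alpha, xi=1/alpha).
    """
    lam = rng.gammavariate(alpha, 1.0 / beta)   # scale = 1 / rate
    return rng.expovariate(lam)                 # exponential with rate lam

def gpd_via_ratio(mu, sigma, xi, rng):
    """mu + sigma * Y / (xi * Z), Y ~ Exp(1), Z ~ Gamma(1/xi, 1); needs xi > 0."""
    if xi <= 0:
        raise ValueError("the ratio representation requires xi > 0")
    y = rng.expovariate(1.0)
    z = rng.gammavariate(1.0 / xi, 1.0)
    return mu + sigma * y / (xi * z)

if __name__ == "__main__":
    rng = random.Random(7)
    n = 10000
    # alpha = 4, beta = 2 corresponds to xi = 0.25, sigma = 0.5
    mix = sum(gpd_via_mixture(4.0, 2.0, rng) > 1.0 for _ in range(n)) / n
    ratio = sum(gpd_via_ratio(0.0, 0.5, 0.25, rng) > 1.0 for _ in range(n)) / n
    exact = (1.0 + 0.25 * 1.0 / 0.5) ** (-1.0 / 0.25)   # GPD survival at x = 1
    print(mix, ratio, exact)
```

For very small α the Gamma draw can underflow to zero, in which case the exponential draw fails; the sketch does not guard against that.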

Exponentiated generalized Pareto distribution

The exponentiated generalized Pareto distribution (exGPD)

Figure (ExGPDpdf.png): The pdf of the exGPD(σ, ξ) (exponentiated generalized Pareto distribution) for different values of σ and ξ.

If X ∼ GPD(μ = 0, σ, ξ), then Y = log(X) is distributed according to the exponentiated generalized Pareto distribution, denoted by Y ∼ exGPD(σ, ξ).

The probability density function (pdf) of Y ∼ exGPD(σ, ξ) (σ > 0) is

$$g_{(\sigma,\xi)}(y) = \begin{cases} \dfrac{e^{y}}{\sigma}\left(1 + \dfrac{\xi e^{y}}{\sigma}\right)^{-1/\xi - 1} & \text{for } \xi \neq 0, \\ \dfrac{1}{\sigma}\, e^{\,y - e^{y}/\sigma} & \text{for } \xi = 0, \end{cases}$$

where the support is −∞ < y < ∞ for ξ ≥ 0, and −∞ < y ≤ log(−σ/ξ) for ξ < 0.

For all ξ, log σ acts as the location parameter. See the figure for the pdf when the shape ξ is positive.

The exGPD has finite moments of all orders for all σ > 0 and −∞ < ξ < ∞.

Figure (Var exGPD.png): The variance of the exGPD(σ, ξ) as a function of ξ. Note that the variance depends only on ξ. The red dotted line represents the variance evaluated at ξ = 0, that is, ψ′(1) = π²/6.

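The moment-generating function of Y ∼ exGPD(σ, ξ) is

$$M_Y(s) = E[e^{sY}] = \begin{cases} -\dfrac{1}{\xi}\left(-\dfrac{\sigma}{\xi}\right)^{s} B(s+1,\, -1/\xi) & \text{for } s \in (-1, \infty),\ \xi < 0, \\ \dfrac{1}{\xi}\left(\dfrac{\sigma}{\xi}\right)^{s} B(s+1,\, 1/\xi - s) & \text{for } s \in (-1, 1/\xi),\ \xi > 0, \\ \sigma^{s}\,\Gamma(1+s) & \text{for } s \in (-1, \infty),\ \xi = 0, \end{cases}$$

where B(a, b) and Γ(a) denote the beta function and gamma function, respectively.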

The expected value of Y ∼ exGPD(σ, ξ) depends on both the scale σ and the shape ξ, with ξ entering through the digamma function ψ:

$$E[Y] = \begin{cases} \log\left(-\dfrac{\sigma}{\xi}\right) + \psi(1) - \psi(-1/\xi + 1) & \text{for } \xi < 0, \\ \log\left(\dfrac{\sigma}{\xi}\right) + \psi(1) - \psi(1/\xi) & \text{for } \xi > 0, \\ \log\sigma + \psi(1) & \text{for } \xi = 0. \end{cases}$$

Note that for any fixed ξ ∈ (−∞, ∞), log σ acts as the location parameter of the exponentiated generalized Pareto distribution.

The variance of Y ∼ exGPD(σ, ξ) depends only on the shape parameter ξ, through the polygamma function of order 1 (also called the trigamma function) ψ′:

$$Var[Y] = \begin{cases} \psi'(1) - \psi'(-1/\xi + 1) & \text{for } \xi < 0, \\ \psi'(1) + \psi'(1/\xi) & \text{for } \xi > 0, \\ \psi'(1) & \text{for } \xi = 0. \end{cases}$$

See the figure for the variance as a function of ξ. Note that ψ′(1) = π²/6 ≈ 1.644934.
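The mean and variance expressions for the exGPD can be evaluated numerically once the digamma and trigamma functions are available. The sketch below is self-contained and makes one assumption worth flagging: instead of calling a special-function library, it approximates ψ and ψ′ for positive arguments with the standard recurrence plus a truncated asymptotic series, which is accurate to roughly 1e-8 here. All function names are illustrative.

```python
import math

def digamma(x):
    """Approximate psi(x) for x > 0 (recurrence + asymptotic series)."""
    result = 0.0
    while x < 6.0:                      # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series

def trigamma(x):
    """Approximate psi'(x) for x > 0 (recurrence + asymptotic series)."""
    result = 0.0
    while x < 6.0:                      # psi'(x) = psi'(x + 1) + 1/x^2
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))
    return result + inv + 0.5 * inv2 + series

def exgpd_mean(sigma, xi):
    """E[Y] for Y ~ exGPD(sigma, xi)."""
    if xi < 0:
        return math.log(-sigma / xi) + digamma(1.0) - digamma(-1.0 / xi + 1.0)
    if xi > 0:
        return math.log(sigma / xi) + digamma(1.0) - digamma(1.0 / xi)
    return math.log(sigma) + digamma(1.0)

def exgpd_var(xi):
    """Var[Y] for Y ~ exGPD(sigma, xi); it does not depend on sigma."""
    if xi < 0:
        return trigamma(1.0) - trigamma(-1.0 / xi + 1.0)
    if xi > 0:
        return trigamma(1.0) + trigamma(1.0 / xi)
    return trigamma(1.0)
```

Two quick sanity checks follow from the formulas: at ξ = 1 the digamma terms cancel, so E[Y] = log σ and Var[Y] = 2ψ′(1) = π²/3; at ξ = −1, X is uniform on [0, σ] and Var[Y] = ψ′(1) − ψ′(2) = 1.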

Note that the roles of the scale parameter σ and the shape parameter ξ under Y ∼ exGPD(σ, ξ) are separately interpretable, which may lead to more robust and efficient estimation of ξ than working with X ∼ GPD(σ, ξ) [2]. Under X ∼ GPD(μ = 0, σ, ξ), the roles of the two parameters are tied to each other (at least up to the second central moment); see the formula for the variance Var(X), in which both parameters appear.

The Hill's estimator

Assume that $X_{1:n} = (X_1, \ldots, X_n)$ are n observations (need not be i.i.d.) from an unknown heavy-tailed distribution F such that its tail distribution is regularly varying with the tail-index 1/ξ (hence, the corresponding shape parameter is ξ). To be specific, the tail distribution is described as

$$\bar{F}(x) = 1 - F(x) = L(x)\cdot x^{-1/\xi}, \quad \text{for some } \xi > 0, \text{ where } L \text{ is a slowly varying function.}$$

It is of particular interest in extreme value theory to estimate the shape parameter ξ, especially when ξ is positive (the so-called heavy-tailed case).

Let $F_u$ be their conditional excess distribution function. The Pickands–Balkema–de Haan theorem (Pickands, 1975; Balkema and de Haan, 1974) states that for a large class of underlying distribution functions F, and large u, $F_u$ is well approximated by the generalized Pareto distribution (GPD). This motivated Peak Over Threshold (POT) methods for estimating ξ, in which the GPD plays the key role.

A renowned estimator using the POT methodology is Hill's estimator. Its technical formulation is as follows. For 1 ≤ i ≤ n, write $X_{(i)}$ for the i-th largest value of $X_1, \ldots, X_n$. Then, with this notation, Hill's estimator (see page 190 of Reference 5 by Embrechts et al [3]) based on the k upper order statistics is defined as

$$\hat{\xi}_k^{\text{Hill}} = \hat{\xi}_k^{\text{Hill}}(X_{1:n}) = \frac{1}{k-1}\sum_{j=1}^{k-1}\log\left(\frac{X_{(j)}}{X_{(k)}}\right), \quad \text{for } 2 \le k \le n.$$

In practice, the Hill estimator is used as follows. First, calculate the estimator $\hat{\xi}_k^{\text{Hill}}$ at each integer k ∈ {2, …, n}, and then plot the ordered pairs $\{(k, \hat{\xi}_k^{\text{Hill}})\}_{k=2}^{n}$. Then, select from the set of Hill estimators $\{\hat{\xi}_k^{\text{Hill}}\}_{k=2}^{n}$ those that are roughly constant with respect to k: these stable values are regarded as reasonable estimates for the shape parameter ξ. If $X_1, \ldots, X_n$ are i.i.d., then Hill's estimator is a consistent estimator for the shape parameter ξ [4].

Note that the Hill estimator $\hat{\xi}_k^{\text{Hill}}$ makes use of the log-transformation of the observations $X_{1:n} = (X_1, \ldots, X_n)$. (Pickands' estimator $\hat{\xi}_k^{\text{Pickand}}$ also employs the log-transformation, but in a slightly different way [5].)
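The Hill estimator and the k-versus-estimate path used for the stability plot can be computed in a few lines. The sketch below is a minimal illustration with hypothetical function names; it assumes that the upper order statistics involved are strictly positive, since the estimator takes logarithms of ratios.

```python
import math

def hill_estimator(data, k):
    """Hill estimate of xi based on the k upper order statistics (2 <= k <= n)."""
    n = len(data)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    x = sorted(data, reverse=True)      # x[0] is X_(1), the largest value
    threshold = x[k - 1]                # X_(k)
    if threshold <= 0:
        raise ValueError("the k largest observations must be positive")
    return sum(math.log(x[j] / threshold) for j in range(k - 1)) / (k - 1)

def hill_path(data, k_max=None):
    """Return [(k, estimate)] for k = 2..k_max using one sort and a running sum."""
    x = sorted(data, reverse=True)
    k_max = len(x) if k_max is None else k_max
    if k_max > len(x) or (k_max >= 2 and x[k_max - 1] <= 0):
        raise ValueError("k_max too large or non-positive order statistics")
    path, log_sum = [], 0.0
    for k in range(2, k_max + 1):
        log_sum += math.log(x[k - 2])   # accumulate log X_(k-1)
        path.append((k, log_sum / (k - 1) - math.log(x[k - 1])))
    return path
```

Plotting the pairs returned by `hill_path` against k gives the plot described above; the region where the curve is roughly flat suggests a range of plausible estimates for ξ.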

See also

References

  1. Script error: No such module "citation/CS1".
  2. Script error: No such module "Citation/CS1".
  3. Script error: No such module "Citation/CS1".
  4. Script error: No such module "citation/CS1".
  5. Script error: No such module "citation/CS1".
  6. Castillo, Enrique, and Ali S. Hadi. "Fitting the generalized Pareto distribution to data." Journal of the American Statistical Association 92.440 (1997): 1609-1620.
