Diversity index


A diversity index is a method of measuring how many different types (e.g. species) there are in a dataset (e.g. a community). Diversity indices are statistical representations of different aspects of biodiversity (e.g. richness, evenness, and dominance), which are useful simplifications for comparing different communities or sites.

When diversity indices are used in ecology, the types of interest are usually species, but they can also be other categories, such as genera, families, functional types, or haplotypes. The entities of interest are usually individual organisms (e.g. plants or animals), and the measure of abundance can be, for example, number of individuals, biomass or coverage. In demography, the entities of interest can be people, and the types of interest various demographic groups. In information science, the entities can be characters and the types the different letters of the alphabet. The most commonly used diversity indices are simple transformations of the effective number of types (also known as 'true diversity'), but each diversity index can also be interpreted in its own right as a measure corresponding to some real phenomenon (a different one for each index).[1][2][3][4]

Many indices account only for categorical diversity between subjects or entities. Such indices, however, do not capture the total variation (diversity) between subjects or entities, which is obtained only when both categorical and qualitative diversity are calculated.

Diversity indices described in this article include:

  • Richness, simply a count of the number of types in a dataset.
  • Shannon index, which also takes into account the proportional abundance of each class under a weighted geometric mean.
    • The Rényi entropy, which adds the ability to freely vary the kind of weighted mean used.
  • Simpson index, which also takes into account the proportional abundance of each class, under a weighted arithmetic mean.
  • Berger–Parker index, which gives the proportional abundance of the most abundant type.
  • Effective number of species (true diversity), which allows for freely varying the kind of weighted mean used, and has an intuitive meaning.[4]

Some more sophisticated indices also account for the phylogenetic relatedness among the types. These are called phylo-divergence indices, and are not yet described in this article.[5]

Effective number of species or Hill numbers

True diversity, or the effective number of types, refers to the number of equally abundant types needed for the average proportional abundance of the types to equal that observed in the dataset of interest (where all types may not be equally abundant). The true diversity in a dataset is calculated by first taking the weighted generalized mean $M_{q-1}$ of the proportional abundances of the types in the dataset, and then taking the reciprocal of this. The equation is:[3][4]

$${}^qD = \frac{1}{M_{q-1}} = \frac{1}{\sqrt[q-1]{\sum_{i=1}^{R} p_i p_i^{q-1}}} = \left(\sum_{i=1}^{R} p_i^q\right)^{1/(1-q)}$$

The denominator $M_{q-1}$ equals the average proportional abundance of the types in the dataset as calculated with the weighted generalized mean with exponent $q - 1$. In the equation, $R$ is richness (the total number of types in the dataset), and the proportional abundance of the $i$th type is $p_i$. The proportional abundances themselves are used as the nominal weights. The numbers ${}^qD$ are called Hill numbers of order $q$ or effective numbers of species.[6]

When $q = 1$, the above equation is undefined. However, the mathematical limit as $q$ approaches 1 is well defined, and the corresponding diversity is calculated with the following equation:

$${}^1D = \frac{1}{\prod_{i=1}^{R} p_i^{p_i}} = \exp\left(-\sum_{i=1}^{R} p_i \ln(p_i)\right)$$

which is the exponential of the Shannon entropy calculated with natural logarithms (see the Shannon index section below). In other domains, this statistic is also known as the perplexity.

The general equation of diversity is often written in the form[1][2]

$${}^qD = \left(\sum_{i=1}^{R} p_i^q\right)^{1/(1-q)}$$

and the term inside the parentheses is called the basic sum. Some popular diversity indices correspond to the basic sum as calculated with different values of $q$.[2]
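The general equation and its limit at order 1 can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `hill_number`, the input format (a list of proportions summing to 1) and the example abundances are assumptions made for this sketch.

```python
import math

def hill_number(p, q):
    """Effective number of types (Hill number) of order q.

    p is assumed to be a list of proportional abundances summing to 1.
    """
    p = [x for x in p if x > 0]  # types with zero abundance are not counted
    if math.isclose(q, 1.0):
        # Limit as q approaches 1: exponential of the Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    basic_sum = sum(x ** q for x in p)
    return basic_sum ** (1.0 / (1.0 - q))

# Made-up community with three types.
p = [0.5, 0.3, 0.2]
for q in (0, 1, 2):
    print(q, hill_number(p, q))
```

For a community of equally abundant types the function returns the number of types at every order, which is the defining property of an effective number.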

Sensitivity of the diversity value to rare vs. abundant species

The value of $q$ is often referred to as the order of the diversity. It defines the sensitivity of the true diversity to rare vs. abundant species by modifying how the weighted mean of the species' proportional abundances is calculated. With some values of the parameter $q$, the value of the generalized mean $M_{q-1}$ assumes familiar kinds of weighted means as special cases. In particular,

  • $q = 0$ corresponds to the weighted harmonic mean,
  • $q = 1$ to the weighted geometric mean, and
  • $q = 2$ to the weighted arithmetic mean.
  • As $q$ approaches infinity, the weighted generalized mean with exponent $q - 1$ approaches the maximum $p_i$ value, which is the proportional abundance of the most abundant species in the dataset.

Generally, increasing the value of $q$ increases the effective weight given to the most abundant species. This leads to a larger $M_{q-1}$ value and a smaller true diversity (${}^qD$) value with increasing $q$.

When $q = 1$, the weighted geometric mean of the $p_i$ values is used, and each species is exactly weighted by its proportional abundance (in the weighted geometric mean, the weights are the exponents). When $q > 1$, the weight given to abundant species is exaggerated, and when $q < 1$, the weight given to rare species is exaggerated. At $q = 0$, the species weights exactly cancel out the species proportional abundances, such that the weighted mean of the $p_i$ values equals $1/R$ even when all species are not equally abundant. At $q = 0$, the effective number of species, ${}^0D$, hence equals the actual number of species $R$. In the context of diversity, $q$ is generally limited to non-negative values. This is because negative values of $q$ would give rare species so much more weight than abundant ones that ${}^qD$ would exceed $R$.[3][4]
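The three special cases can be checked numerically. The sketch below computes the weighted harmonic, geometric and arithmetic means of a made-up set of proportions, using the proportions themselves as weights; the function names and the example values are assumptions for illustration only.

```python
import math

def weighted_generalized_mean(p, exponent):
    """Weighted power mean of p, with p itself as weights (exponent != 0)."""
    return sum(w * x ** exponent for w, x in zip(p, p)) ** (1.0 / exponent)

def weighted_geometric_mean(p):
    """Limit of the weighted power mean as the exponent approaches 0."""
    return math.exp(sum(w * math.log(x) for w, x in zip(p, p)))

p = [0.7, 0.2, 0.1]  # made-up, uneven community; all values must be positive

m_harmonic = weighted_generalized_mean(p, -1)   # q = 0, exponent q - 1 = -1
m_geometric = weighted_geometric_mean(p)        # q = 1, exponent q - 1 = 0
m_arithmetic = weighted_generalized_mean(p, 1)  # q = 2, exponent q - 1 = 1

# The mean grows with q, so the true diversity 1 / M shrinks.
for q, m in ((0, m_harmonic), (1, m_geometric), (2, m_arithmetic)):
    print(f"q={q}: M={m:.4f}  D={1 / m:.4f}")
```

At order 0 the weights cancel the proportions, so the mean is exactly 1/3 for this three-type example and the true diversity equals the richness.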

Richness

Richness $R$ simply quantifies how many different types the dataset of interest contains. For example, species richness (usually denoted $S$) is simply the number of species, e.g. at a particular site. Richness is a simple measure, so it has been a popular diversity index in ecology, where abundance data are often not available.[7] If true diversity is calculated with $q = 0$, the effective number of types (${}^0D$) equals the actual number of types, which is identical to richness ($R$).[2][4]
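As a small illustration, richness can be obtained by counting distinct labels. The survey data below are invented, and the variable names are assumptions of this sketch.

```python
from collections import Counter

# Hypothetical field survey: one label per observed individual.
observations = ["oak", "birch", "oak", "pine", "oak", "birch"]

counts = Counter(observations)
richness = len(counts)  # number of distinct types; abundances are ignored

# Order-0 true diversity: every type present contributes p_i ** 0 == 1.
total = sum(counts.values())
zero_order_diversity = sum((n / total) ** 0 for n in counts.values())

print(richness, zero_order_diversity)
```

The second quantity shows why richness coincides with true diversity of order 0: raising each proportion to the power 0 discards all abundance information.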

Shannon index

The Shannon index has been a popular diversity index in the ecological literature, where it is also known as Shannon's diversity index, Shannon–Wiener index, and (erroneously) Shannon–Weaver index.[8] The measure was originally proposed by Claude Shannon in 1948 to quantify the entropy (hence Shannon entropy, related to Shannon information content) in strings of text.[9] The idea is that the more letters there are, and the closer their proportional abundances in the string of interest, the more difficult it is to correctly predict which letter will be the next one in the string. The Shannon entropy quantifies the uncertainty (entropy or degree of surprise) associated with this prediction. It is most often calculated as follows:

$$H' = -\sum_{i=1}^{R} p_i \ln(p_i)$$

where $p_i$ is the proportion of characters belonging to the $i$th type of letter in the string of interest. In ecology, $p_i$ is often the proportion of individuals belonging to the $i$th species in the dataset of interest. Then the Shannon entropy quantifies the uncertainty in predicting the species identity of an individual that is taken at random from the dataset.

Although the equation is here written with natural logarithms, the base of the logarithm used when calculating the Shannon entropy can be chosen freely. Shannon himself discussed logarithm bases 2, 10 and $e$, and these have since become the most popular bases in applications that use the Shannon entropy. Each log base corresponds to a different measurement unit, which has been called binary digits (bits), decimal digits (decits), and natural digits (nats) for the bases 2, 10 and $e$, respectively. Comparing Shannon entropy values that were originally calculated with different log bases requires converting them to the same log base: change from the base $a$ to base $b$ is obtained with multiplication by $\log_b(a)$.[9]
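The base conversion can be illustrated with a short Python sketch. The function name `shannon_entropy` and the example proportions are assumptions chosen for illustration.

```python
import math

def shannon_entropy(p, base=math.e):
    """Shannon entropy of proportions p in the given logarithm base."""
    return -sum(x * math.log(x, base) for x in p if x > 0)

p = [0.5, 0.25, 0.25]  # made-up proportions

h_nats = shannon_entropy(p)          # natural logarithm (nats)
h_bits = shannon_entropy(p, base=2)  # binary logarithm (bits)

# Changing from base a = e to base b = 2 multiplies by log_b(a) = log_2(e).
h_bits_converted = h_nats * math.log(math.e, 2)

print(h_nats, h_bits, h_bits_converted)
```

Both routes give the same value in bits, so entropies reported in different units can be compared once they are brought to a common base.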

The Shannon index ($H'$) is related to the weighted geometric mean of the proportional abundances of the types. Specifically, it equals the logarithm of true diversity as calculated with $q = 1$:[3]

$$H' = -\sum_{i=1}^{R} p_i \ln(p_i) = -\sum_{i=1}^{R} \ln\left(p_i^{p_i}\right)$$

This can also be written

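$$H' = -\left(\ln\left(p_1^{p_1}\right) + \ln\left(p_2^{p_2}\right) + \ln\left(p_3^{p_3}\right) + \cdots + \ln\left(p_R^{p_R}\right)\right)$$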

which equals

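$$H' = -\ln\left(p_1^{p_1} p_2^{p_2} p_3^{p_3} \cdots p_R^{p_R}\right) = \ln\left(\frac{1}{p_1^{p_1} p_2^{p_2} p_3^{p_3} \cdots p_R^{p_R}}\right) = \ln\left(\frac{1}{\prod_{i=1}^{R} p_i^{p_i}}\right)$$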

Since the sum of the $p_i$ values equals 1 by definition, the denominator equals the weighted geometric mean of the $p_i$ values, with the $p_i$ values themselves being used as the weights (exponents in the equation). The term within the parentheses hence equals true diversity ${}^1D$, and $H'$ equals $\ln({}^1D)$.[1][3][4]

When all types in the dataset of interest are equally common, all $p_i$ values equal $1/R$, and the Shannon index hence takes the value $\ln(R)$. The more unequal the abundances of the types, the larger the weighted geometric mean of the $p_i$ values, and the smaller the corresponding Shannon entropy. If practically all abundance is concentrated in one type, and the other types are very rare (even if there are many of them), Shannon entropy approaches zero. When there is only one type in the dataset, Shannon entropy exactly equals zero (there is no uncertainty in predicting the type of the next randomly chosen entity).
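These limiting cases can be verified with a few lines of Python. The function name `shannon_index` and the three example communities are assumptions of this sketch.

```python
import math

def shannon_index(p):
    """Shannon index with natural logarithms, written as sum of p * ln(1/p)."""
    return sum(x * math.log(1 / x) for x in p if x > 0)

R = 5
even = [1 / R] * R                       # all types equally common
skewed = [0.96, 0.01, 0.01, 0.01, 0.01]  # one dominant type, four rare ones
single = [1.0]                           # only one type present

print(shannon_index(even), math.log(R))  # equal up to rounding
print(shannon_index(skewed))             # much smaller than ln(R)
print(shannon_index(single))             # exactly zero
print(math.exp(shannon_index(skewed)))   # true diversity of order 1
```

Exponentiating the index for the skewed community gives an effective number of types only slightly above 1, even though five types are present.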

In machine learning, the Shannon entropy is used to define information gain, which is the reduction in entropy obtained by splitting a dataset on an attribute.

Rényi entropy

The Rényi entropy is a generalization of the Shannon entropy to values of $q$ other than 1. It can be expressed:

$${}^qH = \frac{1}{1-q} \ln\left(\sum_{i=1}^{R} p_i^q\right)$$

which equals

$${}^qH = \ln\left(\frac{1}{\sqrt[q-1]{\sum_{i=1}^{R} p_i p_i^{q-1}}}\right) = \ln\left({}^qD\right)$$

This means that taking the logarithm of true diversity based on any value of $q$ gives the Rényi entropy corresponding to the same value of $q$.
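The relationship between the Rényi entropy and true diversity can be checked numerically. In the sketch below the function names and example proportions are assumptions, and the order $q = 1$ is deliberately excluded because both formulas are undefined there.

```python
import math

def renyi_entropy(p, q):
    """Renyi entropy of order q (q != 1), natural logarithm."""
    return math.log(sum(x ** q for x in p if x > 0)) / (1.0 - q)

def hill_number(p, q):
    """True diversity of order q (q != 1)."""
    return sum(x ** q for x in p if x > 0) ** (1.0 / (1.0 - q))

p = [0.5, 0.3, 0.2]  # made-up proportions

for q in (0, 0.5, 2, 3):
    print(q, renyi_entropy(p, q), math.log(hill_number(p, q)))

# Close to q = 1 the Renyi entropy approaches the Shannon entropy.
shannon = -sum(x * math.log(x) for x in p)
near_one = renyi_entropy(p, 1 + 1e-6)
print(shannon, near_one)
```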

Simpson index

The Simpson index was introduced in 1949 by Edward H. Simpson to measure the degree of concentration when individuals are classified into types.[10] The same index was rediscovered by Orris C. Herfindahl in 1950.[11] The square root of the index had already been introduced in 1945 by the economist Albert O. Hirschman.[12] As a result, the same measure is usually known as the Simpson index in ecology, and as the Herfindahl index or the Herfindahl–Hirschman index (HHI) in economics.

The measure equals the probability that two entities taken at random from the dataset of interest represent the same type.[10] It equals:

$$\lambda = \sum_{i=1}^{R} p_i^2,$$

where $R$ is richness (the total number of types in the dataset). This equation is also equal to the weighted arithmetic mean of the proportional abundances $p_i$ of the types of interest, with the proportional abundances themselves being used as the weights.[1] Proportional abundances are by definition constrained to values between zero and one, and a weighted arithmetic mean is never smaller than the corresponding weighted harmonic mean, which here equals $1/R$. Hence $\lambda \geq 1/R$, with equality when all types are equally abundant.

By comparing the equation used to calculate $\lambda$ with the equations used to calculate true diversity, it can be seen that $1/\lambda$ equals ${}^2D$, i.e., true diversity as calculated with $q = 2$. The original Simpson's index hence equals the corresponding basic sum.[2]

The interpretation of λ as the probability that two entities taken at random from the dataset of interest represent the same type assumes that the entities are sampled with replacement. If the dataset is very large, sampling without replacement gives approximately the same result, but in small datasets, the difference can be substantial. If the dataset is small, and sampling without replacement is assumed, the probability of obtaining the same type with both random draws is:

$$\ell = \frac{\sum_{i=1}^{R} n_i (n_i - 1)}{N (N - 1)}$$

where $n_i$ is the number of entities belonging to the $i$th type and $N$ is the total number of entities in the dataset.[10] This form of the Simpson index is also known as the Hunter–Gaston index in microbiology.[13]
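The difference between the two sampling assumptions can be illustrated as follows. The function names and the count vectors are assumptions made for this sketch; the two datasets have identical proportions but very different sizes.

```python
def simpson_with_replacement(counts):
    """Sum of squared proportions (sampling with replacement)."""
    total = sum(counts)
    return sum((n / total) ** 2 for n in counts)

def simpson_without_replacement(counts):
    """Sum of n_i (n_i - 1) over N (N - 1); requires at least 2 entities."""
    total = sum(counts)
    return sum(n * (n - 1) for n in counts) / (total * (total - 1))

small = [3, 2, 1]           # made-up counts, N = 6
large = [3000, 2000, 1000]  # same proportions, N = 6000

for counts in (small, large):
    print(simpson_with_replacement(counts), simpson_without_replacement(counts))
```

For the small dataset the two values differ noticeably (14/36 against 8/30), while for the large dataset they agree to about three decimal places.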

Since the mean proportional abundance of the types increases with decreasing number of types and increasing abundance of the most abundant type, λ obtains small values in datasets of high diversity and large values in datasets of low diversity. This is counterintuitive behavior for a diversity index, so often, such transformations of λ that increase with increasing diversity have been used instead. The most popular of such indices have been the inverse Simpson index (1/λ) and the Gini–Simpson index (1 − λ).[1][2] Both of these have also been called the Simpson index in the ecological literature, so care is needed to avoid accidentally comparing the different indices as if they were the same.

Inverse Simpson index

The inverse Simpson index equals:

$$\frac{1}{\lambda} = \frac{1}{\sum_{i=1}^{R} p_i^2} = {}^2D$$

This simply equals true diversity of order 2, i.e. the effective number of types that is obtained when the weighted arithmetic mean is used to quantify average proportional abundance of types in the dataset of interest.

The index is also used as a measure of the effective number of parties.
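A brief sketch of the inverse Simpson index, applied to invented vote shares to illustrate its reading as an effective number of parties. The function name and the shares are assumptions, not data from any real election.

```python
def inverse_simpson(p):
    """Inverse Simpson index: reciprocal of the sum of squared proportions."""
    return 1.0 / sum(x ** 2 for x in p)

two_equal = [0.5, 0.5]        # two parties of equal size
dominated = [0.9, 0.05, 0.05] # three parties, one dominant

print(inverse_simpson(two_equal))
print(inverse_simpson(dominated))
```

Two equal parties give an effective number of exactly 2, whereas the three-party system with one dominant party gives a value between 1 and 2, reflecting how little the small parties contribute.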

Gini–Simpson index

The Gini–Simpson index is also called Gini impurity, or Gini's diversity index,[14] in the field of machine learning. The original Simpson index $\lambda$ equals the probability that two entities taken at random from the dataset of interest (with replacement) represent the same type. Its transformation $1 - \lambda$, therefore, equals the probability that the two entities represent different types. This measure is also known in ecology as the probability of interspecific encounter (PIE)[15] and the Gini–Simpson index.[2] It can be expressed as a transformation of the true diversity of order 2:

$$1 - \lambda = 1 - \sum_{i=1}^{R} p_i^2 = 1 - \frac{1}{{}^2D}$$

The Gibbs–Martin index of sociology, psychology, and management studies,[16] which is also known as the Blau index, is the same measure as the Gini–Simpson index.

The quantity is also known as the expected heterozygosity in population genetics.
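The probabilistic interpretation of the Gini–Simpson index can be confirmed by direct enumeration. The function name and the example proportions below are assumptions chosen for illustration.

```python
import math

def gini_simpson(p):
    """Gini-Simpson index: 1 minus the sum of squared proportions."""
    return 1.0 - sum(x ** 2 for x in p)

p = [0.5, 0.3, 0.2]  # made-up proportions

# Probability that two independent draws (with replacement) differ in type,
# obtained by summing p_i * p_j over all ordered pairs with i != j.
prob_different = sum(
    p[i] * p[j]
    for i in range(len(p))
    for j in range(len(p))
    if i != j
)

order_2_diversity = 1.0 / sum(x ** 2 for x in p)

print(gini_simpson(p), prob_different, 1.0 - 1.0 / order_2_diversity)
```

All three printed values agree up to rounding, matching the equation that links the index to the true diversity of order 2.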

Berger–Parker index

The Berger–Parker index, named after Wolfgang H. Berger and Frances Lawrence Parker,[17] equals the maximum $p_i$ value in the dataset, i.e., the proportional abundance of the most abundant type. This corresponds to the weighted generalized mean of the $p_i$ values when $q$ approaches infinity, and hence equals the inverse of the true diversity of order infinity ($1/{}^\infty D$).
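The limiting behaviour can be observed numerically by evaluating the inverse of the true diversity at increasing orders. The function names and proportions are assumptions of this sketch, and the order is capped at 100 because very large exponents would eventually underflow in floating-point arithmetic.

```python
def berger_parker(p):
    """Proportional abundance of the most abundant type."""
    return max(p)

def hill_number(p, q):
    """True diversity of order q (q != 1)."""
    return sum(x ** q for x in p if x > 0) ** (1.0 / (1.0 - q))

p = [0.5, 0.3, 0.2]  # made-up proportions

# 1 / qD climbs toward max(p) as the order q grows.
inverse_diversities = [1.0 / hill_number(p, q) for q in (2, 10, 100)]
print(inverse_diversities, berger_parker(p))
```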


References


  1. [citation text missing in source]
  2. [citation text missing in source]
  3. [citation text missing in source]
  4. [citation text missing in source]
  5. [citation text missing in source]
  6. [citation text missing in source]
  7. [citation text missing in source]
  8. Spellerberg, I. F., and Fedor, P. J. (2003) A tribute to Claude Shannon (1916–2001) and a plea for more rigorous use of species richness, species diversity and the 'Shannon–Wiener' Index. Global Ecology and Biogeography, 12(3), 177–179.
  9. Shannon, C. E. (1948) A mathematical theory of communication. The Bell System Technical Journal, 27, 379–423 and 623–656.
  10. [citation text missing in source]
  11. Herfindahl, O. C. (1950) Concentration in the U.S. Steel Industry. Unpublished doctoral dissertation, Columbia University.
  12. Hirschman, A. O. (1945) National power and the structure of foreign trade. Berkeley.
  13. [citation text missing in source]
  14. [citation text missing in source]
  15. [citation text missing in source]
  16. [citation text missing in source]
  17. [citation text missing in source]

