{{Negative binomial distribution}}


In [[probability theory]] and [[statistics]], the '''negative binomial distribution''', also called a '''Pascal distribution''',<ref>[https://www.math.wm.edu/~leemis/chart/UDR/PDFs/Pascal.pdf Pascal distribution], Univariate Distribution Relationships, Larry Leemis</ref> is a [[discrete probability distribution]] that models the number of failures in a sequence of independent and identically distributed [[Bernoulli trial]]s before a specified (non-random) number of successes <math>r</math> occurs.<ref name="Wolfram">{{cite web |last1=Weisstein |first1=Eric |title=Negative Binomial Distribution |url=https://mathworld.wolfram.com/NegativeBinomialDistribution.html |website=Wolfram MathWorld |publisher=Wolfram Research |access-date=11 October 2020}}</ref> For example, we can define rolling a 6 on a die as a success, and rolling any other number as a failure, and ask how many failure rolls will occur before we see the third success (<math>r=3</math>). In such a case, the probability distribution of the number of failures that appear will be a negative binomial distribution.


An alternative formulation is to model the number of total trials (instead of the number of failures). In fact, for a specified (non-random) number of successes {{math|(''r'')}}, the number of failures {{math|(''n'' − ''r'')}} is random because the number of total trials {{math|(''n'')}} is random. For example, we could use the negative binomial distribution to model the number of days {{mvar|n}} (random) a certain machine works (specified by {{mvar|r}}) before it breaks down.


Imagine a sequence of independent [[Bernoulli trial]]s: each trial has two potential outcomes called "success" and "failure." In each trial the probability of success is <math>p</math> and of failure is <math>1-p</math>. We observe this sequence until a predefined number <math>r</math> of successes occurs. Then the random number of observed failures, <math>X</math>, follows the '''negative binomial''' distribution:
<math display="block">
     X\sim\operatorname{NB}(r, p)
</math>


===Probability mass function===


The [[probability mass function]] of the negative binomial distribution is
<math display="block"> f(k; r, p) \equiv \Pr(X = k) = \binom{k+r-1}{k} (1-p)^k p^r </math>
where {{mvar|r}} is the number of successes, {{mvar|k}} is the number of failures, and {{mvar|p}} is the probability of success on each trial.


Here, the quantity in parentheses is the [[binomial coefficient]], and is equal to
<math display="block">
     \binom{k+r-1}{k} = \frac{(k+r-1)!}{(r-1)!\,k!} = \frac{(k+r-1)(k+r-2)\dotsm(r)}{k!} = \frac{\Gamma(k + r)}{k!\ \Gamma(r)}.
</math>
Note that {{math|Γ(''r'')}} is the [[Gamma function]].
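The mass function is easy to check numerically; the following minimal Python sketch (assuming SciPy is available) compares the formula with SciPy's <code>nbinom</code>, which uses this same failures-before-the-{{mvar|r}}-th-success parameterization:
<syntaxhighlight lang="python">
# A minimal numerical check of the pmf formula above (SciPy assumed).
from math import comb

from scipy.stats import nbinom

r, p = 3, 1 / 6   # e.g. failures before the third 6 when rolling a fair die

for k in range(5):
    explicit = comb(k + r - 1, k) * (1 - p) ** k * p ** r
    assert abs(explicit - nbinom.pmf(k, r, p)) < 1e-12
    print(k, explicit)
</syntaxhighlight>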


This quantity can alternatively be written in the following manner, explaining the name "negative binomial":


<math display="block">
\begin{align}
& \frac{(k+r-1)\dotsm(r)}{k!} \\[10pt]
= {} & (-1)^k \frac{(-r)(-r-1)(-r-2)\dotsm(-r-k+1)}{k!} = (-1)^k \binom{-r}{k}.
\end{align}
</math>
Note that by the last expression and the [[binomial series]], for every {{math|0 ≤ ''p'' < 1}} and <math>q=1-p</math>,


<math display="block">
p^{-r} = (1-q)^{-r} = \sum_{k=0}^\infty \binom{-r}{\phantom{-}k}(-q)^k = \sum_{k=0}^\infty \binom{k+r-1}{k}q^k
</math>


hence the terms of the probability mass function indeed add up to one, as shown below.
<math display="block">
\sum_{k=0}^\infty \binom{k+r-1}{k} \left(1-p\right)^k p^r = p^{-r}p^r = 1
</math>




The [[cumulative distribution function]] can be expressed in terms of the [[regularized incomplete beta function]]:<ref name="Wolfram" /><ref name="Cook" />
<math display="block">
     F(k; r, p) \equiv \Pr(X\le k) = I_{p}(r, k+1).
</math>
(This formula uses the same parameterization as in the article's table, with {{mvar|r}} the number of successes, and <math>p = r/(r+\mu)</math> with <math>\mu</math> the mean.)


It can also be expressed in terms of the [[cumulative distribution function]] of the [[binomial distribution]]:<ref>Morris K W (1963), A note on direct and inverse sampling, Biometrika, 50, 544–545.</ref>
<math display="block">
     F(k; r, p) = F_\text{binomial}(k;n=k+r,1-p). </math>
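Both expressions can be verified against each other numerically; a minimal sketch, assuming SciPy is available (<code>betainc(a, b, x)</code> is the regularized <math>I_x(a,b)</math>):
<syntaxhighlight lang="python">
# A short numerical check of both CDF expressions (SciPy assumed).
from scipy.special import betainc
from scipy.stats import binom, nbinom

r, p, k = 3, 0.3, 7

assert abs(betainc(r, k + 1, p) - nbinom.cdf(k, r, p)) < 1e-12       # I_p(r, k+1)
assert abs(binom.cdf(k, k + r, 1 - p) - nbinom.cdf(k, r, p)) < 1e-12  # binomial CDF form
</syntaxhighlight>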


===Alternative formulations===
Some sources may define the negative binomial distribution slightly differently from the primary one here. The most common variations are where the random variable {{mvar|X}} is counting different things. These variations can be seen in the table here:
<div class="overflowbugx" style="overflow-x:auto;">
{| class="wikitable"
|
!''X'' is counting...
!Probability mass function
!Formula
!Alternate formula
(using equivalent binomial)
!Alternate formula
(simplified using: <math display="inline">n = k + r </math>)
!Support
|-
|1
|{{mvar|k}} failures, given {{mvar|r}} successes
|<math display="inline">f(k; r, p) \equiv \Pr(X = k) = </math>
|<math display="inline">\binom{k+r-1}{k} p^r(1-p)^k </math><ref>{{Cite web|url=http://www.mathworks.com/help/stats/negative-binomial-distribution.html|title=Mathworks: Negative Binomial Distribution}}</ref><ref name="Cook">{{Cite web|url=http://www.johndcook.com/negative_binomial.pdf|title=Notes on the Negative Binomial Distribution|last=Cook|first=John D.}}</ref><ref>{{Cite web|url=http://www.stat.ufl.edu/~abhisheksaha/sta4321/lect14.pdf|title=Introduction to Probability / Fundamentals of Probability: Lecture 14|last=Saha|first=Abhishek}}</ref>
|<math display="inline">\binom{k+r-1}{r-1} p^r(1-p)^k </math><ref name="Wolfram"/><ref>[[SAS Institute]], "[https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lefunctionsref/n0zb2l2xnsw2ctn1qe2os5yk2l9c.htm Negative Binomial Distribution]", ''SAS(R) 9.4 Functions and CALL Routines: Reference, Fourth Edition'', SAS Institute, Cary, NC, 2016.</ref><ref name="Crawley 2012">{{cite book|url=https://books.google.com/books?id=XYDl0mlH-moC|title=The R Book|last=Crawley|first=Michael J.|publisher=Wiley|year=2012|isbn=978-1-118-44896-0}}</ref><ref name=":0">{{Cite web|url=http://www.math.ntu.edu.tw/~hchen/teaching/StatInference/notes/lecture16.pdf|title=Set theory: Section 3.2.5 – Negative Binomial Distribution}}</ref>
| rowspan="2" |<math display="inline">\binom{n-1}{k} p^r(1-p)^k </math>
|<math>\text{for }k = 0, 1, 2, \ldots</math>
|-
|2
|{{mvar|n}} trials, given {{mvar|r}} successes
|<math display="inline">f(n; r, p) \equiv \Pr(X = n) = </math>
|<math display="inline">\binom{n-1}{r-1} p^r(1-p)^{n-r} </math><ref name="Cook" /><ref name=":0" /><ref>{{Cite web|url=http://www.randomservices.org/random/bernoulli/NegativeBinomial.html|title=Randomservices.org, Chapter 10: Bernoulli Trials, Section 4: The Negative Binomial Distribution}}</ref><ref>{{Cite web|url=http://stattrek.com/probability-distributions/negative-binomial.aspx|title=Stat Trek: Negative Binomial Distribution}}</ref><ref>{{Cite web|url=http://www.stat.purdue.edu/~zhanghao/STAT511/handout/Stt511%20Sec3.5.pdf|title=Distinguishing Between Binomial, Hypergeometric and Negative Binomial Distributions|last=Wroughton|first=Jacqueline}}</ref>
|<math display="inline">\binom{n-1}{n-r} p^r(1-p)^{n-r} </math>
| rowspan="2" |<math>\text{for }n = r, r+1, r+2, \dotsc</math>
|-
|3
|{{mvar|n}} trials, given {{mvar|r}} failures
|<math display="inline">f(n; r, p) \equiv \Pr(X = n) =</math>
|<math display="inline">\binom{n-1}{r-1} p^{n-r}(1-p)^{r}</math>
|<math display="inline">\binom{n-1}{n-r} p^{n-r}(1-p)^{r}</math>
| rowspan="2" |<math display="inline">\binom{n-1}{k} p^{k}(1-p)^r </math>
|-
|4
|{{mvar|k}} successes, given {{mvar|r}} failures
|<math display="inline">f(k; r, p) \equiv \Pr(X = k) = </math>
|<math display="inline">\binom{k+r-1}{k} p^k(1-p)^r </math>
|<math display="inline">\binom{k+r-1}{r-1} p^k(1-p)^r </math>
|<math>\text{for }k = 0, 1, 2, \ldots</math>
|-
|
|{{mvar|k}} successes, given {{mvar|n}} trials
(binomial distribution)
|<math display="inline">f(k; n, p) \equiv \Pr(X = k) = </math>
|<math display="inline">\binom{n}{k} p^k(1-p)^{n-k} </math>
|<math display="inline">\binom{n}{n-k} p^k(1-p)^{n-k} </math>
|<math display="inline">\binom{n}{k} p^k(1-p)^{n-k} </math>
|<math>\text{for }k = 0, 1, 2, \dotsc, n</math>
|}
</div>
Each of the four definitions of the negative binomial distribution can be expressed in slightly different but equivalent ways. The first alternative formulation is simply an equivalent form of the binomial coefficient, that is: <math display="inline"> \binom ab = \binom a{a-b} \quad \text{for }\ 0\leq b\leq a</math>.  The second alternate formulation somewhat simplifies the expression by recognizing that the total number of trials is simply the number of successes and failures, that is: <math display="inline">n=r+k</math>.  These second formulations may be more intuitive to understand; however, they are perhaps less practical as they have more terms.
* The definition where {{mvar|X}} is the number of {{mvar|n}} '''trials''' that occur for a given number of {{mvar|r}} '''successes''' is similar to the primary definition, except that the number of trials is given instead of the number of failures.  This adds {{mvar|r}} to the value of the random variable, shifting its support and mean.
* The definition where {{mvar|X}} is the number of {{mvar|k}} '''successes''' (or {{mvar|n}} '''trials''') that occur for a given number of {{mvar|r}} '''failures''' is similar to the primary definition used in this article, except that numbers of failures and successes are switched when considering what is being counted and what is given.  Note however, that {{mvar|p}} still refers to the probability of "success".
* The definition of the negative binomial distribution can be extended to the case where the parameter {{mvar|r}} can take on a positive [[real number|real]] value.  Although it is impossible to visualize a non-integer number of "failures", we can still formally define the distribution through its probability mass function.  The problem of extending the definition to real-valued (positive) {{mvar|r}} boils down to extending the binomial coefficient to its real-valued counterpart, based on the [[gamma function]]: <math display="block">
   \binom{k+r-1}{k} = \frac{(k+r-1)(k+r-2)\dotsm(r)}{k!} = \frac{\Gamma(k+r)}{k!\,\Gamma(r)}
</math> After substituting this expression in the original definition, we say that {{mvar|X}} has a negative binomial (or '''Pólya''') distribution if it has a [[probability mass function]]: <math display="block">
     f(k; r, p) \equiv \Pr(X = k) = \frac{\Gamma(k+r)}{k!\,\Gamma(r)} (1-p)^k p^r \quad\text{for }k = 0, 1, 2, \dotsc
</math> Here {{mvar|r}} is a real, positive number.
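This real-valued extension is straightforward to compute via the log-gamma function; a minimal Python sketch (SciPy assumed, helper name illustrative):
<syntaxhighlight lang="python">
# The gamma-function form of the pmf for real-valued r, in log space
# for numerical stability (SciPy assumed).
import numpy as np
from scipy.special import gammaln
from scipy.stats import nbinom

def nb_pmf(k, r, p):
    """Polya pmf: Gamma(k+r) / (k! Gamma(r)) * (1-p)^k * p^r, for real r > 0."""
    log_coef = gammaln(k + r) - gammaln(k + 1) - gammaln(r)
    return np.exp(log_coef + k * np.log(1 - p) + r * np.log(p))

k = np.arange(6)
print(nb_pmf(k, 2.5, 0.4))                        # non-integer r is fine
assert np.allclose(nb_pmf(k, 2.5, 0.4), nbinom.pmf(k, 2.5, 0.4))
</syntaxhighlight>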


In negative binomial regression,<ref name="neg bin reg2">{{cite book|url=https://books.google.com/books?id=0Q_ijxOEBjMC|title=Negative Binomial Regression|last=Hilbe|first=Joseph M.|author-link=Joseph Hilbe|publisher=Cambridge University Press|year=2011|isbn=978-0-521-19815-8|edition=Second|location=Cambridge, UK}}</ref> the distribution is specified in terms of its mean, <math display="inline">m=\frac{r(1-p)}{p}</math>, which is then related to explanatory variables as in [[linear regression]] or other [[generalized linear model]]s.  From the expression for the mean {{mvar|m}}, one can derive <math display="inline">p=\frac{r}{m+r}</math> and <math display="inline">1-p=\frac{m}{m+r}</math>.  Then, substituting these expressions in [[#Extension to real-valued r|the one for the probability mass function when {{mvar|r}} is real-valued]], yields this parametrization of the probability mass function in terms of&nbsp;{{mvar|m}}:


<math display="block">
  \Pr(X = k) = \frac{\Gamma(r+k)}{k! \, \Gamma(r)} \left(\frac{r}{r+m}\right)^r \left(\frac{m}{r+m}\right)^k \quad\text{for }k = 0, 1, 2, \dotsc
</math>
The variance can then be written as <math display="inline">m+\frac{m^2}{r}</math>.  Some authors prefer to set <math display="inline">\alpha = \frac{1}{r}</math>, and express the variance as <math display="inline">m+\alpha m^2</math>.  In this context, and depending on the author, either the parameter {{mvar|r}} or its reciprocal {{mvar|α}} is referred to as the "dispersion parameter", "[[shape parameter]]" or "[[clustering coefficient]]",<ref>{{cite journal|last=Lloyd-Smith|first=J. O.|year=2007|title=Maximum Likelihood Estimation of the Negative Binomial Dispersion Parameter for Highly Overdispersed Data, with Applications to Infectious Diseases|journal=[[PLoS ONE]]|volume=2|issue=2|article-number=e180|doi=10.1371/journal.pone.0000180|pmid=17299582|pmc=1791715|bibcode=2007PLoSO...2..180L|doi-access=free}} {{open access}}</ref> or the "heterogeneity"<ref name="neg bin reg2" /> or "aggregation" parameter.<ref name="Crawley 2012"/> The term "aggregation" is particularly used in ecology when describing counts of individual organisms. Decrease of the aggregation parameter {{mvar|r}} towards zero corresponds to increasing aggregation of the organisms; increase of {{mvar|r}} towards infinity corresponds to absence of aggregation, as can be described by [[Poisson regression]].
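A small sketch of this mean/dispersion conversion (illustrative helper name, not a library API; SciPy assumed):
<syntaxhighlight lang="python">
# Given mean m and dispersion r, recover p = r / (m + r) and check the
# variance formula m + m^2 / r (SciPy assumed).
from scipy.stats import nbinom

def nb_from_mean_dispersion(m, r):
    return r, r / (m + r)          # (r, p) for scipy.stats.nbinom

m, r = 4.0, 2.5
dist = nbinom(*nb_from_mean_dispersion(m, r))
assert abs(dist.mean() - m) < 1e-12
assert abs(dist.var() - (m + m**2 / r)) < 1e-12
</syntaxhighlight>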


===Alternative parameterizations===
Sometimes the distribution is parameterized in terms of its mean {{mvar|μ}} and variance {{math|''σ''{{sup|2}}}}:
<math display="block">\begin{align}
& p =\frac{\mu}{\sigma^2}, \\[6pt]
& r =\frac{\mu^2}{\sigma^2-\mu}, \\[3pt]
& \Pr(X=k) = \binom{k+r-1}{k} \left(\frac{\sigma^2-\mu}{\sigma^2}\right)^k \left(\frac{\mu}{\sigma^2}\right)^r \\[3pt]
& \operatorname{E}(X) = \mu \\
& \operatorname{Var}(X) = \sigma^2 .
\end{align}</math>


Another popular parameterization uses {{mvar|r}} and the failure [[odds]] {{mvar|β}}:
<math display="block">\begin{align}
& p = \frac{1}{1+\beta} \\
& \Pr(X=k) = {k+r-1 \choose k}  \left(\frac{\beta}{1+\beta}\right)^k \left(\frac {1} {1+\beta}\right)^r \\
& \operatorname{E}(X) = r\beta \\
& \operatorname{Var}(X) = r\beta(1+\beta) .
\end{align}</math>
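Both conversions are easy to check numerically; a minimal sketch with illustrative helper names (SciPy assumed):
<syntaxhighlight lang="python">
# Convert the (mu, sigma^2) and (r, beta) parameterizations back to (r, p)
# and verify the stated moments (SciPy assumed).
from scipy.stats import nbinom

def from_mean_variance(mu, sigma2):
    # requires sigma2 > mu (overdispersion)
    return mu**2 / (sigma2 - mu), mu / sigma2

def from_failure_odds(r, beta):
    return r, 1 / (1 + beta)

r, p = from_mean_variance(3.0, 7.5)
assert abs(nbinom(r, p).mean() - 3.0) < 1e-12
assert abs(nbinom(r, p).var() - 7.5) < 1e-12

r, p = from_failure_odds(4, 0.5)
assert abs(nbinom(r, p).mean() - 4 * 0.5) < 1e-12            # E(X) = r * beta
assert abs(nbinom(r, p).var() - 4 * 0.5 * 1.5) < 1e-12       # Var(X) = r*beta*(1+beta)
</syntaxhighlight>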


===Examples===


====Length of hospital stay====
Hospital [[length of stay]] is an example of real-world data that can be modelled well with a negative binomial distribution via [[negative binomial regression]].<ref name="carter">{{cite journal |author=Carter, E.M., Potts, H.W.W. |date=4 April 2014 |title=Predicting length of stay from an electronic patient record system: a primary total knee replacement example |journal=BMC Medical Informatics and Decision Making |volume=14 |page=26 |doi = 10.1186/1472-6947-14-26 |pmc=3992140 |pmid=24708853 |doi-access=free }} {{open access}}</ref><ref>{{Cite journal |last1=Orooji |first1=Arezoo |last2=Nazar |first2=Eisa |last3=Sadeghi |first3=Masoumeh |last4=Moradi |first4=Ali |last5=Jafari |first5=Zahra |last6=Esmaily |first6=Habibollah |date=2021-04-30 |title=Factors associated with length of stay in hospital among the elderly patients using count regression models |url=http://mjiri.iums.ac.ir/article-1-6183-en.html |journal=Medical Journal of the Islamic Republic of Iran |volume=35 |page=5 |doi=10.47176/mjiri.35.5 |pmc=8111647 |pmid=33996656}}</ref>


====Selling candy====
Pat Collis is required to sell candy bars to raise money for the 6th grade field trip. Pat is (somewhat harshly) not supposed to return home until five candy bars have been sold. So the child goes door to door, selling candy bars. At each house, there is a 0.6 probability of selling one candy bar and a 0.4 probability of selling nothing.
Successfully selling candy enough times is what defines our stopping criterion (as opposed to failing to sell it), so {{mvar|k}} in this case represents the number of failures and {{mvar|r}} represents the number of successes.  Recall that the {{math|NB(''r'', ''p'')}} distribution describes the probability of {{mvar|k}} failures and {{mvar|r}} successes in {{math|''k'' + ''r''}} {{math|Bernoulli(''p'')}} trials with success on the last trial.  Selling five candy bars means getting five successes.  The number of trials (i.e. houses) this takes is therefore {{math|1=''k'' + 5 = ''n''}}.  The random variable we are interested in is the number of houses, so we substitute {{math|1=''k'' = ''n'' − 5}} into a {{math|NB(5, 0.6)}} mass function and obtain the following mass function of the distribution of houses (for {{math|''n'' ≥ 5}}):


<math display="block"> f(n) = \binom{(n-5) + 5 - 1}{n-5} \; 0.6^5 \; (1-0.6)^{n-5} = {n-1 \choose n-5} \; 3^5 \; \frac{2^{n-5}}{5^n}. </math>


''What's the probability that Pat finishes on the tenth house?''


<math display="block"> f(10) = \frac{979776}{9765625} \approx 0.10033. \, </math>


''What's the probability that Pat finishes on or before reaching the eighth house?''


To finish on or before the eighth house, Pat must finish at the fifth, sixth, seventh, or eighth house. Sum those probabilities:
<math display="block"> \begin{align}
f(5) &= \frac{243}{3125} \approx 0.07776 \\
f(6) &= \frac{486}{3125} \approx 0.15552 \\
f(7) &= \frac{2916}{15625} \approx 0.18662 \\
f(8) &= \frac{13608}{78125} \approx 0.17418
\end{align}</math>
<math display="block">\sum_{j=5}^8 f(j) = \frac{46413}{78125} \approx 0.59409.</math>


''What's the probability that Pat exhausts all 30 houses that happen to stand in the neighborhood?''


This can be expressed as the probability that Pat [[Complementary event|does not]] finish on the fifth through the thirtieth house:
<math display="block">1-\sum_{j=5}^{30} f(j) = 1 - I_{0.6}(5, 30-5+1) \approx 1 - 0.999999823 = 0.000000177. </math>


Because of the rather high probability that Pat will sell to each house (60 percent), the probability of her ''not'' fulfilling her quest is vanishingly slim.
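The numbers above can be reproduced with a few lines of Python; a minimal sketch, assuming SciPy is available:
<syntaxhighlight lang="python">
# Checking the candy example numerically: the number of failures n - 5
# before the fifth sale follows NB(5, 0.6) (SciPy assumed).
from scipy.stats import nbinom

r, p = 5, 0.6

def f(n):
    """Probability that Pat finishes at house n."""
    return nbinom.pmf(n - r, r, p)

print(f(10))                                # ~0.10033
print(sum(f(n) for n in range(5, 9)))       # ~0.59409
print(1 - nbinom.cdf(30 - r, r, p))         # ~1.77e-7, all 30 houses exhausted
</syntaxhighlight>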
===Expectation===


The expected total number of trials needed to see {{mvar|r}} successes is <math>\frac{r}{p}</math>. Thus, the expected number of ''failures'' would be this value, minus the successes:
<math display="block">
\operatorname{E}[\operatorname{NB}(r, p)] = \frac{r}{p} - r = \frac{r(1-p)}{p}
</math>


The expected total number of failures in a negative binomial distribution with parameters {{math|(''r'', ''p'')}} is {{math|''r''(1 − ''p'')/''p''}}. To see this, imagine an experiment simulating the negative binomial is performed many times. That is, a set of trials is performed until {{mvar|r}} successes are obtained, then another set of trials, and then another etc. Write down the number of trials performed in each experiment: {{math|''a'', ''b'', ''c'', ...}} and set {{math|''a'' + ''b'' + ''c'' + ... {{=}} ''N''}}. Now we would expect about {{math|''Np''}} successes in total. Say the experiment was performed {{mvar|n}} times. Then there are {{math|''nr''}} successes in total. So we would expect {{math|''nr'' {{=}} ''Np''}}, so {{math|''N''/''n'' {{=}} ''r''/''p''}}. See that {{math|''N''/''n''}} is just the average number of trials per experiment. That is what we mean by "expectation". The average number of failures per experiment is {{math|1=''N''/''n'' − ''r'' = ''r''/''p'' − ''r'' = ''r''(1 − ''p'')/''p''}}. This agrees with the mean given in the box on the right-hand side of this page.


A rigorous derivation can be done by representing the negative binomial distribution as the sum of waiting times. Let <math>X_r \sim \operatorname{NB}(r, p)</math> with the convention <math>X</math> represents the number of failures observed before <math>r</math> successes with the probability of success being <math>p</math>. And let <math>Y_i \sim \mathrm{Geom}(p)</math> where <math>Y_i</math> represents the number of failures before seeing a success. We can think of <math>Y_i</math> as the waiting time (number of failures) between the <math>i</math>th and <math>(i-1)</math>th success. Thus
<math display="block">
X_r = Y_1 + Y_2 + \cdots + Y_r.
</math>
The mean is
<math display="block">
\operatorname{E}[X_r] = \operatorname{E}[Y_1] + \operatorname{E}[Y_2] + \cdots + \operatorname{E}[Y_r] = \frac{r(1-p)}{p},
</math>
which follows from the fact <math>\operatorname{E}[Y_i] = (1-p)/p</math>.
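This decomposition is easy to illustrate by simulation; a minimal sketch assuming NumPy is available (note that NumPy's <code>geometric</code> counts trials, not failures):
<syntaxhighlight lang="python">
# Summing r geometric waiting times reproduces the NB(r, p) mean (NumPy assumed).
import numpy as np

rng = np.random.default_rng(0)
r, p, n_sim = 5, 0.3, 200_000

# numpy's geometric counts the trials up to and including the success,
# so subtract 1 to count failures only.
waits = rng.geometric(p, size=(n_sim, r)) - 1
x = waits.sum(axis=1)

print(x.mean(), r * (1 - p) / p)   # both close to 35/3 = 11.67
</syntaxhighlight>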


=== Variance ===
When counting the number of failures before the {{mvar|r}}-th success, the variance is {{math|''r''(1 − ''p'')/''p''{{sup|2}}}}.

===Relation to the binomial theorem===
Suppose {{mvar|Y}} is a random variable with a [[binomial distribution]] with parameters {{mvar|n}} and {{mvar|p}}.  Assume {{math|1=''p'' + ''q'' = 1}}, with {{math|''p'', ''q'' ≥ 0}}, then


<math display="block">1 = 1^n = (p+q)^n.</math>


Using [[Newton's binomial theorem]], this can equally be written as:


<math display="block">(p+q)^n=\sum_{k=0}^\infty \binom{n}{k} p^k q^{n-k},</math>


in which the upper bound of summation is infinite.  In this case, the [[binomial coefficient]]


<math display="block">\binom{n}{k} = {n(n-1)(n-2)\cdots(n-k+1) \over k! }</math>


is defined when {{mvar|n}} is a real number, instead of just a positive [[integer]].  But in our case of the binomial distribution it is zero when {{math|''k'' > ''n''}}.  We can then say, for example


<math display="block">(p+q)^{8.3} = \sum_{k=0}^\infty \binom{8.3}{k} p^k q^{8.3 - k}.</math>


Now suppose {{math|''r'' > 0}} and we use a negative exponent:


<math display="block">1=p^r\cdot p^{-r}=p^r (1-q)^{-r}=p^r \sum_{k=0}^\infty \binom{-r}{k} (-q)^k.</math>


Then all of the terms are positive, and the term


<math display="block">p^r \binom{-r}{k} (-q)^k = \binom{k + r - 1}{k} p^rq^k</math>


is just the probability that the number of failures before the {{mvar|r}}-th success is equal to {{mvar|k}}, provided {{mvar|r}} is an integer.  (If {{mvar|r}} is a negative non-integer, so that the exponent is a positive non-integer, then some of the terms in the sum above are negative, so we do not have a probability distribution on the set of all nonnegative integers.)
===Recurrence relations===


For the probability mass function
<math display="block"> \begin{cases}
(k+1) \Pr (X=k+1)-(1-p) \Pr (X=k) (k+r)=0, \\[5pt]
\Pr (X=0)=p^r.
\end{cases}
</math>


For the moments <math>m_k = \mathbb E(X^k),</math>
<math display="block"> m_{k+1} = r P m_k + (P^2 + P) {d m_k \over dP}, \quad P:=(1-p)/p, \quad m_0=1.
</math>


For the cumulants
<math display="block"> \kappa_{k+1} = (Q-1)Q {d \kappa_k \over dQ}, \quad Q:=1/p, \quad \kappa_1=r(Q-1).
</math>
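These recurrences can be spot-checked numerically; a minimal sketch of the mass-function case, assuming SciPy is available:
<syntaxhighlight lang="python">
# A quick numerical check of the pmf recurrence (SciPy assumed).
from scipy.stats import nbinom

r, p = 3, 0.4
assert abs(nbinom.pmf(0, r, p) - p**r) < 1e-12          # Pr(X = 0) = p^r
for k in range(10):
    lhs = (k + 1) * nbinom.pmf(k + 1, r, p)
    rhs = (1 - p) * (k + r) * nbinom.pmf(k, r, p)
    assert abs(lhs - rhs) < 1e-12
</syntaxhighlight>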


==Related distributions==
* The [[geometric distribution]] on {{math|{{mset|0, 1, 2, 3, ... }}}} is a special case of the negative binomial distribution, with <math display="block">\operatorname{Geom}(p) = \operatorname{NB}(1,\, p).\,</math>
* The negative binomial distribution is a special case of the [[discrete phase-type distribution]].
* The negative binomial distribution is a special case of discrete [[compound Poisson distribution]].
===Poisson distribution===
Consider a sequence of negative binomial random variables where the stopping parameter {{mvar|r}} goes to infinity, while the probability {{mvar|p}} of success in each trial goes to one, in such a way as to keep the mean of the distribution (i.e. the expected number of failures) constant. Denoting this mean as {{mvar|λ}}, the parameter {{mvar|p}} will be {{math|1=''p'' = ''r''/(''r'' + ''λ'')}}
<math display="block"> \begin{align}
     \text{Mean:} \quad & \lambda = \frac{(1-p)r}{p} \quad \Rightarrow \quad p = \frac{r}{r+\lambda}, \\
     \text{Variance:} \quad & \lambda \left( 1 + \frac{\lambda}{r} \right) > \lambda, \quad \text{thus always overdispersed}.
\end{align} </math>


Under this parametrization the probability mass function will be
<math display="block">
     f(k; r, p) = \frac{\Gamma(k+r)}{k!\cdot\Gamma(r)}(1-p)^k p^r = \frac{\lambda^k}{k!} \cdot \frac{\Gamma(r+k)}{\Gamma(r)\;(r+\lambda)^k} \cdot \frac{1}{\left(1+\frac{\lambda}{r}\right)^r}
</math>


Now if we consider the limit as {{math|''r'' → ∞}}, the second factor will converge to one, and the third to the exponential function:
<math display="block">
     \lim_{r\to\infty} f(k; r, p) = \frac{\lambda^k}{k!} \cdot 1 \cdot \frac{1}{e^\lambda},
</math>
which is the mass function of a [[Poisson distribution|Poisson-distributed]] random variable with expected value&nbsp;{{mvar|λ}}.


In other words, the alternatively parameterized negative binomial distribution [[convergence in distribution|converges]] to the Poisson distribution and {{mvar|r}} controls the deviation from the Poisson.  This makes the negative binomial distribution suitable as a robust alternative to the Poisson, which approaches the Poisson for large {{mvar|r}}, but which has larger variance than the Poisson for small {{mvar|r}}.
<math display="block">
     \operatorname{Poisson}(\lambda) = \lim_{r \to \infty} \operatorname{NB} \left(r, \frac{r}{r + \lambda}\right).
</math>
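The convergence can be illustrated numerically; a minimal sketch, assuming SciPy is available:
<syntaxhighlight lang="python">
# The maximum pmf difference between NB(r, r/(r+lambda)) and Poisson(lambda)
# shrinks as r grows (SciPy assumed).
import numpy as np
from scipy.stats import nbinom, poisson

lam = 4.0
k = np.arange(20)
for r in (1, 10, 100, 1000):
    p = r / (r + lam)
    print(r, np.max(np.abs(nbinom.pmf(k, r, p) - poisson.pmf(k, lam))))
</syntaxhighlight>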


===Gamma–Poisson mixture===
The negative binomial distribution also arises as a continuous mixture of [[Poisson distribution]]s (i.e. a [[compound probability distribution]]) where the mixing distribution of the Poisson rate is a [[gamma distribution]]. That is, we can view the negative binomial as a {{math|Poisson(''λ'')}} distribution, where {{mvar|λ}} is itself a random variable, distributed as a gamma distribution with shape {{mvar|r}} and scale {{math|1=''θ'' = (1 − ''p'')/''p''}} or correspondingly rate {{math|1=''β'' = ''p''/(1 − ''p'')}}.


To display the intuition behind this statement, consider two independent Poisson processes, "Success" and "Failure", with intensities {{mvar|p}} and {{math|1 − ''p''}}. Together, the Success and Failure processes are equivalent to a single Poisson process of intensity 1, where an occurrence of the process is a success if a corresponding independent coin toss comes up heads with probability {{mvar|p}}; otherwise, it is a failure. If {{mvar|r}} is a counting number, the coin tosses show that the count of successes before the {{mvar|r}}-th failure follows a negative binomial distribution with parameters {{mvar|r}} and {{math|1 − ''p''}}. The count is also, however, the count of the Success Poisson process at the random time {{mvar|T}} of the {{mvar|r}}-th occurrence in the Failure Poisson process. The Success count follows a Poisson distribution with mean {{math|''pT''}}, where {{mvar|T}} is the waiting time for {{mvar|r}} occurrences in a Poisson process of intensity {{math|1 − ''p''}}, i.e., {{mvar|T}} is gamma-distributed with shape parameter {{mvar|r}} and intensity {{math|1 − ''p''}}. Thus, the negative binomial distribution is equivalent to a Poisson distribution with mean {{math|''pT''}}, where the random variate {{mvar|T}} is gamma-distributed with shape parameter {{mvar|r}} and intensity {{math|(1 − ''p'')}}. The preceding paragraph follows, because {{math|1=''λ'' = ''pT''}} is gamma-distributed with shape parameter {{mvar|r}} and intensity {{math|(1 − ''p'')/''p''}}.


The following formal derivation (which does not depend on {{mvar|r}} being a counting number) confirms the intuition.


<math display="block">\begin{align}
& \int_0^\infty f_{\operatorname{Poisson}(\lambda)}(k) \times f_{\operatorname{Gamma}\left(r,\, \frac{p}{1-p}\right)}(\lambda) \, \mathrm{d}\lambda \\[8pt]
= {} & \int_0^\infty \frac{\lambda^k}{k!} e^{-\lambda} \times \frac 1 {\Gamma(r)} \left(\frac{p}{1-p} \lambda \right)^{r-1} e^{- \frac{p}{1-p} \lambda} \, \left( \frac p{1-p} \, \right)\mathrm{d}\lambda \\[8pt]
= {} & \left(\frac{p}{1-p}\right)^r \frac{1}{k!\,\Gamma(r)} \int_0^\infty \lambda^{k+r-1} e^{-\lambda/(1-p)} \, \mathrm{d}\lambda \\[8pt]
= {} & \left(\frac{p}{1-p}\right)^r \frac{1}{k!\,\Gamma(r)} \; (1-p)^{k+r} \, \Gamma(k+r) \\[8pt]
= {} & \frac{\Gamma(k+r)}{k!\,\Gamma(r)} \; (1-p)^k p^r.
\end{align}</math>
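The mixture construction is easy to illustrate by simulation; a minimal sketch, assuming NumPy and SciPy are available:
<syntaxhighlight lang="python">
# Draw lambda ~ Gamma(shape=r, scale=(1-p)/p), then a Poisson(lambda) count;
# the resulting counts are NB(r, p)-distributed (NumPy/SciPy assumed).
import numpy as np
from scipy.stats import nbinom

rng = np.random.default_rng(1)
r, p, n_sim = 3, 0.4, 300_000

lam = rng.gamma(shape=r, scale=(1 - p) / p, size=n_sim)
x = rng.poisson(lam)

for k in range(5):   # empirical frequencies vs. the NB(r, p) pmf
    print(k, (x == k).mean(), nbinom.pmf(k, r, p))
</syntaxhighlight>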
Furthermore, if {{math|''B''{{sub|''s''+''r''}}}} is a random variable following the [[binomial distribution]] with parameters {{math|''s'' + ''r''}} and {{mvar|p}}, then


<math display="block">
\begin{align}
\Pr(Y_r \leq s) & {} = 1 - I_p(s+1, r) \\[5pt]
& {} = \Pr(B_{s+r} \leq s).
\end{align}
</math>
The negative binomial distribution {{math|NB(''r'', ''p'')}} can be represented as a [[compound Poisson distribution]]: Let <math display=inline> (Y_n)_{n\,\in\,\mathbb N} </math> denote a sequence of [[independent and identically distributed random variables]], each one having the [[logarithmic distribution|logarithmic series distribution]] {{math|Log(''p'')}}, with probability mass function


<math display="block"> f(k; r, p) =  \frac{-p^k}{k\ln(1-p)},\qquad k\in{\mathbb N}.</math>


Let {{mvar|N}} be a random variable, [[independence (probability theory)|independent]] of the sequence, and suppose that {{mvar|N}} has a [[Poisson distribution]] with mean {{math|λ {{=}} −''r'' ln(1 − ''p'')}}. Then the random sum


<math display="block">X=\sum_{n=1}^N Y_n</math>


is {{math|NB(''r'', ''p'')}}-distributed. To prove this, we calculate the [[probability generating function]] {{math|''G''{{sub|''X''}}}} of {{mvar|X}}, which is the composition of the probability generating functions {{math|''G''{{sub|''N''}}}} and {{math|''G''{{sub|''Y''{{sub|1}}}}}}. Using


<math display="block">G_N(z)=\exp(\lambda(z-1)),\qquad z\in\mathbb{R},</math>


and


<math display="block">G_{Y_1}(z)=\frac{\ln(1-pz)}{\ln(1-p)},\qquad |z|<\frac1p,</math>


we obtain


<math display="block">
\begin{align}
G_X(z) & = G_N(G_{Y_1}(z))\\[4pt]
&=\exp\left[\lambda\left(\frac{\ln(1-pz)}{\ln(1-p)} - 1\right)\right] \\[1ex]
&=\exp\left[-r\left(\ln(1-pz)-\ln(1-p)\right)\right] \\[1ex]
&=\left(\frac{1-p}{1-pz}\right)^r,\qquad |z|<\frac{1}{p},
\end{align}
</math>
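This representation can likewise be checked by simulation; a minimal sketch, assuming NumPy and SciPy are available (NumPy's <code>logseries</code> samples the logarithmic series distribution):
<syntaxhighlight lang="python">
# A Poisson number of iid logarithmic-series terms is NB(r, p)-distributed
# (NumPy/SciPy assumed).
import numpy as np
from scipy.stats import nbinom

rng = np.random.default_rng(2)
r, p, n_sim = 2, 0.3, 100_000

counts = rng.poisson(-r * np.log(1 - p), size=n_sim)   # N ~ Poisson(-r ln(1-p))
x = np.array([rng.logseries(p, size=n).sum() if n else 0 for n in counts])

for k in range(5):   # empirical frequencies vs. the NB(r, p) pmf
    print(k, (x == k).mean(), nbinom.pmf(k, r, p))
</syntaxhighlight>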
====MVUE for ''p''====
In estimating {{mvar|p}}, the [[minimum variance unbiased estimator]] is


<math display="block">\widehat{p} = \frac{r-1}{r+k-1}.</math>


====Maximum likelihood estimation====
When {{mvar|r}} is known, the [[maximum likelihood]] estimate of {{mvar|p}} is


<math display="block">\widetilde{p} = \frac{r}{r+k},</math>


but this is a [[bias of an estimator|biased estimate]]. Its inverse, {{math|(''r'' + ''k'')/''r''}}, is an unbiased estimate of {{math|1/''p''}}, however.<ref>{{cite journal |first=J. B. S. |last=Haldane |author-link=J. B. S. Haldane |title=On a Method of Estimating Frequencies |journal=[[Biometrika]] |volume=33 |issue=3 |year=1945 |pages=222–225 |jstor=2332299 |doi=10.1093/biomet/33.3.222|pmid=21006837 |hdl=10338.dmlcz/102575 |hdl-access=free }}</ref>


When {{mvar|r}} is unknown, the maximum likelihood estimator for {{mvar|p}} and {{mvar|r}} together only exists for samples for which the sample variance is larger than the sample mean.<ref name="aramidis1999">{{cite journal|last = Aramidis | first = K. |year=1999 |title=An EM algorithm for estimating negative binomial parameters |journal = [[Australian & New Zealand Journal of Statistics]] |volume=41 |issue=2 |pages=213–221 |doi = 10.1111/1467-842X.00075 |s2cid=118758171 |doi-access=free }}</ref> The [[likelihood function]] for {{mvar|N}} [[independent and identically-distributed random variables|iid]] observations {{math|(''k''{{sub|1}}, ..., ''k''{{sub|''N''}})}} is


<math display="block">L(r,p)=\prod_{i=1}^N f(k_i;r,p)\,\!</math>


from which we calculate the log-likelihood function


<math display="block">\ell(r,p) = \sum_{i=1}^N \left[\ln\Gamma(k_i + r) - \ln(k_i!) + k_i \ln(1-p)\right] + N \left[r \ln p - \ln\Gamma(r)\right].</math>


To find the maximum we take the partial derivatives with respect to {{mvar|r}} and {{mvar|p}} and set them equal to zero:


<math display="block">\frac{\partial \ell(r,p)}{\partial p} = -\left[\sum_{i=1}^N k_i \frac{1}{1-p}\right] + Nr \frac{1}{p} = 0</math> and


<math display="block">\frac{\partial \ell(r,p)}{\partial r} = \left[\sum_{i=1}^N \psi(k_i + r)\right] - N\psi(r) + N \ln(p) = 0</math>


where


<math display="block">\psi(k) = \frac{\Gamma'(k)}{\Gamma(k)} \!</math> is the [[digamma function]].


Solving the first equation for {{mvar|p}} gives:


<math display="block">p = \frac{Nr} {Nr + \sum_{i=1}^N k_i}</math>


Substituting this in the second equation gives:


<math display="block">\frac{\partial \ell(r,p)}{\partial r} = \left[\sum_{i=1}^N \psi(k_i + r)\right] - N\psi(r) + N\ln\left(\frac{r}{r + \sum_{i=1}^N k_i/N}\right) = 0</math>


This equation cannot be solved for {{mvar|r}} in [[Closed-form expression|closed form]]. If a numerical solution is desired, an iterative technique such as [[Newton's method]] can be used. Alternatively, the [[expectation–maximization algorithm]] can be used.<ref name="aramidis1999" />
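As an illustration of the numerical approach, the following sketch (assuming SciPy; the data vector is hypothetical) profiles out {{mvar|p}} via <math>p = Nr/(Nr + \textstyle\sum_i k_i)</math> and solves the remaining score equation in {{mvar|r}} with a bracketing root finder rather than Newton's method:

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import brentq
from scipy.special import psi  # the digamma function

k = np.array([0, 1, 1, 2, 2, 3, 4, 4, 6, 9])  # hypothetical iid NB counts
N, kbar = len(k), k.mean()                    # sample variance here exceeds the mean

def score_r(r):
    # dl/dr with p = r/(r + kbar) substituted in, as in the equation above
    return psi(k + r).sum() - N * psi(r) + N * np.log(r / (r + kbar))

r_hat = brentq(score_r, 1e-6, 100.0)  # bracket chosen for this data set
p_hat = r_hat / (r_hat + kbar)
print(r_hat, p_hat)
</syntaxhighlight>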


When {{math|1=''r'' = 1}} we get the probability distribution of the number of successes before the first failure (i.e. the probability of the first failure occurring on the {{math|(''k'' + 1)}}-st trial), which is a [[geometric distribution]]:
<math display="block"> f(k; r, p) = (1-p) \cdot p^k </math>


===Overdispersed Poisson===
An application of this is to annual counts of [[tropical cyclone]]s in the [[Atlantic Ocean|North Atlantic]] or to monthly to 6-monthly counts of wintertime [[extratropical cyclone]]s over Europe, for which the variance is greater than the mean.<ref>{{cite journal|last=Villarini |first=G. |author2=Vecchi, G.A. |author3=Smith, J.A.|year=2010 |title=Modeling of the dependence of tropical storm counts in the North Atlantic Basin on climate indices |journal=[[Monthly Weather Review]] |volume=138 |issue=7 |pages=2681–2705 |doi=10.1175/2010MWR3315.1  |bibcode=2010MWRv..138.2681V |doi-access=free }}</ref><ref>{{cite journal|last=Mailier |first=P.J. |author2=Stephenson, D.B. |author3=Ferro, C.A.T. |author4= Hodges, K.I. |year=2006 |title=Serial Clustering of Extratropical Cyclones |journal=[[Monthly Weather Review]] |volume=134 |issue=8 |pages=2224–2240 |doi=10.1175/MWR3160.1 |bibcode=2006MWRv..134.2224M |doi-access=free }}</ref><ref>{{cite journal|last=Vitolo |first=R. |author2=Stephenson, D.B. |author3=Cook, Ian M. |author4= Mitchell-Wallace, K. |year=2009 |title=Serial clustering of intense European storms |journal=[[Meteorologische Zeitschrift]] |volume=18 |issue=4 |pages=411–424 |doi=10.1127/0941-2948/2009/0393 |bibcode=2009MetZe..18..411V |s2cid=67845213 }}</ref>  In the case of modest overdispersion, this may produce substantially similar results to an overdispersed Poisson distribution.<ref>{{cite book  | last = McCullagh | first = Peter | author-link= Peter McCullagh |author2=Nelder, John |author-link2=John Nelder  | title = Generalized Linear Models |edition=Second | publisher = Boca Raton: Chapman and Hall/CRC | year = 1989 | isbn = 978-0-412-31760-6 |ref=McCullagh1989}}</ref><ref>{{cite book | last = Cameron | first = Adrian C. | author2 = Trivedi, Pravin K. | title = Regression analysis of count data | publisher = Cambridge University Press | year = 1998 | isbn = 978-0-521-63567-7 | ref = Cameron1998 | url-access = registration | url = https://archive.org/details/regressionanalys00came }}</ref>


Negative binomial modeling is widely employed in ecology and biodiversity research for analyzing [[count data]] where overdispersion is very common. This is because overdispersion is indicative of biological aggregation, such as species or communities forming clusters. Ignoring overdispersion can lead to significantly inflated model parameters, resulting in misleading statistical inferences. The negative binomial distribution effectively addresses overdispersed counts by permitting the variance to vary quadratically with the mean. An additional dispersion parameter governs the slope of the quadratic term, determining the severity of overdispersion. The model's quadratic mean-variance relationship proves to be a realistic approach for handling overdispersion, as supported by empirical evidence from many studies. Overall, the NB model offers two attractive features: (1) the convenient interpretation of the dispersion parameter as an index of clustering or aggregation, and (2) its tractable form, featuring a closed expression for the probability mass function.<ref>
{{cite journal|last=Stoklosa |first=J. |author2=Blakey, R.V. |author3=Hui, F.K.C. |year=2022 |title=An Overview of Modern Applications of Negative Binomial Modelling in Ecology and Biodiversity |journal=[[Diversity (journal)|Diversity]] |volume=14 |issue=5 |page=320 |doi=10.3390/d14050320 |doi-access=free |bibcode=2022Diver..14..320S }}
</ref>


In genetics, the negative binomial distribution is commonly used to model data in the form of discrete sequence read counts from high-throughput RNA and DNA sequencing experiments.<ref>{{cite journal|last=Robinson |first=M.D. |last2=Smyth | first2 = G.K. |year=2007 |title=Moderated statistical tests for assessing differences in tag abundance. |journal=[[Bioinformatics]] |volume=23 |issue=21 |pages=2881–2887 |doi=10.1093/bioinformatics/btm453 |pmid=17881408|doi-access=free }}</ref><ref>{{cite web |url=http://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.pdf |title=Differential analysis of count data – the}}</ref><ref>{{cite conference|last=Airoldi |first=E. M. |author2=Cohen, W. W. |author3=Fienberg, S. E. |date=June 2005 |title=Bayesian Models for Frequent Terms in Text |book-title=Proceedings of the Classification Society of North America and INTERFACE Annual Meetings |volume=990 |page=991 |location=St. Louis, MO, USA }}</ref><ref>{{cite web |url=http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf |title=edgeR: differential expression analysis of digital gene expression data |last1=Chen |first1=Yunshun |last2=Davis |first2=McCarthy |date=September 25, 2014 |access-date=October 14, 2014}}</ref>


In epidemiology of infectious diseases, the negative binomial has been used as a better option than the Poisson distribution to model overdispersed counts of secondary infections from one infected case (super-spreading events).<ref>{{cite journal|last=Lloyd-Smith|first=J. O. |author2= Schreiber, S. J. |author3= Kopp, P. E. |author4= Getz, W. M. |year=2005 |title=Superspreading and the effect of individual variation on disease emergence |journal=[[Nature (journal)|Nature]] |volume=438 |issue=7066 |pages=355–359 |doi=10.1038/nature04153|pmid=16292310 |pmc=7094981 |bibcode=2005Natur.438..355L }}</ref>
===Multiplicity observations (physics)===


The negative binomial distribution has been the most effective [[statistical model]] for a broad range of multiplicity observations in [[particle collision]] experiments, e.g., <math>p\bar p,\ hh,\ hA,\ AA,\ e^{+}e^-</math><ref>{{Cite journal |last1=Grosse-Oetringhaus |first1=Jan Fiete |last2=Reygers |first2=Klaus |date=2010-08-01 |title=Charged-particle multiplicity in proton–proton collisions |url=https://iopscience.iop.org/article/10.1088/0954-3899/37/8/083001 |journal=Journal of Physics G: Nuclear and Particle Physics |volume=37 |issue=8 |article-number=083001 |doi=10.1088/0954-3899/37/8/083001 |issn=0954-3899|arxiv=0912.0023 |s2cid=119233810 }}</ref><ref>{{Cite journal |last1=Rybczyński |first1=Maciej |last2=Wilk |first2=Grzegorz |last3=Włodarczyk |first3=Zbigniew |date=2019-05-31 |title=Intriguing properties of multiplicity distributions |journal=Physical Review D |language=en |volume=99 |issue=9 |article-number=094045 |doi=10.1103/PhysRevD.99.094045 |arxiv=1811.07197 |bibcode=2019PhRvD..99i4045R |issn=2470-0010|doi-access=free }}</ref><ref>{{Cite journal |last1=Tarnowsky |first1=Terence J. |last2=Westfall |first2=Gary D. |date=2013-07-09 |title=First study of the negative binomial distribution applied to higher moments of net-charge and net-proton multiplicity distributions |journal=Physics Letters B |volume=724 |issue=1 |pages=51–55 |doi=10.1016/j.physletb.2013.05.064 |arxiv=1210.8102 |bibcode=2013PhLB..724...51T |issn=0370-2693|doi-access=free }}</ref><ref>{{Cite journal |last1=Derrick |first1=M. |last2=Gan |first2=K. K. |last3=Kooijman |first3=P. |last4=Loos |first4=J. S. |last5=Musgrave |first5=B. |last6=Price |first6=L. E. |last7=Repond |first7=J. |last8=Schlereth |first8=J. |last9=Sugano |first9=K. |last10=Weiss |first10=J. M. |last11=Wood |first11=D. E. |last12=Baranko |first12=G. |last13=Blockus |first13=D. |last14=Brabson |first14=B. |last15=Brom |first15=J. M. |date=1986-12-01 |title=<nowiki>Study of quark fragmentation in ${e}^{+}$${e}^{\mathrm{\ensuremath{-}}}$ annihilation at 29 GeV: Charged-particle multiplicity and single-particle rapidity distributions</nowiki> |url=https://link.aps.org/doi/10.1103/PhysRevD.34.3304 |journal=Physical Review D |volume=34 |issue=11 |pages=3304–3320 |doi=10.1103/PhysRevD.34.3304|pmid=9957066 |hdl=1808/15222 |hdl-access=free }}</ref><ref>{{Cite journal |last=Zborovský |first=I. 
|date=2018-10-10 |title=Three-component multiplicity distribution, oscillation of combinants and properties of clans in pp collisions at the LHC |journal=The European Physical Journal C |language=en |volume=78 |issue=10 |page=816 |doi=10.1140/epjc/s10052-018-6287-x |arxiv=1811.11230 |bibcode=2018EPJC...78..816Z |issn=1434-6052|doi-access=free }}</ref> (See <ref>{{Cite book |last1=Kittel |first1=Wolfram |title=Soft Multihadron Dynamics |last2=De Wolf |first2=Eddi A |publisher=World Scientific |year=2005}}</ref> for an overview), and is argued to be a [[scale-invariant]] property of matter,<ref>{{Cite journal |last=Schaeffer |first=R |date=1984 |title=Determination of the galaxy N-point correlation function |journal=Astronomy and Astrophysics |volume=134 |issue=2 |pages=L15|bibcode=1984A&A...134L..15S }}</ref><ref>{{Cite journal |last=Schaeffer |first=R |date=1985 |title=The probability generating function for galaxy clustering |journal=Astronomy and Astrophysics |volume=144 |issue=1 |pages=L1–L4|bibcode=1985A&A...144L...1S }}</ref> providing the best fit for astronomical observations, where it predicts the number of galaxies in a region of space.<ref>{{Cite journal |last1=Perez |first1=Lucia A. |last2=Malhotra |first2=Sangeeta |last3=Rhoads |first3=James E. |last4=Tilvi |first4=Vithal |date=2021-01-07 |title=Void Probability Function of Simulated Surveys of High-redshift Ly α Emitters |journal=The Astrophysical Journal |volume=906 |issue=1 |page=58 |doi=10.3847/1538-4357/abc88b |arxiv=2011.03556 |bibcode=2021ApJ...906...58P |issn=1538-4357 |doi-access=free }}</ref><ref>{{Cite journal |last1=Hurtado-Gil |first1=Lluís |last2=Martínez |first2=Vicent J. |last3=Arnalte-Mur |first3=Pablo |last4=Pons-Bordería |first4=María-Jesús |last5=Pareja-Flores |first5=Cristóbal |last6=Paredes |first6=Silvestre |date=2017-05-01 |title=The best fit for the observed galaxy counts-in-cell distribution function |url=https://www.aanda.org/articles/aa/abs/2017/05/aa29097-16/aa29097-16.html |journal=Astronomy & Astrophysics |language=en |volume=601 |pages=A40 |doi=10.1051/0004-6361/201629097 |arxiv=1703.01087 |bibcode=2017A&A...601A..40H |issn=0004-6361|doi-access=free }}</ref><ref>{{Cite journal |last1=Elizalde |first1=E. |last2=Gaztanaga |first2=E. |date=January 1992 |title=Void probability as a function of the void's shape and scale-invariant models |journal=Monthly Notices of the Royal Astronomical Society |volume=254 |issue=2 |pages=247–256 |doi=10.1093/mnras/254.2.247 |issn=0035-8711|doi-access=free |hdl=2060/19910019799 |hdl-access=free }}</ref><ref>{{Cite journal |last1=Hameeda |first1=M |last2=Plastino |first2=Angelo |last3=Rocca |first3=M C |date=2021-03-01 |title=Generalized Poisson distributions for systems with two-particle interactions |journal=IOP SciNotes |volume=2 |issue=1 |page=015003 |doi=10.1088/2633-1357/abec9f |bibcode=2021IOPSN...2a5003H |issn=2633-1357|doi-access=free |hdl=11336/181371 |hdl-access=free }}</ref> The phenomenological justification for the effectiveness of the negative binomial distribution in these contexts remained unknown for fifty years, since its first observation in 1973.<ref>{{Cite journal |last=Giovannini |first=A. |date=June 1973 |title="Thermal chaos" and "coherence" in multiplicity distributions at high energies |journal=Il Nuovo Cimento A |volume=15 |issue=3 |pages=543–551 |doi=10.1007/bf02734689 |bibcode=1973NCimA..15..543G |s2cid=118805136 |issn=0369-3546}}</ref> In 2023, a proof from [[first principle]]s was eventually demonstrated by Scott V. 
Tezlaf, where it was shown that the negative binomial distribution emerges from [[Spacetime symmetries|symmetries]] in the [[Dynamics (mechanics)|dynamical equations]] of a [[canonical ensemble]] of particles in [[Minkowski space]].<ref name=":1">{{Cite journal |last=Tezlaf |first=Scott V. |date=2023-09-29 |title=Significance of the negative binomial distribution in multiplicity phenomena |url=https://iopscience.iop.org/article/10.1088/1402-4896/acfead |journal=Physica Scripta |volume=98 |issue=11 |doi=10.1088/1402-4896/acfead |arxiv=2310.03776 |bibcode=2023PhyS...98k5310T |s2cid=263300385 |issn=0031-8949}}</ref> Roughly, given an expected number of trials <math>\langle n \rangle</math> and expected number of successes <math>\langle r \rangle</math>, where


<math display="block">\begin{align}
\langle \mathcal{n} \rangle - \langle r \rangle &= k, &
\langle p \rangle  &= \frac{\langle r \rangle}{\langle \mathcal{n} \rangle} \\[1ex]
\implies  
\langle \mathcal{n} \rangle &= \frac{k}{1-\langle p \rangle}, &
\langle {r} \rangle &= \frac{k\langle p \rangle}{1 - \langle p \rangle},
\end{align}</math>


an [[Isomorphism|isomorphic]] set of equations can be identified with the parameters of a [[Special relativity|relativistic]] [[current density]] of a canonical ensemble of massive particles, via


<math display="block">\begin{align}
c^2\left\langle \rho^2 \right\rangle - \left\langle j^2 \right\rangle &= c^2 \rho_0^2, &
\left\langle \beta^2_v \right\rangle &= \frac{\left\langle j^2 \right\rangle}{c^2\langle \rho^2 \rangle}  \\[1ex]
\implies  
c^2 \left\langle \rho^2 \right\rangle &= \frac{c^2 \rho_0^2}{1 - \left\langle \beta^2_v \right\rangle}, &
\left\langle j^2 \right\rangle &= \frac{c^2\rho_0^2 \left\langle \beta^2_v \right\rangle}{1 - \left\langle \beta^2_v \right\rangle},
\end{align}</math>


where <math>\rho_0</math> is the rest [[density]], <math>\langle \rho ^2 \rangle</math> is the relativistic mean square density, <math>\langle j ^2 \rangle</math> is the relativistic mean square current density, and <math>\langle \beta^2_v \rangle=\langle v^2 \rangle /c^2</math>, where <math>\langle v ^2 \rangle</math> is the [[Maxwell–Boltzmann distribution|mean square speed]] of the particle ensemble and <math>c</math> is the [[speed of light]]—such that one can establish the following [[Bijection|bijective map]]:


<math display="block">\begin{align}
c^2\rho_0^2 & \mapsto k, &
\langle \beta^2_v \rangle &\mapsto \langle p \rangle, \\[1ex]
c^2\langle\rho^2 \rangle &\mapsto \langle \mathcal{n} \rangle, &
\langle j^2 \rangle &\mapsto \langle r \rangle.
\end{align}</math>


A rigorous alternative proof of the above correspondence has also been demonstrated through [[quantum mechanics]] via the Feynman [[Path integral formulation|path integral]].<ref name=":1" />

Latest revision as of 19:03, 1 November 2025


In probability theory and statistics, the negative binomial distribution, also called a Pascal distribution,[1] is a discrete probability distribution that models the number of failures in a sequence of independent and identically distributed Bernoulli trials before a specified/constant/fixed number of successes <math>r</math> occur.[2] For example, we can define rolling a 6 on some dice as a success, and rolling any other number as a failure, and ask how many failure rolls will occur before we see the third success (<math>r=3</math>). In such a case, the probability distribution of the number of failures that appear will be a negative binomial distribution.

An alternative formulation is to model the number of total trials (instead of the number of failures). In fact, for a specified (non-random) number of successes {{math|(''r'')}}, the number of failures {{math|(''n'' − ''r'')}} is random because the number of total trials {{math|(''n'')}} is random. For example, we could use the negative binomial distribution to model the number of days {{mvar|n}} (random) a certain machine works (specified by {{mvar|r}}) before it breaks down.

The negative binomial distribution has a variance <math>\mu/p</math>, with the distribution becoming identical to Poisson in the limit <math>p\to 1</math> for a given mean <math>\mu</math> (i.e. when the failures are increasingly rare). Here <math>p\in[0,1]</math> is the success probability of each Bernoulli trial. This can make the distribution a useful overdispersed alternative to the Poisson distribution, for example for a robust modification of Poisson regression. In epidemiology, it has been used to model disease transmission for infectious diseases where the likely number of onward infections may vary considerably from individual to individual and from setting to setting.[3] More generally, it may be appropriate where events have positively correlated occurrences causing a larger variance than if the occurrences were independent, due to a positive covariance term.

The term "negative binomial" is likely due to the fact that a certain binomial coefficient that appears in the formula for the probability mass function of the distribution can be written more simply with negative numbers.[4]

==Definitions==

Imagine a sequence of independent Bernoulli trials: each trial has two potential outcomes called "success" and "failure." In each trial the probability of success is {{mvar|p}} and of failure is {{math|1 − ''p''}}. We observe this sequence until a predefined number {{mvar|r}} of successes occurs. Then the random number of observed failures, {{mvar|X}}, follows the negative binomial distribution: <math display="block">X\sim\operatorname{NB}(r,p)</math>

===Probability mass function===

The probability mass function of the negative binomial distribution is <math display="block">f(k;r,p)\equiv\Pr(X=k)=\binom{k+r-1}{k}(1-p)^k p^r</math> where {{mvar|r}} is the number of successes, {{mvar|k}} is the number of failures, and {{mvar|p}} is the probability of success on each trial.

Here, the quantity in parentheses is the binomial coefficient, and is equal to <math display="block">\binom{k+r-1}{k}=\frac{(k+r-1)!}{(r-1)!\,k!}=\frac{(k+r-1)(k+r-2)\dotsm(r)}{k!}=\frac{\Gamma(k+r)}{k!\,\Gamma(r)}.</math> Note that {{math|Γ(''r'')}} is the [[gamma function]].

There are {{mvar|k}} failures chosen from {{math|''k'' + ''r'' − 1}} trials rather than {{math|''k'' + ''r''}} because the last of the {{math|''k'' + ''r''}} trials is by definition a success.

This quantity can alternatively be written in the following manner, explaining the name "negative binomial":

<math display="block">\frac{(k+r-1)\dotsm(r)}{k!}=(-1)^k\frac{\overbrace{(-r)(-r-1)(-r-2)\dotsm(-r-k+1)}^{k\text{ factors}}}{k!}=(-1)^k\binom{-r}{k}.</math>

Note that by the last expression and the binomial series, for every {{math|0 ≤ ''p'' < 1}} and <math>q=1-p</math>,

<math display="block">p^{-r}=(1-q)^{-r}=\sum_{k=0}^\infty\binom{-r}{k}(-q)^k=\sum_{k=0}^\infty\binom{k+r-1}{k}q^k</math>

hence the terms of the probability mass function indeed add up to one as below. <math display="block">\sum_{k=0}^\infty\binom{k+r-1}{k}(1-p)^k p^r=p^{-r}\cdot p^r=1</math>

To understand the above definition of the probability mass function, note that the probability for every specific sequence of {{mvar|r}} successes and {{mvar|k}} failures is {{math|''p''{{sup|''r''}}(1 − ''p''){{sup|''k''}}}}, because the outcomes of the {{math|''k'' + ''r''}} trials are supposed to happen independently. Since the {{mvar|r}}-th success always comes last, it remains to choose the {{mvar|k}} trials with failures out of the remaining {{math|''k'' + ''r'' − 1}} trials. The above binomial coefficient, due to its combinatorial interpretation, gives precisely the number of all these sequences of length {{math|''k'' + ''r'' − 1}}.
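For a numerical sanity check of the mass function, here is a minimal Python sketch (assuming SciPy; <code>nb_pmf</code> is an illustrative helper, not from the cited sources). SciPy's <code>nbinom</code> uses the same convention as this section: failures before the {{mvar|r}}-th success, with success probability {{mvar|p}}.

<syntaxhighlight lang="python">
import math
from scipy.stats import nbinom

def nb_pmf(k, r, p):
    # binom(k+r-1, k) * (1-p)^k * p^r, via gamma functions so that r may be real
    coeff = math.gamma(k + r) / (math.gamma(r) * math.factorial(k))
    return coeff * (1 - p) ** k * p ** r

r, p = 3, 0.6
for k in range(4):
    print(k, nb_pmf(k, r, p), nbinom.pmf(k, r, p))  # the two columns agree
</syntaxhighlight>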

===Cumulative distribution function===

The cumulative distribution function can be expressed in terms of the [[regularized incomplete beta function]]:[2][5] <math display="block">F(k;r,p)\equiv\Pr(X\le k)=I_p(r,k+1).</math> (This formula is using the same parameterization as in the article's table, with {{mvar|r}} the number of successes, and <math>p=r/(r+\mu)</math> with <math>\mu</math> the mean.)

It can also be expressed in terms of the cumulative distribution function of the [[binomial distribution]]:[6] <math display="block">F(k;r,p)=F_{\text{binomial}}(k;n=k+r,1-p).</math>
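Both identities are easy to verify numerically; a short check assuming SciPy (<code>betainc</code> is SciPy's regularized incomplete beta function <math>I_x(a,b)</math>):

<syntaxhighlight lang="python">
from scipy.stats import nbinom, binom
from scipy.special import betainc

r, p, k = 3, 0.6, 4
print(nbinom.cdf(k, r, p))         # F(k; r, p)
print(betainc(r, k + 1, p))        # I_p(r, k+1)
print(binom.cdf(k, k + r, 1 - p))  # binomial CDF with n = k+r and probability 1-p
</syntaxhighlight>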

===Alternative formulations===

Some sources may define the negative binomial distribution slightly differently from the primary one here. The most common variations are where the random variable {{mvar|X}} is counting different things. These variations can be seen in the table here:

{| class="wikitable"
|-
! !! {{mvar|X}} is counting... !! Probability mass function !! Formula !! Alternate formula<br />(using equivalent binomial) !! Alternate formula<br />(simplified using <math>n=k+r</math>) !! Support
|-
! 1
| {{mvar|k}} failures, given {{mvar|r}} successes
| <math>f(k;r,p)\equiv\Pr(X=k)=</math>
| <math>\binom{k+r-1}{k}p^r(1-p)^k</math>[7][5][8]
| <math>\binom{k+r-1}{r-1}p^r(1-p)^k</math>[2][9][10][11]
| <math>\binom{n-1}{k}p^r(1-p)^k</math>
| for {{math|1=''k'' = 0, 1, 2, ...}}
|-
! 2
| {{mvar|n}} trials, given {{mvar|r}} successes
| <math>f(n;r,p)\equiv\Pr(X=n)=</math>
| <math>\binom{n-1}{r-1}p^r(1-p)^{n-r}</math>[5][11][12][13][14]
| <math>\binom{n-1}{n-r}p^r(1-p)^{n-r}</math>
|
| for {{math|1=''n'' = ''r'', ''r'' + 1, ''r'' + 2, ...}}
|-
! 3
| {{mvar|n}} trials, given {{mvar|r}} failures
| <math>f(n;r,p)\equiv\Pr(X=n)=</math>
| <math>\binom{n-1}{r-1}p^{n-r}(1-p)^r</math>
| <math>\binom{n-1}{n-r}p^{n-r}(1-p)^r</math>
| <math>\binom{n-1}{k}p^k(1-p)^r</math>
| for {{math|1=''n'' = ''r'', ''r'' + 1, ''r'' + 2, ...}}
|-
! 4
| {{mvar|k}} successes, given {{mvar|r}} failures
| <math>f(k;r,p)\equiv\Pr(X=k)=</math>
| <math>\binom{k+r-1}{k}p^k(1-p)^r</math>
| <math>\binom{k+r-1}{r-1}p^k(1-p)^r</math>
|
| for {{math|1=''k'' = 0, 1, 2, ...}}
|-
! –
| {{mvar|k}} successes, given {{mvar|n}} trials
| <math>f(k;n,p)\equiv\Pr(X=k)=</math>
| colspan="3" | This is the [[binomial distribution]], not the negative binomial: <math>\binom{n}{k}p^k(1-p)^{n-k}=\binom{n}{n-k}p^k(1-p)^{n-k}=\binom{n}{k}p^k(1-p)^r</math>
| for {{math|1=''k'' = 0, 1, 2, ..., ''n''}}
|}

Each of the four definitions of the negative binomial distribution can be expressed in slightly different but equivalent ways. The first alternative formulation is simply an equivalent form of the binomial coefficient, that is: <math>\binom{a}{b}=\binom{a}{a-b}\quad\text{for }0\le b\le a</math>. The second alternate formulation somewhat simplifies the expression by recognizing that the total number of trials is simply the number of successes and failures, that is: <math>n=r+k</math>. These second formulations may be more intuitive to understand; however, they are perhaps less practical as they have more terms.

* The definition where {{mvar|X}} is the number of {{mvar|n}} trials that occur for a given number of {{mvar|r}} successes is similar to the primary definition, except that the number of trials is given instead of the number of failures. This adds {{mvar|r}} to the value of the random variable, shifting its support and mean.
* The definition where {{mvar|X}} is the number of {{mvar|k}} successes (or {{mvar|n}} trials) that occur for a given number of {{mvar|r}} failures is similar to the primary definition used in this article, except that numbers of failures and successes are switched when considering what is being counted and what is given. Note however, that {{mvar|p}} still refers to the probability of "success".
* The definition of the negative binomial distribution can be extended to the case where the parameter {{mvar|r}} can take on a positive real value. Although it is impossible to visualize a non-integer number of "failures", we can still formally define the distribution through its probability mass function. The problem of extending the definition to real-valued (positive) {{mvar|r}} boils down to extending the binomial coefficient to its real-valued counterpart, based on the gamma function: <math display="block">\binom{k+r-1}{k}=\frac{(k+r-1)(k+r-2)\dotsm(r)}{k!}=\frac{\Gamma(k+r)}{k!\,\Gamma(r)}</math> After substituting this expression in the original definition, we say that {{mvar|X}} has a negative binomial (or Pólya) distribution if it has a probability mass function: <math display="block">f(k;r,p)\equiv\Pr(X=k)=\frac{\Gamma(k+r)}{k!\,\Gamma(r)}(1-p)^k p^r\quad\text{for }k=0,1,2,\dotsc</math> Here {{mvar|r}} is a real, positive number.

In negative binomial regression,[15] the distribution is specified in terms of its mean, <math>m=\frac{r(1-p)}{p}</math>, which is then related to explanatory variables as in linear regression or other generalized linear models. From the expression for the mean {{mvar|m}}, one can derive <math>p=\frac{r}{m+r}</math> and <math>1-p=\frac{m}{m+r}</math>. Then, substituting these expressions in [[#Extension to real-valued r|the one for the probability mass function when {{mvar|r}} is real-valued]], yields this parametrization of the probability mass function in terms of {{mvar|m}}:

<math display="block">\Pr(X=k)=\frac{\Gamma(r+k)}{k!\,\Gamma(r)}\left(\frac{r}{r+m}\right)^r\left(\frac{m}{r+m}\right)^k\quad\text{for }k=0,1,2,\dotsc</math> The variance can then be written as <math>m+\frac{m^2}{r}</math>. Some authors prefer to set <math>\alpha=\frac{1}{r}</math>, and express the variance as <math>m+\alpha m^2</math>. In this context, and depending on the author, either the parameter {{mvar|r}} or its reciprocal {{mvar|α}} is referred to as the "dispersion parameter", "shape parameter" or "clustering coefficient",[16] or the "heterogeneity"[15] or "aggregation" parameter.[10] The term "aggregation" is particularly used in ecology when describing counts of individual organisms. Decrease of the aggregation parameter {{mvar|r}} towards zero corresponds to increasing aggregation of the organisms; increase of {{mvar|r}} towards infinity corresponds to absence of aggregation, as can be described by Poisson regression.

===Alternative parameterizations===

Sometimes the distribution is parameterized in terms of its mean {{mvar|μ}} and variance {{math|''σ''{{sup|2}}}}: <math display="block">\begin{align}
p&=\frac{\mu}{\sigma^2},\\[1ex]
r&=\frac{\mu^2}{\sigma^2-\mu},\\[1ex]
\Pr(X=k)&=\binom{k+\frac{\mu^2}{\sigma^2-\mu}-1}{k}\left(1-\frac{\mu}{\sigma^2}\right)^k\left(\frac{\mu}{\sigma^2}\right)^{\mu^2/(\sigma^2-\mu)}\\[1ex]
\operatorname{E}(X)&=\mu\\[1ex]
\operatorname{Var}(X)&=\sigma^2.
\end{align}</math>

Another popular parameterization uses {{mvar|r}} and the failure odds {{mvar|β}}: <math display="block">\begin{align}
p&=\frac{1}{1+\beta}\\[1ex]
\Pr(X=k)&=\binom{k+r-1}{k}\left(\frac{\beta}{1+\beta}\right)^k\left(\frac{1}{1+\beta}\right)^r\\[1ex]
\operatorname{E}(X)&=r\beta\\[1ex]
\operatorname{Var}(X)&=r\beta(1+\beta).
\end{align}</math>
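A brief sketch of moving between these parameterizations (assuming SciPy, whose <code>nbinom</code> takes {{math|(''r'', ''p'')}} directly; the target mean and variance below are illustrative):

<syntaxhighlight lang="python">
from scipy.stats import nbinom

mu, sigma2 = 4.0, 10.0  # any pair with sigma2 > mu
p = mu / sigma2
r = mu ** 2 / (sigma2 - mu)
print(nbinom.stats(r, p, moments="mv"))  # recovers (4.0, 10.0)

beta = (1 - p) / p  # failure odds
print(r * beta, r * beta * (1 + beta))   # E(X) and Var(X), again 4.0 and 10.0
</syntaxhighlight>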

==Examples==

===Length of hospital stay===

Hospital length of stay is an example of real-world data that can be modelled well with a negative binomial distribution via negative binomial regression.[17][18]

===Selling candy===

Pat Collis is required to sell candy bars to raise money for the 6th grade field trip. Pat is (somewhat harshly) not supposed to return home until five candy bars have been sold. So the child goes door to door, selling candy bars. At each house, there is a 0.6 probability of selling one candy bar and a 0.4 probability of selling nothing.

What's the probability of selling the last candy bar at the {{mvar|n}}-th house?

Successfully selling candy enough times is what defines our stopping criterion (as opposed to failing to sell it), so {{mvar|k}} in this case represents the number of failures and {{mvar|r}} represents the number of successes. Recall that the {{math|NB(''r'', ''p'')}} distribution describes the probability of {{mvar|k}} failures and {{mvar|r}} successes in {{math|''k'' + ''r''}} Bernoulli({{mvar|p}}) trials with success on the last trial. Selling five candy bars means getting five successes. The number of trials (i.e. houses) this takes is therefore {{math|1=''k'' + 5 = ''n''}}. The random variable we are interested in is the number of houses, so we substitute {{math|1=''k'' = ''n'' − 5}} into a {{math|NB(5, 0.4)}} mass function and obtain the following mass function of the distribution of houses (for {{math|''n'' ≥ 5}}):

<math display="block">f(n)=\binom{(n-5)+5-1}{n-5}\,(1-0.4)^5\,0.4^{n-5}=\binom{n-1}{n-5}\,3^5\,\frac{2^{n-5}}{5^n}.</math>

What's the probability that Pat finishes on the tenth house?

<math display="block">f(10)=\frac{979776}{9765625}\approx 0.10033.</math>

What's the probability that Pat finishes on or before reaching the eighth house?

To finish on or before the eighth house, Pat must finish at the fifth, sixth, seventh, or eighth house. Sum those probabilities: <math display="block">\begin{align}
f(5)&=\frac{243}{3125}\approx 0.07776 \\[1ex]
f(6)&=\frac{486}{3125}\approx 0.15552 \\[1ex]
f(7)&=\frac{2916}{15625}\approx 0.18662 \\[1ex]
f(8)&=\frac{13608}{78125}\approx 0.17418 \\[1ex]
\sum_{j=5}^8 f(j)&=\frac{46413}{78125}\approx 0.59409.
\end{align}</math>

What's the probability that Pat exhausts all 30 houses that happen to stand in the neighborhood?

This can be expressed as the probability that Pat does not finish on the fifth through the thirtieth house: <math display="block">1-\sum_{j=5}^{30}f(j)=1-I_{0.4}(5,30-5+1)\approx 1-0.999999823=0.000000177.</math>

Because of the rather high probability that Pat will sell to each house (60 percent), the probability of her not fulfilling her quest is vanishingly slim.
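The three numbers above can be reproduced with a few lines of Python (standard library only; the helper <code>f</code> mirrors the mass function as printed):

<syntaxhighlight lang="python">
from math import comb

def f(n):
    # binom(n-1, n-5) * 0.6^5 * 0.4^(n-5), the mass function derived above
    return comb(n - 1, n - 5) * 0.6 ** 5 * 0.4 ** (n - 5)

print(f(10))                                 # ~0.10033
print(sum(f(j) for j in range(5, 9)))        # ~0.59409
print(1 - sum(f(j) for j in range(5, 31)))   # ~0.000000177
</syntaxhighlight>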

==Properties==

===Expectation===

The expected total number of trials needed to see {{mvar|r}} successes is <math>\frac{r}{p}</math>. Thus, the expected number of failures would be this value, minus the successes: <math display="block">\operatorname{E}[\operatorname{NB}(r,p)]=\frac{r}{p}-r=\frac{r(1-p)}{p}</math>

====Expectation of successes====

The expected total number of failures in a negative binomial distribution with parameters {{math|(''r'', ''p'')}} is {{math|''r''(1 − ''p'')/''p''}}. To see this, imagine an experiment simulating the negative binomial is performed many times. That is, a set of trials is performed until {{mvar|r}} successes are obtained, then another set of trials, and then another etc. Write down the number of trials performed in each experiment: {{math|''a'', ''b'', ''c'', ...}} and set {{math|1=''a'' + ''b'' + ''c'' + ... = ''N''}}. Now we would expect about {{math|''Np''}} successes in total. Say the experiment was performed {{mvar|n}} times. Then there are {{math|''nr''}} successes in total. So we would expect {{math|1=''nr'' = ''Np''}}, so {{math|1=''N''/''n'' = ''r''/''p''}}. See that {{math|''N''/''n''}} is just the average number of trials per experiment. That is what we mean by "expectation". The average number of failures per experiment is {{math|1=''N''/''n'' − ''r'' = ''r''/''p'' − ''r'' = ''r''(1 − ''p'')/''p''}}. This agrees with the mean given in the box on the right-hand side of this page.

A rigorous derivation can be done by representing the negative binomial distribution as the sum of waiting times. Let <math>X_r\sim\operatorname{NB}(r,p)</math> with the convention that <math>X_r</math> represents the number of failures observed before {{mvar|r}} successes with the probability of success being {{mvar|p}}. And let <math>Y_i\sim\operatorname{Geom}(p)</math> where <math>Y_i</math> represents the number of failures before seeing a success. We can think of <math>Y_i</math> as the waiting time (number of failures) between the {{mvar|i}}-th and {{math|(''i'' − 1)}}-th success. Thus <math display="block">X_r=Y_1+Y_2+\cdots+Y_r.</math> The mean is <math display="block">\operatorname{E}[X_r]=\operatorname{E}[Y_1]+\operatorname{E}[Y_2]+\cdots+\operatorname{E}[Y_r]=\frac{r(1-p)}{p},</math> which follows from the fact <math>\operatorname{E}[Y_i]=(1-p)/p</math>.
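The decomposition is easy to simulate (a sketch assuming NumPy; NumPy's <code>geometric</code> counts trials up to and including the first success, so one is subtracted to count failures):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
r, p = 5, 0.3
Y = rng.geometric(p, size=(100_000, r)) - 1  # failures between consecutive successes
X = Y.sum(axis=1)                            # X_r = Y_1 + ... + Y_r
print(X.mean(), r * (1 - p) / p)             # empirical vs. exact mean
</syntaxhighlight>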

===Variance===

When counting the number of failures before the {{mvar|r}}-th success, the variance is {{math|''r''(1 − ''p'')/''p''{{sup|2}}}}. When counting the number of successes before the {{mvar|r}}-th failure, as in alternative formulation (3) above, the variance is {{math|''rp''/(1 − ''p''){{sup|2}}}}.

===Relation to the binomial theorem===

Suppose {{mvar|Y}} is a random variable with a binomial distribution with parameters {{mvar|n}} and {{mvar|p}}. Assume {{math|1=''p'' + ''q'' = 1}}, with {{math|''p'', ''q'' ≥ 0}}, then

<math display="block">1=1^n=(p+q)^n.</math>

Using Newton's binomial theorem, this can equally be written as:

<math display="block">(p+q)^n=\sum_{k=0}^\infty\binom{n}{k}p^k q^{n-k},</math>

in which the upper bound of summation is infinite. In this case, the binomial coefficient

<math display="block">\binom{n}{k}=\frac{n(n-1)(n-2)\cdots(n-k+1)}{k!}</math>

is defined when {{mvar|n}} is a real number, instead of just a positive integer. But in our case of the binomial distribution it is zero when {{math|''k'' > ''n''}}. We can then say, for example

<math display="block">(p+q)^{8.3}=\sum_{k=0}^\infty\binom{8.3}{k}p^k q^{8.3-k}.</math>

Now suppose {{math|''r'' > 0}} and we use a negative exponent:

<math display="block">1=p^r\cdot p^{-r}=p^r(1-q)^{-r}=p^r\sum_{k=0}^\infty\binom{-r}{k}(-q)^k.</math>

Then all of the terms are positive, and the term

<math display="block">p^r\binom{-r}{k}(-q)^k=\binom{k+r-1}{k}p^r q^k</math>

is just the probability that the number of failures before the {{mvar|r}}-th success is equal to {{mvar|k}}, provided {{mvar|r}} is an integer. (If {{mvar|r}} is a negative non-integer, so that the exponent is a positive non-integer, then some of the terms in the sum above are negative, so we do not have a probability distribution on the set of all nonnegative integers.)

Now we also allow non-integer values of {{mvar|r}}.

Recall from above that

The sum of independent negative-binomially distributed random variables {{math|''r''{{sub|1}}}} and {{math|''r''{{sub|2}}}} with the same value for parameter {{mvar|p}} is negative-binomially distributed with the same {{mvar|p}} but with {{mvar|r}}-value {{math|''r''{{sub|1}} + ''r''{{sub|2}}}}.

This property persists when the definition is thus generalized, and affords a quick way to see that the negative binomial distribution is infinitely divisible.

===Recurrence relations===

The following recurrence relations hold:

For the probability mass function <math display="block">\begin{cases}(k+1)\Pr(X=k+1)-(1-p)\Pr(X=k)(k+r)=0,\\[1ex]\Pr(X=0)=p^r.\end{cases}</math>

For the moments <math>m_k=\operatorname{E}(X^k)</math>, <math display="block">m_{k+1}=rP\,m_k+(P^2+P)\frac{dm_k}{dP},\qquad P:=(1-p)/p,\quad m_0=1.</math>

For the cumulants <math display="block">\kappa_{k+1}=(Q-1)Q\frac{d\kappa_k}{dQ},\qquad Q:=1/p,\quad \kappa_1=r(Q-1).</math>
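The cumulant recurrence can be checked symbolically against the known mean and variance, {{math|1=''κ''{{sub|1}} = ''r''(1 − ''p'')/''p''}} and {{math|1=''κ''{{sub|2}} = ''r''(1 − ''p'')/''p''{{sup|2}}}} (a sketch assuming SymPy):

<syntaxhighlight lang="python">
import sympy as sp

r, Q, p = sp.symbols("r Q p", positive=True)
kappa1 = r * (Q - 1)
kappa2 = (Q - 1) * Q * sp.diff(kappa1, Q)  # the recurrence with k = 1

print(sp.simplify(kappa1.subs(Q, 1 / p)))  # r*(1 - p)/p
print(sp.simplify(kappa2.subs(Q, 1 / p)))  # r*(1 - p)/p**2
</syntaxhighlight>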

==Related distributions==

===Poisson distribution===

Consider a sequence of negative binomial random variables where the stopping parameter {{mvar|r}} goes to infinity, while the probability {{mvar|p}} of success in each trial goes to one, in such a way as to keep the mean of the distribution (i.e. the expected number of failures) constant. Denoting this mean as {{mvar|λ}}, the parameter {{mvar|p}} will be {{math|''r''/(''r'' + ''λ'')}}: <math display="block">\text{Mean: }\lambda=\frac{(1-p)r}{p}\implies p=\frac{r}{r+\lambda},</math> <math display="block">\text{Variance: }\lambda\left(1+\frac{\lambda}{r}\right)>\lambda,\quad\text{thus always overdispersed.}</math>

Under this parametrization the probability mass function will be <math display="block">f(k;r,p)=\frac{\Gamma(k+r)}{k!\,\Gamma(r)}(1-p)^k p^r=\frac{\lambda^k}{k!}\cdot\frac{\Gamma(r+k)}{\Gamma(r)\,(r+\lambda)^k}\cdot\frac{1}{\left(1+\frac{\lambda}{r}\right)^r}</math>

Now if we consider the limit as {{math|''r'' → ∞}}, the second factor will converge to one, and the third to the exponent function: <math display="block">\lim_{r\to\infty}f(k;r,p)=\frac{\lambda^k}{k!}\cdot 1\cdot\frac{1}{e^{\lambda}},</math> which is the mass function of a Poisson-distributed random variable with expected value {{mvar|λ}}.

In other words, the alternatively parameterized negative binomial distribution converges to the Poisson distribution and {{mvar|r}} controls the deviation from the Poisson. This makes the negative binomial distribution suitable as a robust alternative to the Poisson, which approaches the Poisson for large {{mvar|r}}, but which has larger variance than the Poisson for small {{mvar|r}}: <math display="block">\operatorname{Poisson}(\lambda)=\lim_{r\to\infty}\operatorname{NB}\left(r,\frac{r}{r+\lambda}\right).</math>
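The convergence is visible numerically (assuming SciPy): the {{math|NB(''r'', ''r''/(''r'' + ''λ''))}} probabilities approach the Poisson probabilities as {{mvar|r}} grows.

<syntaxhighlight lang="python">
from scipy.stats import nbinom, poisson

lam, k = 2.5, 3
for r in (1, 10, 100, 1000):
    print(r, nbinom.pmf(k, r, r / (r + lam)))  # approaches the Poisson value below
print("Poisson:", poisson.pmf(k, lam))
</syntaxhighlight>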

===Gamma–Poisson mixture===

The negative binomial distribution also arises as a continuous mixture of Poisson distributions (i.e. a compound probability distribution) where the mixing distribution of the Poisson rate is a gamma distribution. That is, we can view the negative binomial as a {{math|Poisson(''λ'')}} distribution, where {{mvar|λ}} is itself a random variable, distributed as a gamma distribution with shape {{mvar|r}} and scale {{math|1=''θ'' = (1 − ''p'')/''p''}} or correspondingly rate {{math|1=''β'' = ''p''/(1 − ''p'')}}.

To display the intuition behind this statement, consider two independent Poisson processes, "Success" and "Failure", with intensities {{mvar|p}} and {{math|1 − ''p''}}. Together, the Success and Failure processes are equivalent to a single Poisson process of intensity 1, where an occurrence of the process is a success if a corresponding independent coin toss comes up heads with probability {{mvar|p}}; otherwise, it is a failure. If {{mvar|r}} is a counting number, the coin tosses show that the count of successes before the {{mvar|r}}-th failure follows a negative binomial distribution with parameters {{mvar|r}} and {{mvar|p}}. The count is also, however, the count of the Success Poisson process at the random time {{mvar|T}} of the {{mvar|r}}-th occurrence in the Failure Poisson process. The Success count follows a Poisson distribution with mean {{math|''pT''}}, where {{mvar|T}} is the waiting time for {{mvar|r}} occurrences in a Poisson process of intensity {{math|1 − ''p''}}, i.e., {{mvar|T}} is gamma-distributed with shape parameter {{mvar|r}} and intensity {{math|1 − ''p''}}. Thus, the negative binomial distribution is equivalent to a Poisson distribution with mean {{math|''pT''}}, where the random variate {{mvar|T}} is gamma-distributed with shape parameter {{mvar|r}} and intensity {{math|1 − ''p''}}. The preceding paragraph follows, because {{math|1=''λ'' = ''pT''}} is gamma-distributed with shape parameter {{mvar|r}} and intensity {{math|(1 − ''p'')/''p''}}.

The following formal derivation (which does not depend on {{mvar|r}} being a counting number) confirms the intuition.

<math display="block">\begin{align}
&\int_0^\infty f_{\operatorname{Poisson}(\lambda)}(k)\times f_{\operatorname{Gamma}\left(r,\,\frac{p}{1-p}\right)}(\lambda)\,d\lambda \\[1ex]
&=\int_0^\infty\frac{\lambda^k}{k!}e^{-\lambda}\times\frac{1}{\Gamma(r)}\left(\frac{p}{1-p}\lambda\right)^{r-1}e^{-\frac{p}{1-p}\lambda}\left(\frac{p}{1-p}\right)d\lambda \\[1ex]
&=\left(\frac{p}{1-p}\right)^r\frac{1}{k!\,\Gamma(r)}\int_0^\infty\lambda^{r+k-1}e^{-\lambda\frac{p+1-p}{1-p}}\,d\lambda \\[1ex]
&=\left(\frac{p}{1-p}\right)^r\frac{1}{k!\,\Gamma(r)}\,\Gamma(r+k)\,(1-p)^{k+r}\int_0^\infty f_{\operatorname{Gamma}\left(k+r,\,\frac{1}{1-p}\right)}(\lambda)\,d\lambda \\[1ex]
&=\frac{\Gamma(r+k)}{k!\,\Gamma(r)}\,(1-p)^k p^r \\[1ex]
&=f(k;r,p).
\end{align}</math>

Because of this, the negative binomial distribution is also known as the gamma–Poisson (mixture) distribution. The negative binomial distribution was originally derived as a limiting case of the gamma-Poisson distribution.[19]
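The mixture is straightforward to simulate (a sketch assuming NumPy and SciPy): draw {{mvar|λ}} from a gamma distribution with shape {{mvar|r}} and scale {{math|(1 − ''p'')/''p''}} (rate {{math|''p''/(1 − ''p'')}}), then a Poisson count with that mean.

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import nbinom

rng = np.random.default_rng(3)
r, p = 3.0, 0.4
lam = rng.gamma(shape=r, scale=(1 - p) / p, size=200_000)
X = rng.poisson(lam)  # Poisson count with gamma-distributed mean

for k in range(4):
    print(k, (X == k).mean(), nbinom.pmf(k, r, p))  # empirical vs. NB(r, p)
</syntaxhighlight>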

===Distribution of a sum of geometrically distributed random variables===

If {{math|''Y''{{sub|''r''}}}} is a random variable following the negative binomial distribution with parameters {{mvar|r}} and {{mvar|p}}, and support {0, 1, 2, ...}, then {{math|''Y''{{sub|''r''}}}} is a sum of {{mvar|r}} independent variables following the geometric distribution (on {0, 1, 2, ...}) with parameter {{mvar|p}}. As a result of the central limit theorem, {{math|''Y''{{sub|''r''}}}} (properly scaled and shifted) is therefore approximately normal for sufficiently large {{mvar|r}}.

Furthermore, if {{math|''B''{{sub|''s''+''r''}}}} is a random variable following the binomial distribution with parameters {{math|''s'' + ''r''}} and {{mvar|p}}, then

<math display="block">\begin{align}
\Pr(Y_r\le s) &= 1-I_p(s+1,r) \\[1ex]
&= 1-I_p\bigl((s+r)-(r-1),\,(r-1)+1\bigr) \\[1ex]
&= 1-\Pr(B_{s+r}\le r-1) \\[1ex]
&= \Pr(B_{s+r}\ge r) \\[1ex]
&= \Pr(\text{after } s+r \text{ trials, there are at least } r \text{ successes}).
\end{align}</math>

In this sense, the negative binomial distribution is the "inverse" of the binomial distribution.

The sum of independent negative-binomially distributed random variables {{math|''r''{{sub|1}}}} and {{math|''r''{{sub|2}}}} with the same value for parameter {{mvar|p}} is negative-binomially distributed with the same {{mvar|p}} but with {{mvar|r}}-value {{math|''r''{{sub|1}} + ''r''{{sub|2}}}}.

The negative binomial distribution is infinitely divisible, i.e., if {{mvar|Y}} has a negative binomial distribution, then for any positive integer {{mvar|n}}, there exist independent identically distributed random variables {{math|''Y''{{sub|1}}, ..., ''Y''{{sub|''n''}}}} whose sum has the same distribution that {{mvar|Y}} has.

===Representation as compound Poisson distribution===

The negative binomial distribution {{math|NB(''r'', ''p'')}} can be represented as a compound Poisson distribution: Let <math>(Y_n)_{n\in\mathbb{N}}</math> denote a sequence of independent and identically distributed random variables, each one having the logarithmic series distribution {{math|Log(''p'')}}, with probability mass function

<math display="block">f(k;r,p)=\frac{-p^k}{k\ln(1-p)},\qquad k\in\mathbb{N}.</math>

Let {{mvar|N}} be a random variable, independent of the sequence, and suppose that {{mvar|N}} has a Poisson distribution with mean {{math|1=''λ'' = −''r'' ln(1 − ''p'')}}. Then the random sum

<math display="block">X=\sum_{n=1}^N Y_n</math>

is {{math|NB(''r'', ''p'')}}-distributed. To prove this, we calculate the probability generating function {{math|''G''{{sub|''X''}}}} of {{mvar|X}}, which is the composition of the probability generating functions {{math|''G''{{sub|''N''}}}} and {{math|''G''{{sub|''Y''{{sub|1}}}}}}. Using

<math display="block">G_N(z)=\exp(\lambda(z-1)),\qquad z\in\mathbb{R},</math>

and

<math display="block">G_{Y_1}(z)=\frac{\ln(1-pz)}{\ln(1-p)},\qquad |z|<\frac{1}{p},</math>

we obtain

<math display="block">\begin{align}
G_X(z) &= G_N(G_{Y_1}(z))\\[4pt]
&=\exp\left[\lambda\left(\frac{\ln(1-pz)}{\ln(1-p)}-1\right)\right] \\[1ex]
&=\exp\left[-r\left(\ln(1-pz)-\ln(1-p)\right)\right] \\[1ex]
&=\left(\frac{1-p}{1-pz}\right)^r,\qquad |z|<\frac{1}{p},
\end{align}</math>

which is the probability generating function of the {{math|NB(''r'', ''p'')}} distribution.

The following table describes four distributions related to the number of successes in a sequence of draws:

{| class="wikitable"
|-
! !! With replacements !! No replacements
|-
! Given number of draws
| binomial distribution || hypergeometric distribution
|-
! Given number of failures
| negative binomial distribution || negative hypergeometric distribution
|}

===(a,b,0) class of distributions===

The negative binomial, along with the Poisson and binomial distributions, is a member of the [[(a,b,0) class of distributions|{{math|(''a'', ''b'', 0)}} class of distributions]]. All three of these distributions are special cases of the Panjer distribution. They are also members of a natural exponential family.

==Statistical inference==

===Parameter estimation===

====MVUE for ''p''====

Suppose {{mvar|p}} is unknown and an experiment is conducted where it is decided ahead of time that sampling will continue until {{mvar|r}} successes are found. A sufficient statistic for the experiment is {{mvar|k}}, the number of failures.

In estimating {{mvar|p}}, the minimum variance unbiased estimator is

<math display="block">\widehat{p}=\frac{r-1}{r+k-1}.</math>

====Maximum likelihood estimation====

When {{mvar|r}} is known, the maximum likelihood estimate of {{mvar|p}} is

<math display="block">\widetilde{p}=\frac{r}{r+k},</math>

but this is a biased estimate. Its inverse {{math|(''r'' + ''k'')/''r''}} is an unbiased estimate of {{math|1/''p''}}, however.[20]

When {{mvar|r}} is unknown, the maximum likelihood estimator for {{mvar|p}} and {{mvar|r}} together only exists for samples for which the sample variance is larger than the sample mean.[21] The likelihood function for {{mvar|N}} iid observations {{math|(''k''{{sub|1}}, ..., ''k''{{sub|''N''}})}} is

<math display="block">L(r,p)=\prod_{i=1}^N f(k_i;r,p)</math>

from which we calculate the log-likelihood function

<math display="block">\ell(r,p)=\sum_{i=1}^N\left[\ln\Gamma(k_i+r)-\ln(k_i!)+k_i\ln(1-p)\right]+N\left[r\ln p-\ln\Gamma(r)\right].</math>

To find the maximum we take the partial derivatives with respect to {{mvar|r}} and {{mvar|p}} and set them equal to zero:

<math display="block">\frac{\partial \ell(r,p)}{\partial p}=-\left[\sum_{i=1}^N k_i\frac{1}{1-p}\right]+Nr\frac{1}{p}=0</math> and

<math display="block">\frac{\partial \ell(r,p)}{\partial r}=\left[\sum_{i=1}^N \psi(k_i+r)\right]-N\psi(r)+N\ln(p)=0</math>

where

<math display="block">\psi(k)=\frac{\Gamma'(k)}{\Gamma(k)}</math> is the digamma function.

Solving the first equation for {{mvar|p}} gives:

<math display="block">p=\frac{Nr}{Nr+\sum_{i=1}^N k_i}</math>

Substituting this in the second equation gives:

<math display="block">\frac{\partial \ell(r,p)}{\partial r}=\left[\sum_{i=1}^N \psi(k_i+r)\right]-N\psi(r)+N\ln\left(\frac{r}{r+\sum_{i=1}^N k_i/N}\right)=0</math>

This equation cannot be solved for {{mvar|r}} in closed form. If a numerical solution is desired, an iterative technique such as Newton's method can be used. Alternatively, the expectation–maximization algorithm can be used.[21]

==Occurrence and applications==

===Waiting time in a Bernoulli process===

Let {{mvar|k}} and {{mvar|r}} be integers with {{mvar|k}} non-negative and {{mvar|r}} positive. In a sequence of independent Bernoulli trials with success probability {{mvar|p}}, the negative binomial gives the probability of {{mvar|k}} successes and {{mvar|r}} failures, with a failure on the last trial. Therefore, the negative binomial distribution represents the probability distribution of the number of successes before the {{mvar|r}}-th failure in a Bernoulli process, with probability {{mvar|p}} of successes on each trial.

Consider the following example. Suppose we repeatedly throw a die, and consider a 1 to be a failure. The probability of success on each trial is 5/6. The number of successes before the third failure belongs to the infinite set <math>\{0, 1, 2, 3, \dots\}</math>. That number of successes is a negative-binomially distributed random variable.

When {{math|1=''r'' = 1}} we get the probability distribution of the number of successes before the first failure (i.e. the probability of the first failure occurring on the {{math|(''k'' + 1)}}-st trial), which is a geometric distribution: <math>f(k; 1, p) = (1 - p)p^k.</math>
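For the die example, the probabilities follow directly from the mass function; a brief numeric sketch (using the successes-before-the-{{mvar|r}}-th-failure convention of this section):

<syntaxhighlight lang="python">
from math import comb

def nb_pmf(k, r, p):
    # P(k successes, each with probability p, before the r-th failure)
    return comb(k + r - 1, k) * p**k * (1 - p)**r

# Number of non-1 rolls before the third 1, success probability 5/6:
print([round(nb_pmf(k, 3, 5/6), 4) for k in range(6)])

# r = 1 recovers the geometric case f(k; 1, p) = (1 - p) p^k:
assert abs(nb_pmf(4, 1, 5/6) - (1/6) * (5/6)**4) < 1e-12
</syntaxhighlight>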

Overdispersed Poisson

The negative binomial distribution, especially in its alternative parameterization described above, can be used as an alternative to the Poisson distribution. It is especially useful for discrete data over an unbounded positive range whose sample variance exceeds the sample mean. In such cases, the observations are overdispersed with respect to a Poisson distribution, for which the mean is equal to the variance. Hence a Poisson distribution is not an appropriate model. Since the negative binomial distribution has one more parameter than the Poisson, the second parameter can be used to adjust the variance independently of the mean. See Cumulants of some discrete probability distributions.
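Concretely, the mean–variance relation of the alternative parameterization, <math>\sigma^2 = \mu + \mu^2/r</math>, yields simple method-of-moments estimates; the sketch below (an illustration with made-up counts) inverts that relation:

<syntaxhighlight lang="python">
import numpy as np

def nb_moment_estimates(counts):
    """Method-of-moments estimates from the relation var = mu + mu**2 / r."""
    counts = np.asarray(counts)
    m, s2 = counts.mean(), counts.var(ddof=1)
    if s2 <= m:
        raise ValueError("no overdispersion; a Poisson model may suffice")
    r = m**2 / (s2 - m)   # dispersion parameter
    p = m / s2            # success parameter in this form, p = mu / var
    return r, p

# Illustrative overdispersed counts (variance exceeds the mean):
print(nb_moment_estimates([2, 5, 1, 8, 4, 7, 0, 6, 3, 9]))
</syntaxhighlight>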

An application of this is to annual counts of tropical cyclones in the North Atlantic or to monthly to 6-monthly counts of wintertime extratropical cyclones over Europe, for which the variance is greater than the mean.[22][23][24] In the case of modest overdispersion, this may produce substantially similar results to an overdispersed Poisson distribution.[25][26]

Negative binomial modeling is widely employed in ecology and biodiversity research for analyzing count data where overdispersion is very common. This is because overdispersion is indicative of biological aggregation, such as species or communities forming clusters. Ignoring overdispersion can lead to significantly inflated model parameters, resulting in misleading statistical inferences. The negative binomial distribution effectively addresses overdispersed counts by permitting the variance to vary quadratically with the mean. An additional dispersion parameter governs the slope of the quadratic term, determining the severity of overdispersion. The model's quadratic mean-variance relationship proves to be a realistic approach for handling overdispersion, as supported by empirical evidence from many studies. Overall, the NB model offers two attractive features: (1) the convenient interpretation of the dispersion parameter as an index of clustering or aggregation, and (2) its tractable form, featuring a closed expression for the probability mass function.[27]

In genetics, the negative binomial distribution is commonly used to model data in the form of discrete sequence read counts from high-throughput RNA and DNA sequencing experiments.[28][29][30][31]

In epidemiology of infectious diseases, the negative binomial has been used as a better option than the Poisson distribution to model overdispersed counts of secondary infections from one infected case (super-spreading events).[32]

Multiplicity observations (physics)

The negative binomial distribution has been the most effective statistical model for a broad range of multiplicity observations in particle collision experiments, e.g., <math>p\bar{p}</math>, <math>hh</math>, <math>hA</math>, <math>AA</math>, <math>e^{+}e^{-}</math>[33][34][35][36][37] (see [38] for an overview), and is argued to be a scale-invariant property of matter,[39][40] providing the best fit for astronomical observations, where it predicts the number of galaxies in a region of space.[41][42][43][44] The phenomenological justification for the effectiveness of the negative binomial distribution in these contexts remained unknown for fifty years after its first observation in 1973.[45] In 2023, a proof from first principles was demonstrated by Scott V. Tezlaf, who showed that the negative binomial distribution emerges from symmetries in the dynamical equations of a canonical ensemble of particles in Minkowski space.[46] Roughly, given an expected number of trials <math>\mathcal{n}</math> and expected number of successes <math>r</math>, where

:<math>\mathcal{n} - r = k, \qquad p = \frac{r}{\mathcal{n}} \qquad\Longleftrightarrow\qquad \mathcal{n} = \frac{k}{1 - p}, \qquad r = \frac{kp}{1 - p},</math>

an isomorphic set of equations can be identified with the parameters of a relativistic current density of a canonical ensemble of massive particles, via

:<math>c^2\langle\rho^2\rangle - \langle j^2\rangle = c^2\rho_0^2, \qquad \beta_v^2 = \frac{\langle j^2\rangle}{c^2\langle\rho^2\rangle} \qquad\Longleftrightarrow\qquad c^2\langle\rho^2\rangle = \frac{c^2\rho_0^2}{1 - \beta_v^2}, \qquad \langle j^2\rangle = \frac{c^2\rho_0^2\,\beta_v^2}{1 - \beta_v^2},</math>

where <math>\rho_0</math> is the rest density, <math>\langle\rho^2\rangle</math> is the relativistic mean square density, <math>\langle j^2\rangle</math> is the relativistic mean square current density, and <math>\beta_v^2 = \langle v^2\rangle / c^2</math>, where <math>\langle v^2\rangle</math> is the mean square speed of the particle ensemble and <math>c</math> is the speed of light, such that one can establish the following bijective map:

:<math>c^2\rho_0^2 \leftrightarrow k, \qquad \beta_v^2 \leftrightarrow p, \qquad c^2\langle\rho^2\rangle \leftrightarrow \mathcal{n}, \qquad \langle j^2\rangle \leftrightarrow r.</math>
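Substituting this map into the rearranged density identities recovers the negative binomial relations above; for instance,

:<math>c^2\langle\rho^2\rangle = \frac{c^2\rho_0^2}{1 - \beta_v^2} \;\longmapsto\; \mathcal{n} = \frac{k}{1 - p}, \qquad \langle j^2\rangle = \frac{c^2\rho_0^2\,\beta_v^2}{1 - \beta_v^2} \;\longmapsto\; r = \frac{kp}{1 - p}.</math>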

A rigorous alternative proof of the above correspondence has also been demonstrated through quantum mechanics via the Feynman path integral.[46]

History

This distribution was first studied in 1713 by Pierre Remond de Montmort in his ''Essay d'analyse sur les jeux de hazard'', as the distribution of the number of trials required in an experiment to obtain a given number of successes.[47] It had previously been mentioned by Pascal.[48]

See also

References

{{Reflist}}

{{ProbDistributions}}

  1. Pascal distribution, Univariate Distribution Relationships, Larry Leemis
  2. a b c Script error: No such module "citation/CS1".
  3. e.g. Script error: No such module "Citation/CS1".
    The overdispersion parameter is usually denoted by the letter k in epidemiology, rather than r as here.
  4. Script error: No such module "citation/CS1".
  5. a b c Script error: No such module "citation/CS1".
  6. Morris K W (1963), A note on direct and inverse sampling, Biometrika, 50, 544–545.
  7. Script error: No such module "citation/CS1".
  8. Script error: No such module "citation/CS1".
  9. SAS Institute, "Negative Binomial Distribution", SAS(R) 9.4 Functions and CALL Routines: Reference, Fourth Edition, SAS Institute, Cary, NC, 2016.
  10. a b Script error: No such module "citation/CS1".
  11. a b Script error: No such module "citation/CS1".
  12. Script error: No such module "citation/CS1".
  13. Script error: No such module "citation/CS1".
  14. Script error: No such module "citation/CS1".
  15. a b Script error: No such module "citation/CS1".
  16. Script error: No such module "Citation/CS1". Template:Open access
  17. Script error: No such module "Citation/CS1". Template:Open access
  18. Script error: No such module "Citation/CS1".
  19. Script error: No such module "Citation/CS1".
  20. Script error: No such module "Citation/CS1".
  21. a b Script error: No such module "Citation/CS1".
  22. Script error: No such module "Citation/CS1".
  23. Script error: No such module "Citation/CS1".
  24. Script error: No such module "Citation/CS1".
  25. Script error: No such module "citation/CS1".
  26. Script error: No such module "citation/CS1".
  27. Script error: No such module "Citation/CS1".
  28. Script error: No such module "Citation/CS1".
  29. Script error: No such module "citation/CS1".
  30. Script error: No such module "citation/CS1".
  31. Script error: No such module "citation/CS1".
  32. Script error: No such module "Citation/CS1".
  33. Script error: No such module "Citation/CS1".
  34. Script error: No such module "Citation/CS1".
  35. Script error: No such module "Citation/CS1".
  36. Script error: No such module "Citation/CS1".
  37. Script error: No such module "Citation/CS1".
  38. Script error: No such module "citation/CS1".
  39. Script error: No such module "Citation/CS1".
  40. Script error: No such module "Citation/CS1".
  41. Script error: No such module "Citation/CS1".
  42. Script error: No such module "Citation/CS1".
  43. Script error: No such module "Citation/CS1".
  44. Script error: No such module "Citation/CS1".
  45. Script error: No such module "Citation/CS1".
  46. a b Script error: No such module "Citation/CS1".
  47. Montmort PR de (1713) Essai d'analyse sur les jeux de hasard. 2nd ed. Quillau, Paris
  48. Pascal B (1679) Varia Opera Mathematica. D. Petri de Fermat. Tolosae