Negative binomial distribution


In probability theory and statistics, the negative binomial distribution, also called a Pascal distribution, is a discrete probability distribution that models the number of failures in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of successes $r$ occurs. For example, we can define rolling a 6 on a die as a success, and rolling any other number as a failure, and ask how many failed rolls will occur before we see the third success ($r = 3$). In such a case, the probability distribution of the number of failures is a negative binomial distribution.
An alternative formulation is to model the total number of trials instead of the number of failures. In fact, for a specified (non-random) number of successes $r$, the number of failures $n - r$ is random because the total number of trials $n$ is random. For example, we could use the negative binomial distribution to model the number of days $n$ (a random quantity) that a certain machine works before it breaks down.
The negative binomial distribution has variance $\mu/p$, where $\mu$ is its mean and $p \in (0, 1]$ is the success probability of each Bernoulli trial. For a given mean $\mu$, the distribution becomes identical to the Poisson distribution in the limit $p \to 1$, that is, when failures become increasingly rare. Because $\mu/p \geq \mu$, the distribution can serve as an overdispersed alternative to the Poisson distribution, for example for a robust modification of Poisson regression. In epidemiology, it has been used to model disease transmission for infectious diseases where the likely number of onward infections may vary considerably from individual to individual and from setting to setting. More generally, it may be appropriate where events have positively correlated occurrences, which causes a larger variance than if the occurrences were independent, due to a positive covariance term.
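The Poisson limit can be checked numerically. The following is a minimal sketch, not a reference implementation: it assumes the failure-counting form of the distribution with pmf $\Gamma(k+r)/(k!\,\Gamma(r))\,(1-p)^k p^r$ and mean $\mu = r(1-p)/p$, and the function names `nb_pmf`, `poisson_pmf` and `max_gap_to_poisson` are illustrative choices rather than names from any library.

```python
from math import exp, lgamma, log

def nb_pmf(k, r, p):
    """Pr(X = k): k failures before r successes; r may be any positive real."""
    log_pmf = (lgamma(k + r) - lgamma(k + 1) - lgamma(r)
               + r * log(p) + k * log(1 - p))
    return exp(log_pmf)

def poisson_pmf(k, mu):
    """Poisson probability of k events with mean mu."""
    return exp(k * log(mu) - mu - lgamma(k + 1))

def max_gap_to_poisson(mu, p, k_max=60):
    """Largest pointwise pmf difference when the NB mean is held at mu."""
    r = mu * p / (1 - p)  # solves mu = r * (1 - p) / p for r
    return max(abs(nb_pmf(k, r, p) - poisson_pmf(k, mu))
               for k in range(k_max + 1))

mu = 3.0
for p in (0.5, 0.9, 0.99, 0.999):
    # The variance mu / p shrinks toward mu, and the gap shrinks with it.
    print(p, mu / p, max_gap_to_poisson(mu, p))
```

The printed gap should shrink as $p$ approaches 1, while the variance $\mu/p$ approaches the Poisson variance $\mu$.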
The term "negative binomial" is likely due to the fact that a certain binomial coefficient that appears in the formula for the probability mass function of the distribution can be written more simply with negative numbers.

Definitions

Imagine a sequence of independent Bernoulli trials: each trial has two potential outcomes called "success" and "failure." In each trial the probability of success is $p$ and of failure is $1 - p$. We observe this sequence until a predefined number $r$ of successes occurs. Then the random number of observed failures, $X$, follows the negative binomial distribution:
$$X \sim \operatorname{NB}(r, p).$$

Probability mass function

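The probability mass function of the negative binomial distribution is
$$f(k; r, p) \equiv \Pr(X = k) = \binom{k+r-1}{k} (1-p)^k p^r,$$
where $r$ is the number of successes, $k$ is the number of failures, and $p$ is the probability of success on each trial.
Here, the quantity in parentheses is the binomial coefficient, and is equal to
$$\binom{k+r-1}{k} = \frac{(k+r-1)!}{(r-1)!\,k!} = \frac{(k+r-1)(k+r-2)\dotsm(r)}{k!} = \frac{\Gamma(k+r)}{k!\,\Gamma(r)}.$$
Note that $\Gamma(r)$ is the gamma function, and $\binom{k+r-1}{k}$ is the multiset coefficient $\left(\!\binom{r}{k}\!\right)$.
There are $k$ failures chosen from $k+r-1$ trials rather than $k+r$ because the last of the $k+r$ trials is by definition a success.
This quantity can alternatively be written in the following manner, explaining the name "negative binomial":
$$\frac{(k+r-1)\dotsm(r)}{k!} = (-1)^k \frac{(-r)(-r-1)(-r-2)\dotsm(-r-k+1)}{k!} = (-1)^k \binom{-r}{k}.$$
Note that by the last expression and the binomial series, for every $0 < p \leq 1$ and $q = 1 - p$,
$$p^{-r} = (1-q)^{-r} = \sum_{k=0}^{\infty} \binom{-r}{k} (-q)^k = \sum_{k=0}^{\infty} \binom{k+r-1}{k} q^k,$$
hence the terms of the probability mass function indeed add up to one:
$$\sum_{k=0}^{\infty} \binom{k+r-1}{k} (1-p)^k p^r = p^{-r} p^r = 1.$$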
To understand the above definition of the probability mass function, note that the probability of every specific sequence of $r$ successes and $k$ failures is $p^r (1-p)^k$, because the outcomes of the $k+r$ trials are assumed to be independent. Since the $r$-th success always comes last, it remains to choose the $k$ trials with failures out of the remaining $k+r-1$ trials. The above binomial coefficient, due to its combinatorial interpretation, gives precisely the number of all these sequences of length $k+r-1$.
An alternative interpretation of the binomial coefficient in the probability mass function arises from the equivalent multiset coefficient. A sequence of trials that stops at the $r$-th success can be represented by a tuple of $r$ non-negative integers, where each integer is the number of failures seen before the next success. Then, by applying stars and bars, the number of such tuples that sum to $k$ is $\left(\!\binom{r}{k}\!\right) = \binom{k+r-1}{k}$.
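As a concrete check of the mass function and of the stars-and-bars count, here is a minimal sketch in Python. It assumes an integer number of successes `r` and the failure-counting convention, where `k` is the number of failures and `p` the success probability. The names `nb_pmf` and `count_failure_tuples` are illustrative, not taken from any library.

```python
from itertools import product
from math import comb

def nb_pmf(k: int, r: int, p: float) -> float:
    """Pr(X = k): k failures before the r-th success, success probability p."""
    return comb(k + r - 1, k) * (1 - p) ** k * p ** r

def count_failure_tuples(k: int, r: int) -> int:
    """Brute-force count of r-tuples of non-negative integers summing to k."""
    return sum(1 for t in product(range(k + 1), repeat=r) if sum(t) == k)

# Die example: success = rolling a 6 (p = 1/6), stop at the third success.
p, r = 1 / 6, 3
print(nb_pmf(0, r, p))                             # three sixes in a row
print(sum(nb_pmf(k, r, p) for k in range(2000)))   # partial sum, close to 1
print(count_failure_tuples(4, 3), comb(4 + 3 - 1, 4))  # both counts agree
```

The brute-force count is exponential in `r` and is only meant to illustrate the combinatorial argument for small values.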

Cumulative distribution function

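The cumulative distribution function can be expressed in terms of the regularized incomplete beta function:
$$F(k; r, p) \equiv \Pr(X \leq k) = I_p(r, k+1).$$
It can also be expressed in terms of the cumulative distribution function of the binomial distribution:
$$F(k; r, p) = F_{\text{binomial}}(k;\ n = k + r,\ 1 - p).$$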
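The link to the binomial distribution can be verified by direct summation. The sketch below assumes integer `r` and the failure-counting convention with success probability `p`. It compares the negative binomial CDF at `k` with the CDF of a binomial distribution on `k + r` trials whose per-trial probability is the failure probability `1 - p`. The function names are illustrative.

```python
from math import comb

def nb_cdf(k: int, r: int, p: float) -> float:
    """Pr(X <= k) for X = number of failures before the r-th success."""
    return sum(comb(j + r - 1, j) * (1 - p) ** j * p ** r for j in range(k + 1))

def binom_cdf(k: int, n: int, q: float) -> float:
    """Pr(Y <= k) for Y ~ Binomial(n, q)."""
    return sum(comb(n, j) * q ** j * (1 - q) ** (n - j) for j in range(k + 1))

k, r, p = 4, 3, 0.3
# At most k failures before the r-th success is the same event as
# at most k failures among the first k + r trials.
print(nb_cdf(k, r, p), binom_cdf(k, k + r, 1 - p))
```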

Alternative formulations

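Some sources define the negative binomial distribution slightly differently from the primary definition used here. The most common variations differ in what the random variable $X$ counts. The four usual definitions are:
  • $X$ counts $k$ failures, given $r$ successes: $f(k; r, p) = \binom{k+r-1}{k} p^r (1-p)^k$ for $k = 0, 1, 2, \ldots$ (the primary definition used in this article).
  • $X$ counts $n$ trials, given $r$ successes: $f(n; r, p) = \binom{n-1}{r-1} p^r (1-p)^{n-r}$ for $n = r, r+1, r+2, \ldots$
  • $X$ counts $k$ successes, given $r$ failures: $f(k; r, p) = \binom{k+r-1}{k} p^k (1-p)^r$ for $k = 0, 1, 2, \ldots$
  • $X$ counts $n$ trials, given $r$ failures: $f(n; r, p) = \binom{n-1}{r-1} p^{n-r} (1-p)^r$ for $n = r, r+1, r+2, \ldots$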


Each of the four definitions of the negative binomial distribution can be expressed in slightly different but equivalent ways. The first alternative formulation is simply an equivalent form of the binomial coefficient, that is: $\binom{a}{b} = \binom{a}{a-b}$ for $0 \leq b \leq a$. The second alternative formulation somewhat simplifies the expression by recognizing that the total number of trials is simply the number of successes plus the number of failures, that is: $n = r + k$. These second formulations may be more intuitive to understand; however, they are perhaps less practical as they have more terms.
  • The definition where $X$ is the number of trials $n$ that occur for a given number of successes $r$ is similar to the primary definition, except that the number of trials is given instead of the number of failures. This adds $r$ to the value of the random variable, shifting its support and mean.
  • The definition where $X$ is the number of successes $k$ (or trials $n$) that occur for a given number of failures $r$ is similar to the primary definition used in this article, except that the numbers of failures and successes are switched when considering what is being counted and what is given. Note, however, that $p$ still refers to the probability of "success".
  • The definition of the negative binomial distribution can be extended to the case where the parameter $r$ takes a positive real value. Although it is impossible to visualize a non-integer number of "successes", we can still formally define the distribution through its probability mass function. The problem of extending the definition to real-valued $r$ boils down to extending the binomial coefficient to its real-valued counterpart, based on the gamma function: $\binom{k+r-1}{k} = \frac{(k+r-1)(k+r-2)\dotsm(r)}{k!} = \frac{\Gamma(k+r)}{k!\,\Gamma(r)}$. After substituting this expression in the original definition, we say that $X$ has a negative binomial distribution if it has the probability mass function $f(k; r, p) \equiv \Pr(X = k) = \frac{\Gamma(k+r)}{k!\,\Gamma(r)} (1-p)^k p^r$ for $k = 0, 1, 2, \ldots$ Here $r$ is a real, positive number.
In negative binomial regression, the distribution is specified in terms of its mean, $m = \frac{r(1-p)}{p}$, which is then related to explanatory variables as in linear regression or other generalized linear models. From the expression for the mean $m$, one can derive $p = \frac{r}{m+r}$ and $1 - p = \frac{m}{m+r}$. Then, substituting these expressions in the one for the probability mass function when $r$ is real-valued yields this parametrization of the probability mass function in terms of $m$:
$$\Pr(X = k) = \frac{\Gamma(r+k)}{k!\,\Gamma(r)} \left(\frac{r}{r+m}\right)^r \left(\frac{m}{r+m}\right)^k \quad \text{for } k = 0, 1, 2, \ldots$$
The variance can then be written as $m + \frac{m^2}{r}$. Some authors prefer to set $\alpha = \frac{1}{r}$, and express the variance as $m + \alpha m^2$. In this context, and depending on the author, either the parameter $r$ or its reciprocal $\alpha$ is referred to as the "dispersion parameter", "shape parameter" or "clustering coefficient", or the "heterogeneity" or "aggregation" parameter. The term "aggregation" is particularly used in ecology when describing counts of individual organisms. Decrease of the aggregation parameter $r$ towards zero corresponds to increasing aggregation of the organisms; increase of $r$ towards infinity corresponds to absence of aggregation, as can be described by Poisson regression.
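The mean-based parametrization can be sketched as follows. This is an illustrative implementation, not the routine of any particular regression package: it assumes a mean `m > 0` and a positive real shape `r`, with $p = r/(m+r)$, and it evaluates the gamma-function form of the mass function in log space via `math.lgamma` to avoid overflow. The name `nb_pmf_mean` is hypothetical.

```python
from math import exp, lgamma, log

def nb_pmf_mean(k: int, m: float, r: float) -> float:
    """Pr(X = k) in terms of mean m > 0 and real shape r > 0."""
    log_pmf = (lgamma(r + k) - lgamma(k + 1) - lgamma(r)
               + r * log(r / (r + m)) + k * log(m / (r + m)))
    return exp(log_pmf)

m, r = 2.5, 1.7
ks = range(2000)  # truncation point; the tail beyond it is negligible here
mean = sum(k * nb_pmf_mean(k, m, r) for k in ks)
var = sum((k - mean) ** 2 * nb_pmf_mean(k, m, r) for k in ks)
print(mean, m)               # numerical mean against m
print(var, m + m * m / r)    # numerical variance against m + m^2 / r
```

Working in log space is a design choice for numerical stability; for small `k` and `r` a direct product would give the same values.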

Alternative parameterizations

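Sometimes the distribution is parameterized in terms of its mean $\mu$ and variance $\sigma^2$:
$$p = \frac{\mu}{\sigma^2}, \qquad r = \frac{\mu^2}{\sigma^2 - \mu},$$
$$\Pr(X = k) = \binom{k + \frac{\mu^2}{\sigma^2 - \mu} - 1}{k} \left(1 - \frac{\mu}{\sigma^2}\right)^k \left(\frac{\mu}{\sigma^2}\right)^{\mu^2/(\sigma^2 - \mu)}, \qquad \operatorname{E}(X) = \mu, \quad \operatorname{Var}(X) = \sigma^2.$$
Another popular parameterization uses $r$ and the failure odds $\beta = \frac{1-p}{p}$:
$$p = \frac{1}{1+\beta}, \qquad \Pr(X = k) = \binom{k+r-1}{k} \left(\frac{\beta}{1+\beta}\right)^k \left(\frac{1}{1+\beta}\right)^r,$$
$$\operatorname{E}(X) = r\beta, \qquad \operatorname{Var}(X) = r\beta(1+\beta).$$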
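Converting between these parameterizations is a matter of a few algebraic steps. The sketch below assumes the failure-counting convention, where the mean is $r(1-p)/p$ and the variance is $r(1-p)/p^2$; the helper names are hypothetical. Note that the mean and variance form only makes sense when the variance exceeds the mean.

```python
def from_mean_variance(mu: float, var: float) -> tuple[float, float]:
    """Return (r, p) for a given mean and variance; requires var > mu > 0."""
    if not (var > mu > 0):
        raise ValueError("negative binomial needs variance > mean > 0")
    p = mu / var
    r = mu * mu / (var - mu)
    return r, p

def from_failure_odds(r: float, beta: float) -> tuple[float, float]:
    """Return (r, p) given r and the failure odds beta = (1 - p) / p."""
    return r, 1.0 / (1.0 + beta)

def moments(r: float, p: float) -> tuple[float, float]:
    """Mean and variance of the number of failures before r successes."""
    mean = r * (1 - p) / p
    return mean, mean / p

r, p = from_mean_variance(4.0, 10.0)
print(r, p, moments(r, p))                    # moments recover mean 4, variance 10
print(moments(*from_failure_odds(3.0, 2.0)))  # r*beta and r*beta*(1+beta)
```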

Examples

Length of hospital stay

Hospital length of stay is an example of real-world data that can be modelled well with a negative binomial distribution via negative binomial regression.

Selling candy

Pat Collis is required to sell candy bars to raise money for the 6th grade field trip. Pat is not supposed to return home until five candy bars have been sold. So the child goes door to door, selling candy bars. At each house, there is a 0.6 probability of selling one candy bar and a 0.4 probability of selling nothing.
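What's the probability of selling the last candy bar at the $n$-th house?
Successfully selling candy enough times is what defines our stopping criterion, so in this case $k$ represents the number of failures and $r$ represents the number of successes. Recall that the $\operatorname{NB}(r, p)$ distribution describes the probability of $k$ failures and $r$ successes in $k + r$ Bernoulli($p$) trials with success on the last trial. Selling five candy bars means getting five successes. The number of trials (i.e. houses) this takes is therefore $k + 5 = n$. The random variable we are interested in is the number of houses, so we substitute $k = n - 5$ into the $\operatorname{NB}(5, 0.6)$ mass function and obtain the following mass function of the distribution of houses (for $n \geq 5$):
$$f(n) = \binom{(n-5)+5-1}{n-5} \, 0.6^5 \, 0.4^{n-5} = \binom{n-1}{n-5} \frac{3^5 \, 2^{n-5}}{5^n}.$$
What's the probability that Pat finishes on the tenth house?
$$f(10) = \binom{9}{5} \frac{3^5 \, 2^5}{5^{10}} = \frac{979776}{9765625} \approx 0.10033.$$
What's the probability that Pat finishes on or before reaching the eighth house?
To finish on or before the eighth house, Pat must finish at the fifth, sixth, seventh, or eighth house. Sum those probabilities:
$$f(5) = 0.07776, \quad f(6) = 0.15552, \quad f(7) = 0.18662, \quad f(8) = 0.17418,$$
$$\sum_{j=5}^{8} f(j) = 0.59409.$$
What's the probability that Pat exhausts all 30 houses that happen to stand in the neighborhood?
This can be expressed as the probability that Pat does not finish on the fifth through the thirtieth house:
$$1 - \sum_{j=5}^{30} f(j) = 1 - I_{0.6}(5, 30 - 5 + 1) \approx 1 - 0.999999823 = 0.000000177.$$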
Because of the rather high probability that Pat will sell to each house, the probability of her not fulfilling her quest is vanishingly slim.
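The three answers in this example can be reproduced with a few lines of Python. This is a sketch under the assumptions of the example (five sales needed, a 0.6 chance of a sale at each house, independent houses); the function name `houses_pmf` is illustrative.

```python
from math import comb

def houses_pmf(n: int, r: int = 5, p: float = 0.6) -> float:
    """Probability that the r-th sale happens exactly at house n."""
    if n < r:
        return 0.0  # r sales need at least r houses
    return comb(n - 1, n - r) * p ** r * (1 - p) ** (n - r)

finish_at_10 = houses_pmf(10)
finish_by_8 = sum(houses_pmf(n) for n in range(5, 9))
exhaust_30 = 1.0 - sum(houses_pmf(n) for n in range(5, 31))

print(finish_at_10)  # about 0.10033
print(finish_by_8)   # about 0.59409
print(exhaust_30)    # about 1.77e-07
```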

Properties

Expectation

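The expected total number of trials needed to see $r$ successes is $\frac{r}{p}$. Thus, the expected number of failures would be this value, minus the successes:
$$\operatorname{E}[\operatorname{NB}(r, p)] = \frac{r}{p} - r = \frac{r(1-p)}{p}.$$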

Expectation of successes

The expected total number of failures in a negative binomial distribution with parameters $(r, p)$ is $r(1-p)/p$. To see this, imagine that an experiment simulating the negative binomial is performed many times. That is, a set of trials is performed until $r$ successes are obtained, then another set of trials, and then another, and so on. Write down the number of trials performed in each experiment: $a, b, c, \ldots$ and set $a + b + c + \dots = N$. Now we would expect about $Np$ successes in total. Say the experiment was performed $n$ times. Then there are $nr$ successes in total. So we would expect $nr = Np$, so $N/n = r/p$. Note that $N/n$ is just the average number of trials per experiment. That is what we mean by "expectation". The average number of failures per experiment is $N/n - r = r/p - r = r(1-p)/p$. This agrees with the expectation given in the previous subsection.
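A rigorous derivation can be done by representing the negative binomial distribution as a sum of waiting times. Let $X_r \sim \operatorname{NB}(r, p)$, with the convention that $X_r$ represents the number of failures observed before $r$ successes, each trial having success probability $p$. And let $Y_i \sim \operatorname{Geom}(p)$, where $Y_i$ represents the number of failures before seeing a success. We can think of $Y_i$ as the waiting time (number of failures) between the $(i-1)$-th and $i$-th success. Thus
$$X_r = Y_1 + Y_2 + \dots + Y_r.$$
The mean is
$$\operatorname{E}[X_r] = \operatorname{E}[Y_1] + \operatorname{E}[Y_2] + \dots + \operatorname{E}[Y_r] = \frac{r(1-p)}{p},$$
which follows from the fact $\operatorname{E}[Y_i] = (1-p)/p$.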
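The waiting-time representation also gives a direct way to simulate the distribution. The following sketch draws each geometric waiting time by counting failed Bernoulli trials and sums `r` of them. It assumes the failure-counting convention with success probability `p`; the function names, the seed, and the sample size are arbitrary choices for illustration.

```python
import random

def geometric_failures(p: float, rng: random.Random) -> int:
    """Number of failures before the first success in Bernoulli(p) trials."""
    failures = 0
    while rng.random() >= p:  # rng.random() < p counts as a success
        failures += 1
    return failures

def nb_sample(r: int, p: float, rng: random.Random) -> int:
    """One draw of the failure count: the sum of r geometric waiting times."""
    return sum(geometric_failures(p, rng) for _ in range(r))

rng = random.Random(0)  # fixed seed so the run is reproducible
r, p = 3, 0.25
samples = [nb_sample(r, p, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean, r * (1 - p) / p)  # sample mean against r(1-p)/p = 9
```

The sample mean should land close to $r(1-p)/p$, with a Monte Carlo error that shrinks as the number of simulated experiments grows.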