Beta distribution
In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval [0, 1] or (0, 1) in terms of two positive parameters, denoted by alpha (α) and beta (β), that appear as exponents of the variable and its complement to 1, respectively, and control the shape of the distribution.
The beta distribution has been applied to model the behavior of random variables limited to intervals of finite length in a wide variety of disciplines. The beta distribution is a suitable model for the random behavior of percentages and proportions.
In Bayesian inference, the beta distribution is the conjugate prior probability distribution for the Bernoulli, binomial, negative binomial, and geometric distributions.
The formulation of the beta distribution discussed here is also known as the beta distribution of the first kind, whereas beta distribution of the second kind is an alternative name for the beta prime distribution. The generalization to multiple variables is called a Dirichlet distribution.
Definitions
Probability density function
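The probability density function of the beta distribution, for 0 ≤ x ≤ 1 or 0 < x < 1, and shape parameters α, β > 0, is a power function of the variable x and of its reflection (1 − x) as follows:

$$
\begin{aligned}
f(x;\alpha,\beta) &= \mathrm{constant}\cdot x^{\alpha-1}(1-x)^{\beta-1} \\
&= \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du} \\
&= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1} \\
&= \frac{1}{\mathrm{B}(\alpha,\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}
\end{aligned}
$$

where Γ(z) is the gamma function. The beta function, B(α, β), is a normalization constant to ensure that the total probability is 1. In the above equations x is a realization (an observed value that actually occurred) of a random variable X.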
Several authors, including N. L. Johnson and S. Kotz, use the symbols p and q (instead of α and β) for the shape parameters of the beta distribution, reminiscent of the symbols traditionally used for the parameters of the Bernoulli distribution, because the beta distribution approaches the Bernoulli distribution in the limit when both shape parameters α and β approach zero.
In the following, a random variable X beta-distributed with parameters α and β will be denoted by:

$$X \sim \operatorname{Beta}(\alpha,\beta)$$
Other notations for beta-distributed random variables used in the statistical literature are $X \sim \mathcal{B}e(\alpha,\beta)$ and $X \sim \beta_{\alpha,\beta}$.
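The density can be evaluated directly from its definition. The following is a minimal sketch, assuming the standard form x^(α−1) (1 − x)^(β−1) / B(α, β) with B(α, β) = Γ(α)Γ(β)/Γ(α + β); the function name `beta_pdf` is illustrative, and the computation is done in log space only to avoid overflow for large shape parameters.

```python
import math

def beta_pdf(x, alpha, beta):
    """Density of Beta(alpha, beta) at a point x in the open interval (0, 1)."""
    if not (alpha > 0 and beta > 0):
        raise ValueError("shape parameters must be positive")
    if not (0.0 < x < 1.0):
        raise ValueError("this sketch only evaluates the density for 0 < x < 1")
    # log B(alpha, beta) = lgamma(alpha) + lgamma(beta) - lgamma(alpha + beta)
    log_b = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    log_density = (alpha - 1) * math.log(x) + (beta - 1) * math.log1p(-x) - log_b
    return math.exp(log_density)
```

For α = β = 1 the function returns the uniform density, and for α = β = 2 it reproduces 6x(1 − x). The endpoints are excluded here because the density may be infinite there when a shape parameter is below 1.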
Cumulative distribution function
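The cumulative distribution function is

$$F(x;\alpha,\beta) = \frac{\mathrm{B}(x;\alpha,\beta)}{\mathrm{B}(\alpha,\beta)} = I_x(\alpha,\beta)$$

where B(x; α, β) is the incomplete beta function and I_x(α, β) is the regularized incomplete beta function.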
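For positive integers α and β, the cumulative distribution function of a beta distribution can be expressed in terms of the cumulative distribution function of a binomial distribution:

$$F_{\text{beta}}(x;\alpha,\beta) = F_{\text{binomial}}(\beta-1;\ \alpha+\beta-1,\ 1-x)$$

where F_binomial(k; n, p) is the probability of at most k successes in n trials with success probability p.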
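For integer shape parameters, the link to the binomial distribution gives a way to compute the beta cumulative distribution function with nothing more than binomial coefficients. The sketch below assumes the identity F_beta(x; α, β) = F_binomial(β − 1; α + β − 1, 1 − x); the function names are illustrative.

```python
import math

def binomial_cdf(k, n, p):
    """P(at most k successes in n trials with success probability p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def beta_cdf_integer(x, alpha, beta):
    """CDF of Beta(alpha, beta) at x, valid only for positive integer shapes."""
    if not (isinstance(alpha, int) and isinstance(beta, int)):
        raise TypeError("this identity requires integer shape parameters")
    if alpha < 1 or beta < 1:
        raise ValueError("shape parameters must be positive integers")
    return binomial_cdf(beta - 1, alpha + beta - 1, 1 - x)
```

As a check, for α = 3 and β = 2 the sum collapses to the polynomial 4x³ − 3x⁴, which is what integrating the density 12x²(1 − x) gives.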
Alternative parameterizations
Two parameters
Mean and sample size
The beta distribution may also be reparameterized in terms of its mean μ (0 < μ < 1) and the sum of the two shape parameters ν = α + β > 0. Denoting by αPosterior and βPosterior the shape parameters of the posterior beta distribution resulting from applying Bayes' theorem to a binomial likelihood function and a prior probability, the interpretation of the sum of both shape parameters as sample size = ν = αPosterior + βPosterior is only correct for the Haldane prior probability Beta(0, 0). Specifically, for the Bayes (uniform) prior Beta(1, 1) the correct interpretation would be sample size = αPosterior + βPosterior − 2, or ν = (sample size) + 2. For sample size much larger than 2, the difference between these two priors becomes negligible. ν = α + β is referred to as the "sample size" of a beta distribution, but one should remember that it is, strictly speaking, the "sample size" of a binomial likelihood function only when using a Haldane Beta(0, 0) prior in Bayes' theorem.

This parametrization may be useful in Bayesian parameter estimation. For example, one may administer a test to a number of individuals. If it is assumed that each person's score (0 ≤ θ ≤ 1) is drawn from a population-level beta distribution, then an important statistic is the mean of this population-level distribution. The mean and sample size parameters are related to the shape parameters α and β via

$$\alpha = \mu\nu, \qquad \beta = (1-\mu)\nu$$
Under this parametrization, one may place an uninformative prior probability over the mean, and a vague prior probability over the positive reals for the sample size, if they are independent, and prior data and/or beliefs justify it.
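The conversion between the two parametrizations is a one-line computation in each direction. The sketch below assumes the relations α = μν and β = (1 − μ)ν, with μ the mean and ν = α + β the "sample size"; the function names are hypothetical.

```python
def shapes_from_mean_sample_size(mu, nu):
    """Return (alpha, beta) for mean mu in (0, 1) and sample size nu > 0."""
    if not (0.0 < mu < 1.0):
        raise ValueError("mean must lie strictly between 0 and 1")
    if nu <= 0:
        raise ValueError("sample size must be positive")
    return mu * nu, (1.0 - mu) * nu

def mean_sample_size_from_shapes(alpha, beta):
    """Inverse map: return (mu, nu) for positive shape parameters."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("shape parameters must be positive")
    nu = alpha + beta
    return alpha / nu, nu
```

For instance, a mean of 0.25 with ν = 8 corresponds to α = 2 and β = 6.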
Mode and concentration
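Concave beta distributions, which have α, β > 1, can be parametrized in terms of mode and "concentration". The mode, ω = (α − 1)/(α + β − 2), and concentration, κ = α + β, can be used to define the usual shape parameters as follows:

$$
\begin{aligned}
\alpha &= \omega(\kappa-2)+1 \\
\beta &= (1-\omega)(\kappa-2)+1
\end{aligned}
$$

For the mode, 0 < ω < 1, to be well-defined, we need α, β > 1, or equivalently κ > 2. If instead we define the concentration as c = α + β − 2, the condition simplifies to c > 0 and the beta density at α = 1 + cω and β = 1 + c(1 − ω) can be written as:

$$f(x;\omega,c) = \frac{x^{c\omega}(1-x)^{c(1-\omega)}}{\mathrm{B}\bigl(1+c\omega,\ 1+c(1-\omega)\bigr)}$$

where c directly scales the sufficient statistics, log(x) and log(1 − x). Note also that in the limit c → 0, the distribution becomes flat.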
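A small sketch of the mode and concentration parametrization follows, assuming the relations α = ω(κ − 2) + 1 and β = (1 − ω)(κ − 2) + 1 with κ = α + β > 2 and mode ω in (0, 1). The function names are illustrative.

```python
def shapes_from_mode_concentration(omega, kappa):
    """Return (alpha, beta) for mode omega in (0, 1) and concentration kappa > 2."""
    if not (0.0 < omega < 1.0):
        raise ValueError("mode must lie strictly between 0 and 1")
    if kappa <= 2:
        raise ValueError("concentration must exceed 2 for the mode to be defined")
    alpha = omega * (kappa - 2) + 1
    beta = (1 - omega) * (kappa - 2) + 1
    return alpha, beta

def mode_concentration_from_shapes(alpha, beta):
    """Inverse map, valid only for alpha, beta > 1."""
    if alpha <= 1 or beta <= 1:
        raise ValueError("both shape parameters must exceed 1")
    return (alpha - 1) / (alpha + beta - 2), alpha + beta
```

With ω = 0.25 and κ = 6 this yields α = 2 and β = 4, whose mode (α − 1)/(α + β − 2) is again 0.25.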
Mean and variance
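Solving the equations for the mean and the variance of the beta distribution in terms of the original parameters α and β, one can express the α and β parameters in terms of the mean (μ) and the variance (var):

$$
\begin{aligned}
\nu &= \alpha+\beta = \frac{\mu(1-\mu)}{\mathrm{var}} - 1, \quad \text{where } \nu > 0, \text{ therefore } \mathrm{var} < \mu(1-\mu) \\
\alpha &= \mu\nu = \mu\left(\frac{\mu(1-\mu)}{\mathrm{var}} - 1\right), \quad \text{if } \mathrm{var} < \mu(1-\mu) \\
\beta &= (1-\mu)\nu = (1-\mu)\left(\frac{\mu(1-\mu)}{\mathrm{var}} - 1\right), \quad \text{if } \mathrm{var} < \mu(1-\mu)
\end{aligned}
$$

This parametrization of the beta distribution may lead to a more intuitive understanding than the one based on the original parameters α and β. For example, the mode, skewness, excess kurtosis and differential entropy can all be expressed in terms of the mean and the variance.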
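Recovering the shape parameters from a mean and a variance is the basis of moment matching. The sketch below assumes the standard beta moments, mean = α/(α + β) and variance = αβ/((α + β)²(α + β + 1)), and inverts them through ν = mean(1 − mean)/variance − 1; the function names are hypothetical.

```python
def mean_variance_from_shapes(alpha, beta):
    """Mean and variance of Beta(alpha, beta)."""
    s = alpha + beta
    return alpha / s, alpha * beta / (s * s * (s + 1))

def shapes_from_mean_variance(mean, var):
    """Return (alpha, beta) matching the given mean and variance."""
    if not (0.0 < mean < 1.0):
        raise ValueError("mean must lie strictly between 0 and 1")
    if not (0.0 < var < mean * (1.0 - mean)):
        raise ValueError("variance must satisfy 0 < var < mean * (1 - mean)")
    nu = mean * (1.0 - mean) / var - 1.0  # nu = alpha + beta
    return mean * nu, (1.0 - mean) * nu
```

The bound var < mean(1 − mean) is checked explicitly because it is exactly the condition for ν, and hence both shape parameters, to be positive.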
Four parameters
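A beta distribution with the two shape parameters α and β is supported on the range [0, 1] or (0, 1). It is possible to alter the location and scale of the distribution by introducing two further parameters representing the minimum, a, and maximum, c (c > a), values of the distribution, by a linear transformation substituting the non-dimensional variable x in terms of the new variable y (with support [a, c] or (a, c)) and the parameters a and c:

$$y = x(c-a) + a, \quad \text{therefore} \quad x = \frac{y-a}{c-a}$$

The probability density function of the four parameter beta distribution is equal to the two parameter distribution, scaled by the range, (c − a), so that the total area under the density curve equals a probability of one, and with the "y" variable shifted and scaled as follows:

$$
f(y;\alpha,\beta,a,c) = \frac{f(x;\alpha,\beta)}{c-a}
= \frac{\left(\frac{y-a}{c-a}\right)^{\alpha-1}\left(\frac{c-y}{c-a}\right)^{\beta-1}}{(c-a)\,\mathrm{B}(\alpha,\beta)}
= \frac{(y-a)^{\alpha-1}(c-y)^{\beta-1}}{(c-a)^{\alpha+\beta-1}\,\mathrm{B}(\alpha,\beta)}
$$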
That a random variable Y is beta-distributed with four parameters α, β, a, and c will be denoted by:

$$Y \sim \operatorname{Beta}(\alpha,\beta,a,c)$$
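Some measures of central location are scaled (by (c − a)) and shifted (by a), as follows:

$$
\begin{aligned}
\mu_Y &= \mu_X\,(c-a) + a = \frac{\alpha}{\alpha+\beta}(c-a) + a = \frac{\alpha c + \beta a}{\alpha+\beta} \\
\text{mode}(Y) &= \text{mode}(X)\,(c-a) + a = \frac{\alpha-1}{\alpha+\beta-2}(c-a) + a = \frac{(\alpha-1)c + (\beta-1)a}{\alpha+\beta-2}, \quad \text{if } \alpha,\beta > 1 \\
\text{median}(Y) &= \text{median}(X)\,(c-a) + a
\end{aligned}
$$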
Note: the geometric mean and harmonic mean cannot be transformed by a linear transformation in the way that the mean, median and mode can.
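The shape parameters of Y can be written in terms of its mean μ_Y and variance σ_Y² as

$$
\begin{aligned}
\alpha &= \frac{\mu_Y-a}{c-a}\left(\frac{(\mu_Y-a)(c-\mu_Y)}{\sigma_Y^2} - 1\right) \\
\beta &= \frac{c-\mu_Y}{c-a}\left(\frac{(\mu_Y-a)(c-\mu_Y)}{\sigma_Y^2} - 1\right)
\end{aligned}
$$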
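The statistical dispersion measures are scaled (they do not need to be shifted because they are already centered on the mean) by the range (c − a), linearly for the mean deviation and nonlinearly for the variance:

$$
\begin{aligned}
\text{(mean deviation around mean)}(Y) &= \text{(mean deviation around mean)}(X)\,(c-a) = \frac{2\,\alpha^{\alpha}\beta^{\beta}}{\mathrm{B}(\alpha,\beta)\,(\alpha+\beta)^{\alpha+\beta+1}}\,(c-a) \\
\operatorname{var}(Y) &= \operatorname{var}(X)\,(c-a)^2 = \frac{\alpha\beta\,(c-a)^2}{(\alpha+\beta)^2(\alpha+\beta+1)}
\end{aligned}
$$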
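Since the skewness and excess kurtosis are non-dimensional quantities (as moments centered on the mean and normalized by the standard deviation), they are independent of the parameters a and c, and therefore equal to the expressions in terms of X (with support [0, 1] or (0, 1)):

$$
\begin{aligned}
\text{skewness}(Y) &= \text{skewness}(X) = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}} \\
\text{kurtosis excess}(Y) &= \text{kurtosis excess}(X) = \frac{6\left[(\alpha-\beta)^2(\alpha+\beta+1) - \alpha\beta(\alpha+\beta+2)\right]}{\alpha\beta(\alpha+\beta+2)(\alpha+\beta+3)}
\end{aligned}
$$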
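The four-parameter form can be sketched in code as a change of variable applied to the two-parameter density. The following assumes the transformation x = (y − a)/(c − a) and the standard beta mean and variance; all function names are hypothetical.

```python
import math

def beta4_pdf(y, alpha, beta, a, c):
    """Density of the four-parameter beta distribution at y in (a, c)."""
    if not a < c:
        raise ValueError("minimum a must be smaller than maximum c")
    if not (a < y < c):
        raise ValueError("this sketch only evaluates the density for a < y < c")
    x = (y - a) / (c - a)  # map y back to the unit interval
    log_b = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    log_fx = (alpha - 1) * math.log(x) + (beta - 1) * math.log1p(-x) - log_b
    return math.exp(log_fx) / (c - a)  # divide by the range to keep total area 1

def beta4_mean_variance(alpha, beta, a, c):
    """Mean is shifted and scaled; variance is scaled by the squared range."""
    s = alpha + beta
    mean = a + (c - a) * alpha / s
    var = (c - a) ** 2 * alpha * beta / (s * s * (s + 1))
    return mean, var

def beta4_shapes_from_mean_variance(mean_y, var_y, a, c):
    """Recover (alpha, beta) from the mean and variance of Y on [a, c]."""
    m = (mean_y - a) / (c - a)   # mean of the underlying X on [0, 1]
    v = var_y / (c - a) ** 2     # variance of the underlying X
    nu = m * (1 - m) / v - 1
    return m * nu, (1 - m) * nu
```

Working through the unit-interval variable keeps the code short: every four-parameter quantity is obtained by rescaling the corresponding two-parameter one, which is also why skewness and excess kurtosis need no separate treatment.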
Properties
Measures of central tendency
Mode
The mode of a beta distributed random variable X with α, β > 1 is the most likely value of the distribution (corresponding to the peak in the probability density function), and is given by the following expression:

$$\frac{\alpha-1}{\alpha+\beta-2}$$

When both parameters are less than one (α, β < 1), this is the anti-mode: the lowest point of the probability density curve.
Letting α = β, the expression for the mode simplifies to 1/2, showing that for α = β > 1 the mode (or the anti-mode when α, β < 1) is at the center of the distribution: it is symmetric in those cases. See the Shapes section in this article for a full list of mode cases, for arbitrary values of α and β. For several of these cases, the maximum value of the density function occurs at one or both ends. In some cases the value of the density function occurring at the end is finite. For example, in the case of α = 2, β = 1 (or α = 1, β = 2), the density function becomes a right-triangle distribution which is finite at both ends. In several other cases there is a singularity at one end, where the value of the density function approaches infinity. For example, in the case α = β = 1/2, the beta distribution simplifies to become the arcsine distribution. There is debate among mathematicians about some of these cases and about whether the ends (x = 0 and x = 1) can be called modes or not:
- Whether the ends are part of the domain of the density function
- Whether a singularity can ever be called a mode
- Whether cases with two maxima should be called ''bimodal''
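The interior cases described above can be summarized in a short sketch. It assumes the expression (α − 1)/(α + β − 2) and classifies only the interior extremum, deliberately leaving the debated endpoint cases unlabeled; the function name and the returned labels are illustrative.

```python
def interior_extremum(alpha, beta):
    """Classify the interior extremum of the Beta(alpha, beta) density.

    Returns ("mode", x) when alpha, beta > 1, ("anti-mode", x) when
    alpha, beta < 1, and ("none", None) otherwise (the density is then
    monotonic or flat, and any maximum sits at an end of the interval).
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("shape parameters must be positive")
    if alpha > 1 and beta > 1:
        return "mode", (alpha - 1) / (alpha + beta - 2)
    if alpha < 1 and beta < 1:
        return "anti-mode", (alpha - 1) / (alpha + beta - 2)
    return "none", None
```

For example, α = 3, β = 2 gives a mode at 2/3, the arcsine case α = β = 1/2 gives an anti-mode at 1/2, and the right-triangle case α = 2, β = 1 has no interior extremum.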
Median
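The median is the value of x at which the cumulative distribution function equals 1/2. It has no general closed-form expression for arbitrary α and β; closed-form expressions for particular parameter values follow:
- For symmetric cases α = β, median = 1/2.
- For α = 1 and β > 0, median = $1 - 2^{-1/\beta}$.
- For α > 0 and β = 1, median = $2^{-1/\alpha}$.
- For α = 3 and β = 2, median = 0.6142724318676105..., the real solution to the quartic equation 1 − 8x³ + 6x⁴ = 0 that lies in [0, 1].
- For α = 2 and β = 3, median = 0.38572756813238945... = 1 − median(Beta(3, 2)).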
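The special cases in the list can be checked numerically by solving CDF(x) = 1/2. The sketch below uses plain bisection, which is one simple choice among several root-finding methods, together with the Beta(3, 2) cumulative distribution function 4x³ − 3x⁴ obtained by integrating the density 12x²(1 − x); the function names are illustrative.

```python
def beta_cdf_3_2(x):
    """CDF of Beta(3, 2): the integral of 12 t^2 (1 - t) from 0 to x."""
    return 4 * x**3 - 3 * x**4

def median_by_bisection(cdf, tol=1e-14):
    """Find x in [0, 1] with cdf(x) = 1/2, assuming cdf is increasing."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

median_3_2 = median_by_bisection(beta_cdf_3_2)
median_2_3 = 1.0 - median_3_2  # Beta(2, 3) is the mirror image of Beta(3, 2)
```

Setting 4x³ − 3x⁴ = 1/2 and multiplying by 2 gives 1 − 8x³ + 6x⁴ = 0, the quartic quoted in the list, so the bisection result should agree with the tabulated median.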
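A reasonable approximation of the value of the median of the beta distribution, for both α and β greater than or equal to one, is given by the formula

$$\text{median} \approx \frac{\alpha - \tfrac{1}{3}}{\alpha+\beta-\tfrac{2}{3}} \quad \text{for } \alpha,\beta \ge 1$$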
When α, β ≥ 1, the relative error in this approximation is less than 4% and for both α ≥ 2 and β ≥ 2 it is less than 1%. The absolute error divided by the difference between the mean and the mode is similarly small:
Figure: Relative error of the approximation to the median of the beta distribution, for 1 ≤ α ≤ 5 and 1 ≤ β ≤ 5.
Figure: Error of the median approximation relative to the mean-mode distance, for 1 ≤ α ≤ 5 and 1 ≤ β ≤ 5.
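The quality of the median approximation can be spot-checked against parameter values whose median is known. The sketch below assumes the commonly cited form (α − 1/3)/(α + β − 2/3) for α, β ≥ 1 and compares it with two of the closed-form or tabulated medians; it checks two points only and is not a proof of the stated error bounds. Function names are hypothetical.

```python
def median_approx(alpha, beta):
    """Approximate median of Beta(alpha, beta), intended for alpha, beta >= 1."""
    if alpha < 1 or beta < 1:
        raise ValueError("the approximation is intended for alpha, beta >= 1")
    return (alpha - 1 / 3) / (alpha + beta - 2 / 3)

def relative_error(approx, exact):
    return abs(approx - exact) / exact

# Beta(1, 2): the exact median is 1 - 2**(-1/2).
err_1_2 = relative_error(median_approx(1, 2), 1 - 2 ** (-1 / 2))

# Beta(3, 2): median taken from the tabulated value 0.6142724318676105.
err_3_2 = relative_error(median_approx(3, 2), 0.6142724318676105)
```

Both sample errors fall within the bounds quoted above: the Beta(1, 2) case stays under 4%, and the Beta(3, 2) case, where both parameters are at least 2, stays under 1%.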