Tweedie distribution


In probability and statistics, the Tweedie distributions are a family of probability distributions which include the purely continuous normal, gamma and inverse Gaussian distributions, the purely discrete scaled Poisson distribution, and the class of compound Poisson–gamma distributions which have positive mass at zero, but are otherwise continuous.
Tweedie distributions are a special case of exponential dispersion models and are often used as distributions for generalized linear models.
The Tweedie distributions were first referred to by that name by Bent Jørgensen in a 1987 paper, crediting Maurice Tweedie, a statistician and medical physicist at the University of Liverpool, UK, who presented the first thorough study of these distributions in 1982 at the Indian Statistical Institute Golden Jubilee International Conference in Calcutta.
In 1986, Shaul K. Bar-Lev and Peter Enis published a paper about the same topic in The Annals of Statistics.

Definitions

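The Tweedie distributions are defined as a subfamily of (reproductive) exponential dispersion models (ED) with a special mean-variance relationship.
A random variable $Y$ is Tweedie distributed, $Y \sim \mathrm{Tw}_p(\mu, \sigma^2)$, if $Y \sim \mathrm{ED}(\mu, \sigma^2)$ with mean $\mu = \operatorname{E}(Y)$, positive dispersion parameter $\sigma^2$ and
$$\operatorname{Var}(Y) = \sigma^2 \mu^p,$$
where $p \in \mathbb{R}$ is called the Tweedie power parameter.
The probability distribution $P_{\theta,\sigma^2}$ on the measurable sets $A$ is given by
$$P_{\theta,\sigma^2}(Y \in A) = \int_A \exp\left(\frac{\theta z - \kappa_p(\theta)}{\sigma^2}\right) \nu_\lambda(dz),$$
for some σ-finite measure $\nu_\lambda$.
This representation uses the canonical parameter $\theta$ of an exponential dispersion model and the cumulant function
$$\kappa_p(\theta) = \begin{cases} \dfrac{\alpha-1}{\alpha}\left(\dfrac{\theta}{\alpha-1}\right)^{\alpha}, & p \neq 1, 2, \\ -\log(-\theta), & p = 2, \\ e^{\theta}, & p = 1, \end{cases}$$
where we used $\alpha = \frac{p-2}{p-1}$, or equivalently $p = \frac{\alpha-2}{\alpha-1}$.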
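The relationship between the cumulant function and the mean-variance law can be checked numerically. The following is a minimal Python sketch, assuming the standard Tweedie cumulant function $\kappa_p(\theta)$ (power form for $p \neq 1, 2$, $-\log(-\theta)$ for $p = 2$, $e^{\theta}$ for $p = 1$) and the canonical parameter $\theta = \mu^{1-p}/(1-p)$ (with $\theta = \log\mu$ for $p = 1$). The function names are illustrative and do not come from any library.

```python
import math

def tweedie_alpha(p):
    """Index alpha = (p - 2) / (p - 1); not defined for p = 1."""
    return (p - 2.0) / (p - 1.0)

def canonical_theta(mu, p):
    """Canonical parameter theta corresponding to a mean mu > 0."""
    if p == 1:
        return math.log(mu)
    return mu ** (1.0 - p) / (1.0 - p)

def kappa(theta, p):
    """Tweedie cumulant function kappa_p(theta) (assumed standard form)."""
    if p == 1:
        return math.exp(theta)
    if p == 2:
        return -math.log(-theta)
    alpha = tweedie_alpha(p)
    return (alpha - 1.0) / alpha * (theta / (alpha - 1.0)) ** alpha

def numeric_derivatives(mu, p, h=1e-4):
    """Central-difference estimates of kappa'(theta) and kappa''(theta).

    For a Tweedie model these should be close to mu and mu**p.
    """
    theta = canonical_theta(mu, p)
    k_minus = kappa(theta - h, p)
    k_zero = kappa(theta, p)
    k_plus = kappa(theta + h, p)
    first = (k_plus - k_minus) / (2.0 * h)
    second = (k_plus - 2.0 * k_zero + k_minus) / h ** 2
    return first, second
```

For example, `numeric_derivatives(2.0, 1.5)` should return values close to the mean 2 and to the unit variance $2^{1.5}$, up to finite-difference error.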

Properties

Additive exponential dispersion models

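The models just described are in the reproductive form. An exponential dispersion model always has a dual: the additive form. If $Y$ is reproductive, then $Z = \lambda Y$ with $\lambda = 1/\sigma^2$ is in the additive form $\mathrm{ED}^*(\theta, \lambda)$, for Tweedie $\mathrm{Tw}^*_p(\mu, \lambda)$. Additive models have the property that the distribution of the sum of independent random variables,
$$Z_+ = Z_1 + \cdots + Z_n,$$
for which $Z_i \sim \mathrm{ED}^*(\theta, \lambda_i)$ with fixed $\theta$ and various $\lambda_i$, is a member of the family of distributions with the same $\theta$,
$$Z_+ \sim \mathrm{ED}^*(\theta, \lambda_1 + \cdots + \lambda_n).$$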
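The closure of additive models under summation can be seen from cumulant generating functions. The short derivation below is a sketch that assumes the additive CGF has the form $K^*(s;\theta,\lambda) = \lambda[\kappa(\theta+s) - \kappa(\theta)]$ and that the $Z_i$ are independent, so that their CGFs add:

```latex
\begin{aligned}
K^*_{Z_+}(s) &= \sum_{i=1}^{n} K^*(s;\theta,\lambda_i)
  = \sum_{i=1}^{n} \lambda_i\,[\kappa(\theta+s)-\kappa(\theta)] \\
 &= (\lambda_1+\cdots+\lambda_n)\,[\kappa(\theta+s)-\kappa(\theta)] .
\end{aligned}
```

The last expression is the additive CGF with the same $\theta$ and index parameter $\lambda_1 + \cdots + \lambda_n$.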

Reproductive exponential dispersion models

A second class of exponential dispersion models exists, designated by the random variable
$$Y = Z/\lambda \sim \mathrm{ED}(\mu, \sigma^2),$$
where $\sigma^2 = 1/\lambda$, known as reproductive exponential dispersion models. They have the property that for $n$ independent random variables $Y_i \sim \mathrm{ED}(\mu, \sigma^2/w_i)$, with weighting factors $w_i$ and
$$w = \sum_{i=1}^{n} w_i,$$
a weighted average of the variables gives
$$\frac{1}{w}\sum_{i=1}^{n} w_i Y_i \sim \mathrm{ED}(\mu, \sigma^2/w).$$
For reproductive models the weighted average of independent random variables with fixed $\mu$ and $\sigma^2$ and various values for $w_i$ is a member of the family of distributions with the same $\mu$ and $\sigma^2$.
The Tweedie exponential dispersion models are both additive and reproductive; we thus have the duality transformation
$$Y \mapsto Z = Y/\sigma^2.$$
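The reproductive property can be checked at the level of the first two moments. The sketch below assumes that $Y_i \sim \mathrm{ED}(\mu, \sigma^2/w_i)$ has mean $\mu$ and variance $(\sigma^2/w_i)V(\mu)$, with $V$ the unit variance function, and that the $Y_i$ are independent. It verifies only the mean and variance, not the full distributional claim.

```latex
\begin{aligned}
\operatorname{E}\!\left(\frac{1}{w}\sum_{i=1}^{n} w_i Y_i\right)
  &= \frac{1}{w}\sum_{i=1}^{n} w_i\,\mu = \mu, \\
\operatorname{var}\!\left(\frac{1}{w}\sum_{i=1}^{n} w_i Y_i\right)
  &= \frac{1}{w^{2}}\sum_{i=1}^{n} w_i^{2}\,\frac{\sigma^{2}}{w_i}\,V(\mu)
   = \frac{\sigma^{2}}{w}\,V(\mu).
\end{aligned}
```

These are the mean and variance of a model with the same $\mu$ and dispersion $\sigma^2/w$.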

Scale invariance

A third property of the Tweedie models is that they are scale invariant: for a reproductive exponential dispersion model $\mathrm{Tw}_p(\mu, \sigma^2)$ and any positive constant $c$ we have the property of closure under scale transformation,
$$c\,\mathrm{Tw}_p(\mu, \sigma^2) = \mathrm{Tw}_p(c\mu, c^{2-p}\sigma^2).$$
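A quick consistency check of the rescaled parameters, assuming only the defining relation $\operatorname{Var}(Y) = \sigma^2\mu^p$ for a reproductive Tweedie variable $Y$ with mean $\mu$ and dispersion $\sigma^2$:

```latex
\operatorname{E}(cY) = c\mu, \qquad
\operatorname{var}(cY) = c^{2}\sigma^{2}\mu^{p}
  = \left(c^{2-p}\sigma^{2}\right)(c\mu)^{p}.
```

The variance of $cY$ therefore follows the same power law with mean $c\mu$ and dispersion $c^{2-p}\sigma^2$. This confirms the moments only, not the full closure statement.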

The Tweedie power variance function

To define the variance function for exponential dispersion models we make use of the mean value mapping, the relationship between the canonical parameter $\theta$ and the mean $\mu$. It is defined by the function
$$\tau(\theta) = \kappa'(\theta) = \mu,$$
with cumulant function $\kappa(\theta)$.
The variance function $V(\mu)$ is constructed from the mean value mapping,
$$V(\mu) = \tau'[\tau^{-1}(\mu)].$$
Here the minus exponent in $\tau^{-1}(\mu)$ denotes an inverse function rather than a reciprocal. The mean and variance of an additive random variable are then $\operatorname{E}(Z) = \lambda\mu$ and $\operatorname{var}(Z) = \lambda V(\mu)$.
Scale invariance implies that the variance function obeys the relationship $V(\mu) = \mu^p$.
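As a worked illustration, the construction of the variance function can be carried out for two familiar cases, assuming the cumulant functions $\kappa(\theta) = e^{\theta}$ (Poisson) and $\kappa(\theta) = -\log(-\theta)$ with $\theta < 0$ (gamma):

```latex
\begin{aligned}
\text{Poisson:}\quad & \tau(\theta) = e^{\theta}, \quad \tau^{-1}(\mu) = \log\mu, \quad
  V(\mu) = \tau'(\log\mu) = \mu, \\
\text{gamma:}\quad & \tau(\theta) = -\frac{1}{\theta}, \quad \tau^{-1}(\mu) = -\frac{1}{\mu}, \quad
  V(\mu) = \tau'\!\left(-\frac{1}{\mu}\right) = \mu^{2}.
\end{aligned}
```

These two cases give power variance functions with exponents 1 and 2, respectively.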

The Tweedie deviance

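The unit deviance of a reproductive Tweedie distribution is given by
$$d(y,\mu) = \begin{cases} (y-\mu)^2, & p = 0, \\ 2\left(y\log(y/\mu) + \mu - y\right), & p = 1, \\ 2\left(\log(\mu/y) + y/\mu - 1\right), & p = 2, \\ 2\left(\dfrac{\max(y,0)^{2-p}}{(1-p)(2-p)} - \dfrac{y\,\mu^{1-p}}{1-p} + \dfrac{\mu^{2-p}}{2-p}\right), & \text{otherwise}. \end{cases}$$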
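A minimal Python sketch of the unit deviance follows. It assumes the standard piecewise form: squared error for $p = 0$, the Poisson deviance for $p = 1$, the gamma deviance for $p = 2$, and the general power expression otherwise. The function name is illustrative and not taken from any library.

```python
import math

def tweedie_unit_deviance(y, mu, p):
    """Unit deviance d(y, mu) of a reproductive Tweedie model with power p.

    Assumes mu > 0; y must be > 0 for p >= 2 and >= 0 for 1 <= p < 2.
    Exact comparisons on p are used to select the special cases.
    """
    if p == 0:
        return (y - mu) ** 2
    if p == 1:
        # y * log(y / mu) is replaced by its limiting value 0 at y = 0.
        ylog = y * math.log(y / mu) if y > 0 else 0.0
        return 2.0 * (ylog + mu - y)
    if p == 2:
        return 2.0 * (math.log(mu / y) + y / mu - 1.0)
    return 2.0 * (
        max(y, 0.0) ** (2.0 - p) / ((1.0 - p) * (2.0 - p))
        - y * mu ** (1.0 - p) / (1.0 - p)
        + mu ** (2.0 - p) / (2.0 - p)
    )
```

The deviance is zero when $y = \mu$. For $p = 3$ the general branch agrees with the inverse Gaussian form $(y-\mu)^2/(y\mu^2)$.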

The Tweedie cumulant generating functions

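The properties of exponential dispersion models give us two differential equations. The first relates the mean value mapping and the variance function to each other,
$$\frac{\partial \tau^{-1}(\mu)}{\partial \mu} = \frac{1}{V(\mu)}.$$
The second shows how the mean value mapping is related to the cumulant function,
$$\frac{\partial \kappa(\theta)}{\partial \theta} = \tau(\theta).$$
These equations can be solved to obtain the cumulant function for different cases of the Tweedie models. A cumulant generating function (CGF) may then be obtained from the cumulant function. The additive CGF is generally specified by the equation
$$K^*(s) = \log[\operatorname{E}(e^{sZ})] = \lambda[\kappa(\theta + s) - \kappa(\theta)],$$
and the reproductive CGF by
$$K(s) = \log[\operatorname{E}(e^{sY})] = \lambda[\kappa(\theta + s/\lambda) - \kappa(\theta)],$$
where $s$ is the generating function variable.
For the additive Tweedie models the CGFs take the form
$$K^*_p(s;\theta,\lambda) = \begin{cases} \lambda\kappa_p(\theta)\left[(1 + s/\theta)^{\alpha} - 1\right], & p \neq 1, 2, \\ -\lambda\log(1 + s/\theta), & p = 2, \\ \lambda e^{\theta}(e^{s} - 1), & p = 1, \end{cases}$$
and for the reproductive models,
$$K_p(s;\theta,\lambda) = \begin{cases} \lambda\kappa_p(\theta)\left\{\left[1 + s/(\theta\lambda)\right]^{\alpha} - 1\right\}, & p \neq 1, 2, \\ -\lambda\log\left[1 + s/(\theta\lambda)\right], & p = 2, \\ \lambda e^{\theta}(e^{s/\lambda} - 1), & p = 1. \end{cases}$$
The additive and reproductive Tweedie models are conventionally denoted by the symbols $\mathrm{Tw}^*_p$ and $\mathrm{Tw}_p$, respectively.
The first and second derivatives of the CGFs, evaluated at $s = 0$, yield the mean and variance, respectively. One can thus confirm that for the additive models the variance relates to the mean by the power law
$$\operatorname{var}(Z) \propto \operatorname{E}(Z)^p.$$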
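The confirmation can be written out explicitly. The following sketch assumes the additive CGF $K^*(s) = \lambda[\kappa(\theta+s) - \kappa(\theta)]$ together with $\kappa'(\theta) = \mu$ and $\kappa''(\theta) = V(\mu) = \mu^p$:

```latex
\begin{aligned}
\operatorname{E}(Z) &= \left.\frac{dK^*}{ds}\right|_{s=0} = \lambda\,\kappa'(\theta) = \lambda\mu, \\
\operatorname{var}(Z) &= \left.\frac{d^{2}K^*}{ds^{2}}\right|_{s=0} = \lambda\,\kappa''(\theta) = \lambda\mu^{p}
  = \lambda^{1-p}\,[\operatorname{E}(Z)]^{p}.
\end{aligned}
```

For fixed $\lambda$ the proportionality constant in the power law is thus $\lambda^{1-p}$.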

The Tweedie convergence theorem

The Tweedie exponential dispersion models are fundamental in statistical theory because they act as foci of convergence for a wide range of statistical processes. Jørgensen et al. proved a theorem, known as the Tweedie convergence theorem, that specifies the asymptotic behaviour of variance functions. In technical terms, the theorem is stated thus: the unit variance function is regular of order $p$ at zero (or infinity) provided that $V(\mu) \sim c_0\mu^p$ as $\mu$ approaches zero (or infinity), for a real value of $p$ and $c_0 > 0$. Then for a unit variance function regular of order $p$ at either zero or infinity and for
$$p \notin (0, 1),$$
for any $\mu > 0$ and $\sigma^2 > 0$ we have
$$c^{-1}\,\mathrm{ED}(c\mu, \sigma^2 c^{2-p}) \to \mathrm{Tw}_p(\mu, c_0\sigma^2)$$
as $c \downarrow 0$ or $c \to \infty$, respectively, where the convergence is through values of $c$ such that $c\mu$ is in the domain of $\theta$ and $c^{p-2}/\sigma^2$ is in the domain of $\lambda$. The model must be infinitely divisible as $c^{2-p}$ approaches infinity.
In nontechnical terms this theorem implies that any exponential dispersion model that asymptotically manifests a variance-to-mean power law is required to have a variance function that comes within the domain of attraction of a Tweedie model. Almost all distribution functions with finite cumulant generating functions qualify as exponential dispersion models and most exponential dispersion models manifest variance functions of this form. Hence many probability distributions have variance functions that express this asymptotic behaviour, and the Tweedie distributions become foci of convergence for a wide range of data types.
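The notion of a variance function that is asymptotically a power law can be illustrated numerically. The sketch below estimates the local exponent, the slope of $\log V(\mu)$ against $\log\mu$, for the variance function $V(\mu) = \mu + \mu^2$. This quadratic variance function of negative binomial type is chosen here purely as an illustrative assumption and does not come from the text above.

```python
import math

def local_power_exponent(variance_function, mu, rel_step=1e-4):
    """Slope of log V against log mu at mu, by central differences."""
    lo = mu * (1.0 - rel_step)
    hi = mu * (1.0 + rel_step)
    rise = math.log(variance_function(hi)) - math.log(variance_function(lo))
    run = math.log(hi) - math.log(lo)
    return rise / run

def quadratic_variance_function(mu):
    """Illustrative unit variance function V(mu) = mu + mu**2 (an assumption)."""
    return mu + mu ** 2

# Local exponents near zero and at large mu.
exponent_near_zero = local_power_exponent(quadratic_variance_function, 1e-6)
exponent_at_large_mu = local_power_exponent(quadratic_variance_function, 1e6)
```

For this example the local exponent approaches 1 near zero and 2 at large $\mu$. In the language of the theorem, the variance function is regular of order 1 at zero and of order 2 at infinity, with $c_0 = 1$ in both cases.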

Related distributions

The Tweedie distributions include a number of familiar distributions as well as some unusual ones, each being specified by the domain of the index parameter. We have the
  • extreme stable distribution, p < 0,
  • normal distribution, p = 0,
  • Poisson distribution, p = 1,
  • compound Poisson–gamma distribution, 1 < p < 2,
  • gamma distribution, p = 2,
  • positive stable distributions, 2 < p < 3,
  • Inverse Gaussian distribution, p = 3,
  • positive stable distributions, p > 3, and
  • extreme stable distributions, p = ∞.
For 0 < p < 1 no Tweedie model exists. Note that the entries labelled "stable" refer to exponential dispersion models generated by stable distributions, rather than to the stable distributions themselves.
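The classification by power parameter can be expressed as a small lookup. This is a minimal Python sketch; the function name and the returned labels are illustrative, and exact floating-point comparisons are used for the boundary values.

```python
import math

def tweedie_family(p):
    """Name of the Tweedie model for power parameter p, following the list above."""
    if 0 < p < 1:
        raise ValueError("no Tweedie model exists for 0 < p < 1")
    if p < 0:
        return "extreme stable"
    if p == 0:
        return "normal"
    if p == 1:
        return "Poisson"
    if p < 2:
        return "compound Poisson-gamma"
    if p == 2:
        return "gamma"
    if p == 3:
        return "inverse Gaussian"
    if math.isinf(p):
        return "extreme stable"
    # Remaining cases: 2 < p < 3 and finite p > 3.
    return "positive stable"
```

For instance, `tweedie_family(1.5)` returns the compound Poisson-gamma label, while `tweedie_family(0.5)` raises a `ValueError`.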

Occurrence and applications

The Tweedie models and Taylor's power law

Taylor's law is an empirical law in ecology that relates the variance of the number of individuals of a species per unit area of habitat to the corresponding mean by a power-law relationship. For the population count $Y$ with mean $\mu$ and variance $\operatorname{var}(Y)$, Taylor's law is written
$$\operatorname{var}(Y) = a\mu^p,$$
where $a$ and $p$ are both positive constants. Since L. R. Taylor described this law in 1961, many different explanations have been offered for it, ranging from animal behavior, a random walk model, and a stochastic birth, death, immigration and emigration model, to a consequence of equilibrium and non-equilibrium statistical mechanics. No consensus exists as to an explanation for this model.
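In practice the constants $a$ and $p$ are often estimated from paired sample means and variances. The sketch below uses ordinary least squares on the log-log scale, $\log \operatorname{var}(Y) = \log a + p \log\mu$. This is one simple estimator chosen for illustration, not a method prescribed by the text, and the function name is hypothetical.

```python
import numpy as np

def fit_taylor_law(means, variances):
    """Least-squares fit of log(variance) = log(a) + p * log(mean).

    Returns the pair (a, p). All means and variances must be positive.
    """
    log_mean = np.log(np.asarray(means, dtype=float))
    log_var = np.log(np.asarray(variances, dtype=float))
    # np.polyfit returns coefficients from highest degree down: slope, intercept.
    slope, intercept = np.polyfit(log_mean, log_var, 1)
    return float(np.exp(intercept)), float(slope)
```

Applied to data that follow the power law exactly, the fit recovers $a$ and $p$ up to rounding. With real counts, the sampling error of the variances and the choice of regression model both affect the estimate.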
Since Taylor's law is mathematically identical to the variance-to-mean power law that characterizes the Tweedie models, it seemed reasonable to use these models and the Tweedie convergence theorem to explain the observed clustering of animals and plants associated with Taylor's law. The majority of the observed values for the power-law exponent p have fallen in the interval (1, 2), and so the Tweedie compound Poisson–gamma distribution would seem applicable. Comparison of the empirical distribution function to the theoretical compound Poisson–gamma distribution has provided a means to verify consistency of this hypothesis.
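For comparisons of this kind it can help to simulate from the compound Poisson–gamma model. The sketch below assumes the commonly used representation of a Tweedie variable with $1 < p < 2$ as a Poisson-distributed number of independent gamma-distributed terms, with Poisson rate $\mu^{2-p}/[\sigma^2(2-p)]$, gamma shape $(2-p)/(p-1)$ and gamma scale $\sigma^2(p-1)\mu^{p-1}$. This parameterization is not stated in the text above and should be treated as an assumption; the function names are illustrative.

```python
import numpy as np

def compound_poisson_gamma_parameters(mu, sigma2, p):
    """Poisson rate, gamma shape and gamma scale for mean mu, dispersion sigma2, 1 < p < 2."""
    if not 1.0 < p < 2.0:
        raise ValueError("the compound Poisson-gamma case requires 1 < p < 2")
    poisson_rate = mu ** (2.0 - p) / (sigma2 * (2.0 - p))
    gamma_shape = (2.0 - p) / (p - 1.0)
    gamma_scale = sigma2 * (p - 1.0) * mu ** (p - 1.0)
    return poisson_rate, gamma_shape, gamma_scale

def sample_compound_poisson_gamma(mu, sigma2, p, size, rng):
    """Draw `size` values; zeros occur whenever the Poisson count is zero."""
    rate, shape, scale = compound_poisson_gamma_parameters(mu, sigma2, p)
    counts = rng.poisson(rate, size=size)
    totals = np.zeros(size)
    positive = counts > 0
    # A sum of n independent gamma(shape, scale) terms is gamma(n * shape, scale).
    totals[positive] = rng.gamma(shape * counts[positive], scale)
    return totals
```

Under this parameterization the mean is rate × shape × scale $= \mu$, the variance is rate × shape × (1 + shape) × scale² $= \sigma^2\mu^p$, and the probability of an exact zero is $\exp(-\text{rate})$. A call such as `sample_compound_poisson_gamma(2.0, 1.5, 1.5, 1000, np.random.default_rng(0))` yields non-negative samples with a point mass at zero.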
Whereas conventional models for Taylor's law have tended to involve ad hoc animal behavioral or population dynamic assumptions, the Tweedie convergence theorem would imply that Taylor's law results from a general mathematical convergence effect, much as the central limit theorem governs the convergence behavior of certain types of random data. Indeed, any mathematical model, approximation or simulation that is designed to yield Taylor's law is required to converge to the form of the Tweedie models.

Tweedie convergence and 1/f noise

Pink noise, or 1/f noise, refers to a pattern of noise characterized by a power-law relationship between its intensities $S(f)$ at different frequencies $f$,
$$S(f) \propto \frac{1}{f^{\gamma}},$$
where the dimensionless exponent $\gamma \in [0, 1]$. It is found within a diverse range of natural processes. Many different explanations for 1/f noise exist; a widely held hypothesis is based on self-organized criticality, where dynamical systems close to a critical point are thought to manifest scale-invariant spatial and/or temporal behavior.
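In this subsection a mathematical connection between 1/f noise and the Tweedie variance-to-mean power law is described. To begin, we first need to introduce self-similar processes. For the sequence of numbers
$$Y = (Y_i : i = 0, 1, 2, \ldots, N)$$
with mean
$$\hat{\mu} = \operatorname{E}(Y_i),$$
deviations
$$y_i = Y_i - \hat{\mu},$$
variance
$$\hat{\sigma}^2 = \operatorname{E}(y_i^2),$$
and autocorrelation function
$$r(k) = \frac{\operatorname{E}(y_i y_{i+k})}{\operatorname{E}(y_i^2)}$$
with lag $k$, if the autocorrelation of this sequence has the long-range behavior
$$r(k) \sim k^{-d} L(k)$$
as $k \to \infty$, where $L(k)$ is a slowly varying function at large values of $k$, this sequence is called a self-similar process.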
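The method of expanding bins can be used to analyze self-similar processes. Consider a set of equal-sized non-overlapping bins that divides the original sequence of $N$ elements into groups of $m$ equal-sized segments ($N/m$ being an integer), so that new reproductive sequences, based on the mean values, can be defined:
$$Y_i^{(m)} = \frac{Y_{im-m+1} + \cdots + Y_{im}}{m}.$$
The variance determined from this sequence will scale as the bin size changes such that
$$\operatorname{var}[Y^{(m)}] = \hat{\sigma}^2 m^{-d}$$
if and only if the autocorrelation has the limiting form
$$\lim_{k \to \infty} \frac{r(k)}{k^{-d}} = \frac{(2-d)(1-d)}{2}.$$
One can also construct a set of corresponding additive sequences
$$Z_i^{(m)} = m Y_i^{(m)},$$
based on the expanding bins,
$$Z_i^{(m)} = Y_{im-m+1} + \cdots + Y_{im}.$$
Provided the autocorrelation function exhibits the same behavior, the additive sequences will obey the relationship
$$\operatorname{var}[Z_i^{(m)}] = m^2 \operatorname{var}[Y^{(m)}] = \left(\frac{\hat{\sigma}^2}{\hat{\mu}^{2-d}}\right) \operatorname{E}[Z_i^{(m)}]^{2-d}.$$
Since $\hat{\mu}$ and $\hat{\sigma}^2$ are constants, this relationship constitutes a variance-to-mean power law, with $p = 2 - d$.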
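The method of expanding bins is straightforward to sketch in Python. The code below forms the reproductive (bin mean) and additive (bin sum) sequences for a given bin size $m$, then estimates the power-law exponent as the slope of the log variance against the log mean of the additive sequences across bin sizes. The function names are illustrative, any trailing elements that do not fill a complete bin are dropped, and the log-log regression is a simple estimator chosen for this sketch.

```python
import numpy as np

def reproductive_bins(sequence, m):
    """Means of consecutive non-overlapping bins of size m."""
    values = np.asarray(sequence, dtype=float)
    n_bins = values.size // m
    return values[: n_bins * m].reshape(n_bins, m).mean(axis=1)

def additive_bins(sequence, m):
    """Sums of consecutive non-overlapping bins of size m."""
    return m * reproductive_bins(sequence, m)

def estimate_power_exponent(sequence, bin_sizes):
    """Slope of log var against log mean of the additive sequences.

    Requires a positive sequence mean; the slope estimates p = 2 - d.
    """
    means, variances = [], []
    for m in bin_sizes:
        z = additive_bins(sequence, m)
        means.append(z.mean())
        variances.append(z.var())
    slope, _ = np.polyfit(np.log(means), np.log(variances), 1)
    return float(slope)
```

For an uncorrelated sequence the variance of the bin sums grows in proportion to $m$, as does their mean, so the estimated exponent should be close to 1. Long-range correlated sequences would give exponents between 1 and 2.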
The biconditional relationship above between the variance-to-mean power law and the power-law autocorrelation function, together with the Wiener–Khinchin theorem, implies that any sequence that exhibits a variance-to-mean power law by the method of expanding bins will also manifest 1/f noise, and vice versa. Moreover, the Tweedie convergence theorem, by virtue of its central limit-like effect of generating distributions that manifest variance-to-mean power functions, will also generate processes that manifest 1/f noise. The Tweedie convergence theorem thus provides an alternative explanation for the origin of 1/f noise, based on its central limit-like effect.
Much as the central limit theorem requires certain kinds of random processes to have as a focus of their convergence the Gaussian distribution and thus express white noise, the Tweedie convergence theorem requires certain non-Gaussian processes to have as a focus of convergence the Tweedie distributions that express 1/f noise.