Sub-Gaussian distribution


In probability theory, a subgaussian distribution, the distribution of a subgaussian random variable, is a probability distribution with strong tail decay. More specifically, the tails of a subgaussian distribution are dominated by the tails of a Gaussian. This property gives subgaussian distributions their name.
Often in analysis, we divide an object into two parts, a central bulk and a distant tail, then analyze each separately. In probability, this division usually goes like "Everything interesting happens near the center. The tail event is so rare that we may safely ignore it." Subgaussian distributions are worthy of study because the Gaussian distribution is well understood, and so we can give sharp bounds on the rarity of the tail event. Similarly, the subexponential distributions are also worthy of study.
Formally, the probability distribution of a random variable $X$ is called subgaussian if there is a positive constant $C$ such that for every $t \geq 0$,
$$\operatorname{P}(|X| \geq t) \leq 2\exp\left(-\frac{t^2}{C^2}\right).$$
There are many equivalent definitions. For example, a random variable $X$ is sub-Gaussian iff its distribution function is bounded from above (up to a constant) by the distribution function of a Gaussian:
$$\operatorname{P}(|X| \geq t) \leq c\,\operatorname{P}(|Z| \geq t) \quad \text{for all } t \geq 0,$$
where $c \geq 1$ is a constant and $Z$ is a mean zero Gaussian random variable.

Definitions

Subgaussian norm

The subgaussian norm of $X$, denoted as $\|X\|_{\psi_2}$, is
$$\|X\|_{\psi_2} = \inf\left\{t > 0 : \operatorname{E}\left[\exp\left(\frac{X^2}{t^2}\right)\right] \leq 2\right\}.$$
In other words, it is the Orlicz norm of $X$ generated by the Orlicz function $\Phi(u) = e^{u^2} - 1$. By condition $(2)$ below, subgaussian random variables can be characterized as those random variables with finite subgaussian norm.
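As a concrete illustration (not from any reference; the helper names below are made up for this sketch), the subgaussian norm of a standard normal variable can be computed numerically. For $Z \sim N(0,1)$ one has $\operatorname{E}[e^{Z^2/t^2}] = (1 - 2/t^2)^{-1/2}$ for $t^2 > 2$, so the defining condition gives $\|Z\|_{\psi_2} = \sqrt{8/3} \approx 1.63$. A minimal Python sketch:

import math
from scipy.integrate import quad

def mgf_of_square(t):
    """E[exp(Z^2 / t^2)] for Z ~ N(0,1), by numerical integration (finite only for t^2 > 2)."""
    integrand = lambda x: math.exp(x * x / (t * t) - x * x / 2) / math.sqrt(2 * math.pi)
    value, _ = quad(integrand, -12, 12)   # the integrand is negligible beyond |x| = 12 near the root
    return value

def subgaussian_norm(moment_fn, lo=1.5, hi=10.0, iters=60):
    """Bisection for the smallest t with moment_fn(t) <= 2, i.e. the psi_2 norm."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if moment_fn(mid) <= 2 else (mid, hi)
    return hi

print(subgaussian_norm(mgf_of_square))   # ~1.633
print(math.sqrt(8 / 3))                  # exact value sqrt(8/3) = 1.63299...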

Variance proxy

If there exists some $s^2$ such that $\operatorname{E}\left[e^{\lambda(X - \operatorname{E}[X])}\right] \leq e^{\frac{s^2\lambda^2}{2}}$ for all $\lambda \in \mathbb{R}$, then $s^2$ is called a variance proxy, and the smallest such $s^2$ is called the optimal variance proxy and denoted by $\|X\|_{vp}^2$.
Since $\operatorname{E}\left[e^{\lambda(X-\mu)}\right] = e^{\frac{\sigma^2\lambda^2}{2}}$ when $X \sim N(\mu, \sigma^2)$ is Gaussian, we then have $\|X\|_{vp}^2 = \sigma^2$, as it should.
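For intuition, here is an illustrative (non-reference) numerical sketch: for a mean-zero variable the optimal variance proxy can be estimated as $\sup_{\lambda\neq 0}\frac{2}{\lambda^2}\ln\operatorname{E}[e^{\lambda X}]$. For a symmetric Bernoulli (Rademacher) variable, $\operatorname{E}[e^{\lambda X}] = \cosh\lambda \leq e^{\lambda^2/2}$, so the supremum equals $1 = \operatorname{Var}[X]$, approached as $\lambda \to 0$.

import numpy as np

def variance_proxy_on_grid(log_mgf, lams):
    """Estimate sup over lambda of 2*ln E[exp(lambda X)] / lambda^2 on a grid (assumes E[X] = 0)."""
    lams = np.asarray(lams, dtype=float)
    return float(np.max(2 * log_mgf(lams) / lams**2))

# symmetric Bernoulli (Rademacher): E[exp(lambda X)] = cosh(lambda)
rademacher_log_mgf = lambda lam: np.log(np.cosh(lam))

grid = np.linspace(1e-3, 5.0, 10_000)
print(variance_proxy_on_grid(rademacher_log_mgf, grid))   # ~1.0, matching Var[X] = 1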

Equivalent definitions

Let $X$ be a random variable with zero mean, and let $K_1, \ldots, K_6, c$ be positive constants. The following conditions are equivalent:
  1. Tail probability bound: $\operatorname{P}(|X| \geq t) \leq 2\exp(-t^2/K_1^2)$ for all $t \geq 0$;
  2. Finite subgaussian norm: $\|X\|_{\psi_2} = K_2 < \infty$;
  3. Moment: $\operatorname{E}[|X|^p] \leq 2K_3^p\,\Gamma\left(\frac p2 + 1\right)$ for all $p \geq 1$, where $\Gamma$ is the Gamma function;
  4. Moment: $\left(\operatorname{E}[|X|^p]\right)^{1/p} \leq K_4\sqrt p$ for all $p \geq 1$;
  5. Moment-generating function of $X$, or variance proxy: $\operatorname{E}\left[e^{\lambda X}\right] \leq \exp(K_5^2\lambda^2)$ for all $\lambda \in \mathbb{R}$;
  6. Moment-generating function of $X^2$: $\operatorname{E}\left[e^{\lambda^2 X^2}\right] \leq \exp(K_6^2\lambda^2)$ for all $\lambda$ with $|\lambda| \leq \frac{1}{K_6}$;
  7. Union bound: for some $c > 0$, $\operatorname{E}\left[\max\{|X_1|, \ldots, |X_n|\}\right] \leq c\sqrt{\ln n}$ for all $n > c$, where $X_1, \ldots, X_n$ are i.i.d. copies of $X$;
  8. Subexponential: $X^2$ has a subexponential distribution.
Furthermore, the constants are the same in definitions $(1)$ to $(5)$, up to an absolute constant factor. So, for example, given a random variable satisfying $(1)$ and $(2)$, the minimal constants in the two definitions satisfy $K_1 \leq c_1 K_2$ and $K_2 \leq c_2 K_1$, where $c_1, c_2$ are constants independent of the random variable.
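As a sanity check of condition (4) (an illustrative computation, not from a reference): for a standard normal $Z$ one has $\operatorname{E}[|Z|^p] = 2^{p/2}\Gamma\left(\frac{p+1}{2}\right)/\sqrt\pi$, and the ratio $(\operatorname{E}[|Z|^p])^{1/p}/\sqrt p$ stays bounded, as condition (4) requires (it tends to $1/\sqrt e \approx 0.61$).

import math
from scipy.special import gamma

def abs_moment_std_normal(p):
    """E[|Z|^p] for Z ~ N(0, 1)."""
    return 2 ** (p / 2) * gamma((p + 1) / 2) / math.sqrt(math.pi)

for p in [1, 2, 4, 8, 16, 32, 64]:
    ratio = abs_moment_std_normal(p) ** (1 / p) / math.sqrt(p)
    print(p, round(ratio, 4))   # bounded, decreasing toward 1/sqrt(e) ~ 0.6065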

Proof of equivalence

As an example, the first four definitions are equivalent by the proof below.
Proof. $(1) \implies (3)$: By the layer cake representation,
$$\operatorname{E}[|X|^p] = \int_0^\infty \operatorname{P}(|X|^p \geq u)\,du = \int_0^\infty p t^{p-1}\operatorname{P}(|X| \geq t)\,dt \leq \int_0^\infty 2p t^{p-1} e^{-t^2/K_1^2}\,dt.$$
After a change of variables $u = t^2/K_1^2$, we find that
$$\operatorname{E}[|X|^p] \leq p K_1^p\,\Gamma\left(\frac p2\right) = 2K_1^p\,\Gamma\left(\frac p2 + 1\right).$$
$(3) \implies (2)$: By the Taylor series $e^x = \sum_{p=0}^\infty \frac{x^p}{p!}$,
$$\operatorname{E}\left[e^{X^2/t^2}\right] = 1 + \sum_{p=1}^\infty \frac{\operatorname{E}[X^{2p}]}{t^{2p}\,p!} \leq 1 + \sum_{p=1}^\infty \frac{2K_3^{2p}\,\Gamma(p+1)}{t^{2p}\,p!} = 1 + 2\sum_{p=1}^\infty \left(\frac{K_3^2}{t^2}\right)^p = \frac{1 + K_3^2/t^2}{1 - K_3^2/t^2},$$
which is less than or equal to $2$ for $t^2 \geq 3K_3^2$. Let $t^2 = 3K_3^2$, then $\|X\|_{\psi_2} \leq \sqrt 3\,K_3$.
$(2) \implies (1)$: By Markov's inequality,
$$\operatorname{P}(|X| \geq t) = \operatorname{P}\left(e^{X^2/K_2^2} \geq e^{t^2/K_2^2}\right) \leq \frac{\operatorname{E}\left[e^{X^2/K_2^2}\right]}{e^{t^2/K_2^2}} \leq 2e^{-t^2/K_2^2}.$$
$(3) \iff (4)$: by the asymptotic formula for the gamma function, $\Gamma\left(\frac p2 + 1\right)^{1/p} \sim \sqrt{\frac{p}{2e}}$ as $p \to \infty$.
From the proof, we can extract a cycle of three inequalities:
  • If $\operatorname{P}(|X| \geq t) \leq 2e^{-t^2/K^2}$ for all $t \geq 0$, then $\operatorname{E}[|X|^p] \leq 2K^p\,\Gamma\left(\frac p2 + 1\right)$ for all $p \geq 1$.
  • If $\operatorname{E}[|X|^p] \leq 2K^p\,\Gamma\left(\frac p2 + 1\right)$ for all $p \geq 1$, then $\|X\|_{\psi_2} \leq \sqrt 3\,K$.
  • If $\|X\|_{\psi_2} \leq K$, then $\operatorname{P}(|X| \geq t) \leq 2e^{-t^2/K^2}$ for all $t \geq 0$.
In particular, the constant $K$ provided by each of these definitions is the same up to a constant factor, so we can say that the definitions are equivalent up to a constant independent of $X$.
Similarly, because $\Gamma\left(\frac p2 + 1\right)^{1/p} = \Theta(\sqrt p)$ for all $p \geq 1$, that is, it equals $\sqrt p$ up to a positive multiplicative constant, the definitions $(3)$ and $(4)$ are also equivalent up to a constant.
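The step $(2) \implies (1)$ can also be checked numerically (an illustrative sketch, not from a reference): using $\|Z\|_{\psi_2} = \sqrt{8/3}$ for the standard normal, the resulting tail bound $\operatorname{P}(|Z| \geq t) \leq 2e^{-3t^2/8}$ should hold for every $t \geq 0$.

import math
from statistics import NormalDist

K2_squared = 8 / 3                      # squared subgaussian norm of the standard normal
Z = NormalDist()

for t in [0.5, 1.0, 2.0, 3.0, 4.0, 5.0]:
    tail = 2 * (1 - Z.cdf(t))           # P(|Z| >= t), by symmetry of the normal distribution
    bound = 2 * math.exp(-t * t / K2_squared)
    assert tail <= bound
    print(t, round(tail, 7), round(bound, 7))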

Basic properties

The notation $X \lesssim Y$ means that $X \leq CY$, where the positive constant $C$ is independent of $X$ and $Y$.

Concentration

Strictly subgaussian

Expanding the cumulant generating function,
$$\frac{s^2\lambda^2}{2} \geq \ln\operatorname{E}\left[e^{\lambda (X - \operatorname{E}[X])}\right] = \frac{\operatorname{Var}[X]\,\lambda^2}{2} + O(\lambda^3) \quad (\lambda \to 0),$$
we find that $\|X\|_{vp}^2 \geq \operatorname{Var}[X]$. At the edge of possibility, a random variable $X$ satisfying $\|X\|_{vp}^2 = \operatorname{Var}[X]$ is called strictly subgaussian.

Properties

Theorem. Let $X$ be a subgaussian random variable with mean zero. If all zeros of its characteristic function are real, then $X$ is strictly subgaussian.
Corollary. If $X_1, \ldots, X_n$ are independent and strictly subgaussian, then any linear sum of them is strictly subgaussian.

Examples

By calculating their characteristic functions, we can show that some distributions are strictly subgaussian: for example, the symmetric uniform distribution and the symmetric Bernoulli distribution.
Since a symmetric uniform distribution is strictly subgaussian, its convolution with itself is strictly subgaussian. That is, the symmetric triangular distribution is strictly subgaussian.
Since the symmetric Bernoulli distribution is strictly subgaussian, any symmetric Binomial distribution is strictly subgaussian.

Examples

The optimal variance proxy is known for many standard probability distributions, including the beta, Bernoulli, Dirichlet, Kumaraswamy, triangular, truncated Gaussian, and truncated exponential.

Bernoulli distribution

Let $p + q = 1$ be two positive numbers. Let $X$ be a centered Bernoulli random variable $p\delta_q + q\delta_{-p}$ (taking the value $q$ with probability $p$ and $-p$ with probability $q$), so that it has mean zero. Then $\|X\|_{vp}^2 = \frac{p - q}{2(\ln p - \ln q)}$. Its subgaussian norm is $t$, where $t$ is the unique positive solution to $p\,e^{q^2/t^2} + q\,e^{p^2/t^2} = 2$.
Let $X$ be a random variable with symmetric Bernoulli distribution (Rademacher distribution). That is, $X$ takes values $-1$ and $+1$ with probabilities $1/2$ each. Since $X^2 = 1$, it follows that
$$\|X\|_{\psi_2} = \inf\left\{t > 0 : \operatorname{E}\left[e^{X^2/t^2}\right] \leq 2\right\} = \inf\left\{t > 0 : e^{1/t^2} \leq 2\right\} = \frac{1}{\sqrt{\ln 2}},$$
and hence $X$ is a subgaussian random variable.
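Both formulas can be evaluated numerically, as in the illustrative sketch below (the helper name is made up for this sketch, and the bisection range is an assumption that works for moderate $p$):

import math

def centered_bernoulli_norms(p):
    """Subgaussian norm and optimal variance proxy of X = p*delta_q + q*delta_{-p}, where q = 1 - p."""
    q = 1 - p
    # optimal variance proxy (p - q) / (2 (ln p - ln q)); the limit as p -> 1/2 is 1/4
    vp = 0.25 if abs(p - q) < 1e-12 else (p - q) / (2 * (math.log(p) - math.log(q)))
    # subgaussian norm: smallest t with p*exp(q^2/t^2) + q*exp(p^2/t^2) <= 2 (decreasing in t)
    f = lambda t: p * math.exp(q * q / (t * t)) + q * math.exp(p * p / (t * t)) - 2
    lo, hi = 0.05, 10.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return hi, vp

# p = 1/2: the values are +1/2 and -1/2, so the norm is (1/2)/sqrt(ln 2) ~ 0.600 and the proxy is 1/4
print(centered_bernoulli_norms(0.5))
print(centered_bernoulli_norms(0.9))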

Bounded distributions

Bounded distributions have no tail at all, so clearly they are subgaussian.
If $X$ is bounded within the interval $[a, b]$, Hoeffding's lemma states that $\|X\|_{vp}^2 \leq \frac{(b-a)^2}{4}$. Hoeffding's inequality is the Chernoff bound obtained using this fact.
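A quick numerical check of Hoeffding's lemma (illustrative only): for $X \sim \mathrm{Uniform}(0,1)$, so that $a = 0$ and $b = 1$, the centered moment-generating function should satisfy $\operatorname{E}\left[e^{\lambda(X - 1/2)}\right] \leq e^{\lambda^2/8}$ for every $\lambda$.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1_000_000)     # X ~ Uniform(0, 1), so (b - a)^2 / 4 = 1/4

for lam in [-8.0, -2.0, -0.5, 0.5, 2.0, 8.0]:
    mgf = float(np.mean(np.exp(lam * (x - 0.5))))   # Monte Carlo estimate of E[exp(lam (X - E[X]))]
    bound = float(np.exp(lam * lam / 8))            # Hoeffding's lemma bound exp(lam^2 (b - a)^2 / 8)
    print(lam, round(mgf, 4), round(bound, 4), mgf <= bound)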

Convolutions

Since the sum of independent subgaussian random variables is still subgaussian, the convolution of subgaussian distributions is still subgaussian. In particular, any convolution of the normal distribution with any bounded distribution is subgaussian.

Mixtures

Given subgaussian distributions $X_1, X_2, \ldots, X_n$, we can construct an additive mixture $X$ as follows: first randomly pick an index $i$ with probabilities $p_1, \ldots, p_n$ (summing to $1$), then sample from $X_i$.
Since $\operatorname{E}\left[e^{X^2/t^2}\right] = \sum_{i=1}^n p_i\operatorname{E}\left[e^{X_i^2/t^2}\right]$, we have $\|X\|_{\psi_2} \leq \max_i \|X_i\|_{\psi_2}$, and so the mixture is subgaussian.
In particular, any Gaussian mixture is subgaussian.
More generally, the mixture of infinitely many subgaussian distributions is also subgaussian, if the subgaussian norm has a finite supremum: $\|X\|_{\psi_2} \leq \sup_i \|X_i\|_{\psi_2} < \infty$.
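For example (an illustrative sketch, not from a reference): a mixture of $N(0,1)$ and $N(0,4)$. Using the closed form $\operatorname{E}\left[e^{Y^2/t^2}\right] = (1 - 2\sigma^2/t^2)^{-1/2}$ for $Y \sim N(0,\sigma^2)$, the subgaussian norm of $N(0,\sigma^2)$ is $\sigma\sqrt{8/3}$, and evaluating the mixture at $t = \max_i\|X_i\|_{\psi_2}$ gives a value at most $2$, confirming $\|X\|_{\psi_2} \leq \max_i\|X_i\|_{\psi_2}$.

import math

def mgf_of_square_gaussian(sigma, t):
    """E[exp(Y^2 / t^2)] for Y ~ N(0, sigma^2); finite only when t^2 > 2 sigma^2."""
    return 1.0 / math.sqrt(1.0 - 2.0 * sigma**2 / t**2)

weights = [0.7, 0.3]
sigmas = [1.0, 2.0]

t = max(s * math.sqrt(8 / 3) for s in sigmas)   # the largest of the component subgaussian norms
mixture_value = sum(w * mgf_of_square_gaussian(s, t) for w, s in zip(weights, sigmas))
print(t, mixture_value)   # mixture_value <= 2, so the mixture's subgaussian norm is at most t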

Subgaussian random vectors

So far, we have discussed subgaussianity for real-valued random variables. We can also define subgaussianity for random vectors. The purpose of subgaussianity is to make the tails decay fast, so we generalize accordingly: a subgaussian random vector is a random vector where the tail decays fast.
Let $X$ be a random vector taking values in $\mathbb{R}^n$. Define:
  • $\|X\|_{\psi_2} := \sup_{v \in S^{n-1}} \|\langle X, v\rangle\|_{\psi_2}$, where $S^{n-1}$ is the unit sphere in $\mathbb{R}^n$; similarly for the variance proxy, $\|X\|_{vp}^2 := \sup_{v \in S^{n-1}} \|\langle X, v\rangle\|_{vp}^2$.
  • $X$ is subgaussian iff $\|X\|_{\psi_2} < \infty$.
Theorem. For any positive integer $n$, the uniformly distributed random vector $X \sim U(\sqrt n\,S^{n-1})$ is subgaussian, with $\|X\|_{\psi_2} \lesssim 1$.
This is not so surprising, because as $n \to \infty$, the projection of $U(\sqrt n\,S^{n-1})$ to the first coordinate converges in distribution to the standard normal distribution.
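This can be checked by simulation (an illustrative, non-reference sketch): by rotational symmetry, $\|X\|_{\psi_2}$ for $X \sim U(\sqrt n\,S^{n-1})$ equals the subgaussian norm of a single coordinate, and a Monte Carlo estimate of $\operatorname{E}[e^{X_1^2/t^2}]$ at the illustrative guess $t = 2$ stays below $2$ for every $n$ tried, consistent with $\|X\|_{\psi_2} \leq 2$.

import numpy as np

rng = np.random.default_rng(0)

def coord_moment(n, t, samples=20_000):
    """Monte Carlo estimate of E[exp(X_1^2 / t^2)] where X ~ Uniform(sqrt(n) * S^{n-1})."""
    g = rng.standard_normal((samples, n))
    x = np.sqrt(n) * g / np.linalg.norm(g, axis=1, keepdims=True)   # uniform points on sqrt(n)*S^{n-1}
    return float(np.mean(np.exp(x[:, 0] ** 2 / t**2)))

for n in [2, 10, 100, 400]:
    print(n, coord_moment(n, t=2.0))   # stays below 2, approaching sqrt(2) ~ 1.414 as n grows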

Maximum inequalities

Inequalities

Theorem. There exists a positive constant $C$ such that given any number of independent mean-zero subgaussian random variables $X_1, \ldots, X_n$,
$$\left\|\sum_{i=1}^n X_i\right\|_{\psi_2}^2 \leq C\sum_{i=1}^n \|X_i\|_{\psi_2}^2.$$
Theorem (general Hoeffding's inequality). There exists a positive constant $c$ such that given any number of independent mean-zero subgaussian random variables $X_1, \ldots, X_n$,
$$\operatorname{P}\left(\left|\sum_{i=1}^n X_i\right| \geq t\right) \leq 2\exp\left(-\frac{ct^2}{\sum_{i=1}^n \|X_i\|_{\psi_2}^2}\right) \quad\text{for all } t \geq 0.$$
Theorem (Bernstein's inequality). There exists a positive constant $c$ such that given any number of independent mean-zero subexponential random variables $X_1, \ldots, X_n$,
$$\operatorname{P}\left(\left|\sum_{i=1}^n X_i\right| \geq t\right) \leq 2\exp\left(-c\,\min\left(\frac{t^2}{\sum_{i=1}^n \|X_i\|_{\psi_1}^2},\ \frac{t}{\max_i \|X_i\|_{\psi_1}}\right)\right) \quad\text{for all } t \geq 0,$$
where $\|\cdot\|_{\psi_1}$ denotes the subexponential norm.
Theorem (Khinchine's inequality). There exists a positive constant $C$ such that given any number of independent mean-zero variance-one subgaussian random variables $X_1, \ldots, X_n$, any $p \geq 2$, and any $a_1, \ldots, a_n \in \mathbb{R}$,
$$\left(\sum_{i=1}^n a_i^2\right)^{1/2} \leq \left(\operatorname{E}\left[\left|\sum_{i=1}^n a_iX_i\right|^p\right]\right)^{1/p} \leq C\sqrt p\,\max_i\|X_i\|_{\psi_2}\left(\sum_{i=1}^n a_i^2\right)^{1/2}.$$
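As a concrete instance of the Hoeffding-type bound above (an illustrative check that uses the explicit variance-proxy form rather than the unspecified absolute constant $c$): a sum $S = X_1 + \cdots + X_n$ of independent symmetric Bernoulli variables, each with variance proxy $1$, has variance proxy $n$, so $\operatorname{P}(|S| \geq t) \leq 2e^{-t^2/(2n)}$.

import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 200_000

s = rng.choice([-1, 1], size=(trials, n)).sum(axis=1)   # sums of n symmetric Bernoulli variables

for t in [10, 20, 30, 40]:
    empirical = float(np.mean(np.abs(s) >= t))
    bound = float(2 * np.exp(-t**2 / (2 * n)))
    print(t, empirical, bound)   # the empirical tail stays below the bound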

Hanson-Wright inequality

The Hanson-Wright inequality states that if a random vector $X$ is subgaussian in a certain sense, then any quadratic form of this vector, $X^TAX$, is also subgaussian/subexponential. Further, the upper bound on the tail of $X^TAX$ is uniform.
A weak version of the following theorem was proved by Hanson and Wright (1971). There are many extensions and variants. Much like the central limit theorem, the Hanson-Wright inequality is more a cluster of theorems with the same purpose than a single theorem. The purpose is to take a subgaussian vector and uniformly bound its quadratic forms.
Theorem. There exists a constant $c$, such that:
Let $n$ be a positive integer. Let $X_1, \ldots, X_n$ be independent random variables, such that each satisfies $\operatorname{E}[X_i] = 0$. Combine them into a random vector $X = (X_1, \ldots, X_n)$. For any $n \times n$ matrix $A$, we have
$$\operatorname{P}\left(\left|X^TAX - \operatorname{E}\left[X^TAX\right]\right| > t\right) \leq 2\exp\left(-c\,\min\left(\frac{t^2}{K^4\|A\|_F^2},\ \frac{t}{K^2\|A\|}\right)\right) \quad\text{for all } t \geq 0,$$
where $K = \max_i \|X_i\|_{\psi_2}$, $\|A\|_F = \sqrt{\sum_{i,j}A_{ij}^2}$ is the Frobenius norm of the matrix, and $\|A\| = \max_{\|x\|_2 = 1}\|Ax\|_2$ is the operator norm of the matrix.
In words, the quadratic form $X^TAX$ has its tail uniformly bounded by an exponential or a gaussian, whichever is larger.
In the statement of the theorem, the constant $c$ is an "absolute constant", meaning that it has no dependence on the dimension $n$, the random variables $X_1, \ldots, X_n$, or the matrix $A$. It is a mathematical constant much like pi and e.
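A small simulation illustrating the statement (illustrative only; since the absolute constant $c$ is unspecified, the sketch just shows that $X^TAX$ concentrates around its mean $\operatorname{tr}(A)$ on the scale $\|A\|_F$, with a rapidly decaying tail):

import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 20_000

A = rng.standard_normal((n, n))               # an arbitrary fixed matrix
X = rng.choice([-1, 1], size=(trials, n))     # independent mean-zero coordinates with bounded psi_2 norm

quad = np.einsum('ti,ij,tj->t', X, A, X)      # X^T A X for each trial
mean = np.trace(A)                            # E[X^T A X] = tr(A) for unit-variance coordinates
fro = np.linalg.norm(A, 'fro')

for t in [1, 2, 3, 4]:
    print(t, float(np.mean(np.abs(quad - mean) > t * fro)))   # tail probability decays rapidly in t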

Consequences

Theorem. There exists a constant $c$, such that:
Let $n, m$ be positive integers. Let $X_1, \ldots, X_n$ be independent random variables, such that each satisfies $\operatorname{E}[X_i] = 0$ and $\operatorname{E}[X_i^2] = 1$. Combine them into a random vector $X = (X_1, \ldots, X_n)$, and let $K = \max_i \|X_i\|_{\psi_2}$. For any $m \times n$ matrix $A$, we have
$$\operatorname{P}\left(\left|\,\|AX\|_2 - \|A\|_F\,\right| > t\right) \leq 2\exp\left(-\frac{c\,t^2}{K^4\|A\|^2}\right) \quad\text{for all } t \geq 0.$$
In words, the random vector $AX$ is concentrated on a spherical shell of radius $\|A\|_F$, such that $\|AX\|_2 - \|A\|_F$ is subgaussian, with subgaussian norm $\lesssim K^2\|A\|$.
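An illustrative simulation of this concentration (not from a reference): with symmetric Bernoulli coordinates and a fixed random $m \times n$ matrix $A$, the norm $\|AX\|_2$ stays within a few multiples of the operator norm $\|A\|$ of the radius $\|A\|_F$.

import numpy as np

rng = np.random.default_rng(0)
m, n, trials = 200, 400, 5_000

A = rng.standard_normal((m, n)) / np.sqrt(n)    # a fixed matrix with ||A||_F ~ sqrt(m) and ||A|| = O(1)
X = rng.choice([-1, 1], size=(trials, n))       # independent mean-zero, unit-variance coordinates

norms = np.linalg.norm(X @ A.T, axis=1)         # ||A X||_2 for each trial
fro = np.linalg.norm(A, 'fro')
op = np.linalg.norm(A, 2)                       # operator (spectral) norm

dev = np.abs(norms - fro)
print(float(fro), float(op))                    # the shell radius and the deviation scale
print(float(dev.mean()), float(dev.max()))      # deviations stay on the scale of a few multiples of ||A||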