V-statistic


V-statistics are a class of statistics named for Richard von Mises who developed their asymptotic distribution theory in a fundamental paper in 1947. V-statistics are closely related to U-statistics introduced by Wassily Hoeffding in 1948. A V-statistic is a statistical function defined by a particular statistical functional of a probability distribution.

Statistical functions

Statistics that can be represented as functionals T(F_n) of the empirical distribution function F_n are called statistical functionals. Differentiability of the functional T plays a key role in the von Mises approach; thus von Mises considers differentiable statistical functionals.

Examples of statistical functions



  1. The k-th central moment is the functional
       T(F) = \int (x - \mu)^k \, dF(x),
    where \mu = E[X] is the expected value of X. The associated statistical function is the sample k-th central moment,
       T(F_n) = m_k = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^k.

  2. The chi-squared goodness-of-fit statistic is a statistical function T(F_n), corresponding to the statistical functional
       T(F) = \sum_{i=1}^k \frac{(\Pr(A_i) - p_i)^2}{p_i},
    where A_i are the k cells and p_i are the specified probabilities of the cells under the null hypothesis.

  3. The Cramér–von Mises and Anderson–Darling goodness-of-fit statistics are based on the functional
       T(F) = \int \big[F(x) - F_0(x)\big]^2 \, w(x; F_0) \, dF_0(x),
    where w is a specified weight function and F_0 is a specified null distribution. If w is the identity function then T(F_n) is the well-known Cramér–von Mises goodness-of-fit statistic; if w(x; F_0) = [F_0(x)(1 - F_0(x))]^{-1} then T(F_n) is the Anderson–Darling statistic.
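The plug-in principle behind these examples can be sketched in code: evaluate the functional at the empirical distribution F_n. A minimal sketch, assuming simple numeric samples (the function names `central_moment` and `cvm_statistic` are illustrative, not standard library names):

```python
def central_moment(xs, k):
    """Sample k-th central moment: T(F_n) = (1/n) * sum((x_i - xbar)^k)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** k for x in xs) / n

def cvm_statistic(xs, F0):
    """Cramer-von Mises test statistic n*T(F_n) with w = 1, via the standard
    computational form: 1/(12n) + sum_i (F0(x_(i)) - (2i - 1)/(2n))^2."""
    n = len(xs)
    return 1.0 / (12 * n) + sum(
        (F0(x) - (2 * i - 1) / (2 * n)) ** 2
        for i, x in enumerate(sorted(xs), start=1)
    )
```

For example, `central_moment([1, 2, 3, 4], 2)` returns 1.25, the second sample central moment, and for data already uniform on [0, 1] the Cramér–von Mises statistic against `F0 = lambda x: x` is small.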

Representation as a V-statistic

Suppose x_1, ..., x_n is a sample. In typical applications the statistical function has a representation as the V-statistic
  V_{mn} = \frac{1}{n^m} \sum_{i_1=1}^n \cdots \sum_{i_m=1}^n h(x_{i_1}, x_{i_2}, \dots, x_{i_m}),
where h is a symmetric kernel function. Serfling discusses how to find the kernel in practice. V_{mn} is called a V-statistic of degree m.

A symmetric kernel of degree 2 is a function h(x, y) such that h(x, y) = h(y, x) for all x and y in the domain of h. For samples x_1, ..., x_n, the corresponding V-statistic is defined as
  V_{2,n} = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n h(x_i, x_j).
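The definition translates directly into a brute-force computation; a sketch, assuming a sample small enough that the O(n^m) sum is affordable (`v_statistic` is an illustrative name):

```python
from itertools import product

def v_statistic(xs, h, m=2):
    """Degree-m V-statistic: average of the kernel h over ALL n^m index
    tuples, repeated indices included (unlike a U-statistic, which averages
    only over tuples of distinct indices)."""
    n = len(xs)
    return sum(h(*args) for args in product(xs, repeat=m)) / n ** m
```

For the symmetric degree-2 kernel h(x, y) = (x - y)^2 / 2, `v_statistic([1, 2, 3], h)` returns 2/3, the second central moment of the sample, as in the example below.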

Example of a V-statistic



  1. An example of a degree-2 V-statistic is the second central moment m_2.
    If h(x, y) = (x - y)^2 / 2, the corresponding V-statistic is
      V_{2,n} = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \frac{(x_i - x_j)^2}{2} = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 = m_2,
    which is the maximum likelihood estimator of the variance. With the same kernel, the corresponding U-statistic is the (unbiased) sample variance:
      s^2 = \binom{n}{2}^{-1} \sum_{i < j} \frac{(x_i - x_j)^2}{2} = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2.
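The V-/U-statistic contrast for this kernel can be checked numerically. A sketch (helper names are illustrative): the only difference between the two estimators is whether the diagonal pairs i = j enter the average.

```python
def h(x, y):
    return (x - y) ** 2 / 2  # symmetric kernel for the second central moment

def v_2n(xs):
    # V-statistic: all n^2 ordered pairs, diagonal (i == j) included
    n = len(xs)
    return sum(h(a, b) for a in xs for b in xs) / n ** 2

def u_2n(xs):
    # U-statistic: only the n*(n-1) ordered pairs with distinct indices
    n = len(xs)
    return sum(h(xs[i], xs[j])
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
```

For `xs = [1.0, 2.0, 3.0, 4.0]`, `v_2n(xs)` gives the ML estimate m_2 = 1.25 while `u_2n(xs)` gives the unbiased s^2 = 5/3; in general they are linked by u_2n = v_2n * n / (n - 1).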

Asymptotic distribution

In examples 1–3, the asymptotic distribution of the statistic is different: in example 1 it is normal, in example 2 it is chi-squared, and in example 3 it is a weighted sum of chi-squared variables.
Von Mises' approach is a unifying theory that covers all of the cases above. Informally, the type of asymptotic distribution of a statistical function depends on the order of "degeneracy", which is determined by the first non-vanishing term in the Taylor expansion of the functional T. If that term is the linear term, the limit distribution is normal; otherwise higher-order types of distributions arise.
There is a hierarchy of cases parallel to the asymptotic theory of U-statistics. Let A(m) be the property defined by:

  1. Var(h(X_1, \dots, X_k)) = 0 for k < m, and Var(h(X_1, \dots, X_k)) > 0 for k = m;
  2. n^{m/2} R_{mn} tends to zero (in probability), where R_{mn} is the remainder term in the Taylor series of T.
Case m = 1:
If A(1) is true, the statistic is a sample mean and the Central Limit Theorem implies that T(F_n) is asymptotically normal.
In the variance example, m_2 is asymptotically normal with mean \sigma^2 and variance (\mu_4 - \sigma^4)/n, where \mu_4 = E(X - E[X])^4.
Case m = 2:
Suppose A(2) is true, E[h^2(X_1, X_2)] < \infty, and E[|h(X_1, X_1)|] < \infty. Then nV_{2,n} converges in distribution to a weighted sum of independent chi-squared variables:
  n V_{2,n} \to_d \sum_{k=1}^\infty \lambda_k Z_k^2,
where Z_k are independent standard normal variables and \lambda_k are constants that depend on the distribution F and the functional T. In this case the asymptotic distribution of T(F_n) is called a quadratic form of centered Gaussian random variables. The statistic V_{2,n} is called a degenerate kernel V-statistic. The V-statistic associated with the Cramér–von Mises functional (example 3) is an example of a degenerate kernel V-statistic.
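The degenerate case can be illustrated with the simplest degenerate kernel, h(x, y) = x*y for a mean-zero variable X (an illustration chosen here, not an example from the text): then V_{2,n} = (x̄)^2, and n V_{2,n} converges to \sigma^2 Z^2, a weighted sum with a single term (\lambda_1 = \sigma^2, all other \lambda_k = 0). A Monte Carlo sketch:

```python
import random

def v_2n_product_kernel(xs):
    # For h(x, y) = x*y: (1/n^2) * sum_{i,j} x_i * x_j = ((1/n) * sum x_i)^2
    xbar = sum(xs) / len(xs)
    return xbar ** 2

random.seed(0)
n, reps = 500, 2000
draws = [n * v_2n_product_kernel([random.gauss(0.0, 1.0) for _ in range(n)])
         for _ in range(reps)]
# For sigma = 1, the limit is a single chi-squared(1) variable, so the
# Monte Carlo mean of n*V_{2,n} should be close to E[Z^2] = 1.
print(sum(draws) / reps)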