Bessel's correction
In statistics, Bessel's correction is the use of n − 1 instead of n in the formula for the sample variance and sample standard deviation, where n is the number of observations in a sample. This method corrects the bias in the estimation of the population variance. It also partially corrects the bias in the estimation of the population standard deviation. However, the correction often increases the mean squared error in these estimations. This technique is named after Friedrich Bessel.
Formulation
In estimating the population variance from a sample when the population mean is unknown, the uncorrected sample variance is the mean of the squares of the deviations of sample values from the sample mean. In this case, the sample variance is a biased estimator of the population variance. Multiplying the uncorrected sample variance by the factor

  n / (n − 1)

gives an unbiased estimator of the population variance. In some literature, the above factor is called Bessel's correction.
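This effect can be checked numerically. The sketch below (Python standard library only; sample size and trial count are arbitrary simulation choices) averages the uncorrected variance over many small samples from a population with variance 1: the average lands near (n − 1)/n, while multiplying by n/(n − 1) removes the bias.

```python
import random

random.seed(42)

def uncorrected_variance(xs):
    """Mean of squared deviations from the sample mean (divides by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Draw many small samples from a population with known variance sigma^2 = 1.
n = 5
trials = 200_000
biased_avg = 0.0
corrected_avg = 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    v = uncorrected_variance(sample)
    biased_avg += v
    corrected_avg += v * n / (n - 1)   # Bessel's correction: multiply by n/(n-1)
biased_avg /= trials
corrected_avg /= trials

print(round(biased_avg, 2))     # near (n - 1)/n = 0.8
print(round(corrected_avg, 2))  # near 1.0
```

Averaging over repeated samples approximates the estimator's expectation, which is exactly what the bias statement is about.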
One can understand Bessel's correction as the degrees of freedom in the residuals vector (the differences between each observation and the sample mean):

  (x₁ − x̄, …, xₙ − x̄),

where x̄ is the sample mean. While there are n independent observations in the sample, there are only n − 1 independent residuals, as they sum to 0.
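The constraint on the residuals can be seen directly in a small sketch (Python, with made-up data): their sum is zero, so any n − 1 of them determine the last one.

```python
# Residuals from the sample mean always sum to zero,
# so only n - 1 of them are free to vary.
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = sum(xs) / len(xs)           # 5.0 for this data
residuals = [x - mean for x in xs]
print(sum(residuals))              # 0.0 (up to floating-point rounding)

# Knowing any n - 1 residuals determines the remaining one:
last = -sum(residuals[:-1])
print(last == residuals[-1])       # True
```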
More generally, Bessel's correction is an approach to reducing the bias due to finite sample size. Such finite-sample bias correction is also needed for other estimates, such as skewness and kurtosis, but for these the inaccuracies are often significantly larger. To fully remove such bias, a more complex multi-parameter estimation is necessary. For instance, a correct correction for the standard deviation depends on the kurtosis, but the kurtosis estimate in turn has its own finite-sample bias and depends on the standard deviation, i.e., the two estimations have to be combined.
Caveats
There are three caveats to consider regarding Bessel's correction:
- It does not yield an unbiased estimator of the standard deviation.
- The corrected estimator often has a higher mean squared error (MSE) than the uncorrected estimator. Furthermore, there is no population distribution for which it has the minimum MSE, because a different scale factor can always be chosen to minimize MSE.
- It is only necessary when the population mean is unknown and estimated by the sample mean. In practice, this is generally the case.
Firstly, while the corrected sample variance is an unbiased estimator of the population variance, its square root, the corrected sample standard deviation, is still a biased estimator of the population standard deviation: because the square root is a concave function, the estimator is too low on average, and unbiasedness of the variance does not carry over to the standard deviation.

Secondly, the unbiased estimator does not minimize mean squared error, and generally has worse MSE than the uncorrected estimator; MSE can be reduced by using a different scale factor. The optimal value depends on the excess kurtosis of the population, as discussed under mean squared error of variance estimators; for the normal distribution, MSE is minimized by dividing by n + 1 rather than n − 1 or n.
Thirdly, Bessel's correction is only necessary when the population mean is unknown, and one is estimating both population mean and population variance from a given sample, using the sample mean to estimate the population mean. In that case there are n degrees of freedom in a sample of n points, and simultaneous estimation of mean and variance means one degree of freedom goes to the sample mean and the remaining n − 1 degrees of freedom go to the sample variance. However, if the population mean is known, then the deviations of the observations from the population mean have n degrees of freedom and Bessel's correction is not applicable.
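The second caveat can be checked numerically. The sketch below (Python; sample size and trial count are arbitrary simulation choices) estimates, for normal samples, the MSE of dividing the sum of squared deviations by n − 1, n, and n + 1:

```python
import random

random.seed(0)

def mse_of_divisor(divisor, n=5, sigma2=1.0, trials=200_000):
    """Monte Carlo MSE of the variance estimator sum((x - xbar)^2) / divisor."""
    total = 0.0
    for _ in range(trials):
        xs = [random.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)
        total += (ss / divisor - sigma2) ** 2
    return total / trials

n = 5
results = {d: mse_of_divisor(d, n=n) for d in (n - 1, n, n + 1)}

# For normal data, dividing by n + 1 gives the smallest MSE,
# even though only n - 1 gives an unbiased estimate.
print(min(results, key=results.get))  # n + 1, i.e. 6
```

For this setup the theoretical MSEs are 0.5, 0.36, and 1/3 for divisors n − 1, n, and n + 1 respectively, so the ordering is well separated from the Monte Carlo noise.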
Terminology
This correction is so common that the terms "sample variance" and "sample standard deviation" are frequently used to mean the corrected estimators, using n − 1. However, caution is needed: some calculators and software packages may provide both, or only the more unusual formulation. This article uses the following symbols and definitions:
- μ is the population mean
- x̄ is the sample mean
- σ² is the population variance
- sₙ² is the biased sample variance (dividing by n)
- s² is the unbiased sample variance (dividing by n − 1)
- sₙ is the uncorrected sample standard deviation
- s is the corrected sample standard deviation, which is less biased, but still biased
Formula
The sample mean is given by

  x̄ = (1/n) ∑ᵢ₌₁ⁿ xᵢ.

The biased sample variance is then written:

  sₙ² = (1/n) ∑ᵢ₌₁ⁿ (xᵢ − x̄)²

and the unbiased sample variance is written:

  s² = (1/(n − 1)) ∑ᵢ₌₁ⁿ (xᵢ − x̄)².
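The two definitions can be sketched in Python (the data here is arbitrary); the standard library's statistics module implements the biased formula as pvariance and the corrected one as variance:

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)  # sum of squared deviations

biased = ss / n          # s_n^2: divides by n
unbiased = ss / (n - 1)  # s^2: divides by n - 1 (Bessel's correction)

# Python's statistics module provides both definitions:
print(biased == statistics.pvariance(data))   # population formula -> True
print(unbiased == statistics.variance(data))  # corrected sample formula -> True
```

Note that NumPy's var and std default to the biased formula (ddof=0), while statistics.variance and pandas default to the corrected one, which is exactly the kind of discrepancy the caution above refers to.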
Proof
Suppose that x₁, …, xₙ are independent and identically distributed random variables with expectation μ and variance σ². Knowing the values of x₁, …, xₙ at an outcome of the underlying sample space, we would like to get a good estimate for the variance σ², which is unknown. To this end, we construct a formula in the xᵢ such that the expectation of this formula is precisely σ². This means that, on average, the formula should produce the right answer.

The educated, but naive, way of guessing the variance formula would be

  sₙ² = (1/n) ∑ᵢ₌₁ⁿ (xᵢ − x̄)², where x̄ = (1/n) ∑ᵢ₌₁ⁿ xᵢ.

This would be the variance if we had a discrete random variable on the discrete probability space {x₁, …, xₙ} that takes the value xᵢ with probability 1/n. But let us calculate the expected value of this expression:

  E[sₙ²] = E[(1/n) ∑ᵢ (xᵢ − x̄)²]
         = E[(1/n) ∑ᵢ xᵢ² − x̄²]
         = (1/n) ∑ᵢ E[xᵢ²] − E[x̄²]
         = (σ² + μ²) − (σ²/n + μ²)
         = ((n − 1)/n) σ².

Therefore, our initial guess was wrong by a factor of (n − 1)/n; multiplying by the reciprocal, n/(n − 1), is precisely Bessel's correction.

The last step used that the sum x̄² = (1/n²) ∑ᵢ,ⱼ xᵢxⱼ splits into terms with equal and with unequal indices. For independent and identically distributed variables there are n terms with E[xᵢ²] = σ² + μ² and n(n − 1) terms with E[xᵢxⱼ] = μ² (i ≠ j), so

  E[x̄²] = (1/n²)(n(σ² + μ²) + n(n − 1)μ²) = σ²/n + μ².
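The identity E[sₙ²] = ((n − 1)/n)σ² can also be verified exactly for a tiny discrete population by enumerating every equally likely i.i.d. sample. The sketch below (Python; the three-point population is an arbitrary choice) uses exact rational arithmetic so that the equality holds without rounding:

```python
from fractions import Fraction
from itertools import product

# Exact check of E[s_n^2] = ((n - 1)/n) * sigma^2 for a small discrete
# population, drawing each observation uniformly and independently.
values = [Fraction(v) for v in (0, 1, 3)]   # arbitrary small population
mu = sum(values) / len(values)
sigma2 = sum((v - mu) ** 2 for v in values) / len(values)

n = 3
total = Fraction(0)
for sample in product(values, repeat=n):    # all equally likely i.i.d. samples
    xbar = sum(sample) / n
    total += sum((x - xbar) ** 2 for x in sample) / n
expected_sn2 = total / len(values) ** n     # exact expectation of s_n^2

assert expected_sn2 == Fraction(n - 1, n) * sigma2
print(expected_sn2)
```

Because every sample is enumerated, this is the expectation itself rather than a Monte Carlo approximation, so the assertion checks the algebraic identity exactly.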