Bias of an estimator


In statistics, the bias of an estimator is the difference between this estimator's expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased. "Bias" is a property of an estimator. Bias is a distinct concept from consistency: consistent estimators converge in probability to the true value of the parameter, but may be biased or unbiased.
All else being equal, an unbiased estimator is preferable to a biased estimator, although in practice, biased estimators are frequently used. When a biased estimator is used, bounds of the bias are calculated. A biased estimator may be used for various reasons: because an unbiased estimator does not exist without further assumptions about a population; because an unbiased estimator is difficult to compute; because a biased estimator may be unbiased with respect to different measures of central tendency; because a biased estimator gives a lower value of some loss function compared with unbiased estimators; or because in some cases being unbiased is too strong a condition, and the only unbiased estimators are not useful.
Bias can also be measured with respect to the median, rather than the mean, in which case one distinguishes median-unbiased from the usual mean-unbiasedness property.
Mean-unbiasedness is not preserved under non-linear transformations, though median-unbiasedness is; for example, the sample variance is a biased estimator for the population variance. These are all illustrated below.
An unbiased estimator for a parameter need not always exist. For example, there is no unbiased estimator for the reciprocal of the parameter of a binomial random variable.

Definition

Suppose we have a statistical model, parameterized by a real number θ, giving rise to a probability distribution for observed data, $P_\theta(x) = P(x \mid \theta)$, and a statistic $\hat\theta$ which serves as an estimator of θ based on any observed data $x$. That is, we assume that our data follows some unknown distribution $P(x \mid \theta)$, and then we construct some estimator $\hat\theta$ that maps observed data to values that we hope are close to θ. The bias of $\hat\theta$ relative to θ is defined as

$$\operatorname{Bias}(\hat\theta, \theta) = \operatorname{Bias}_\theta[\hat\theta] = \operatorname{E}_{x \mid \theta}[\hat\theta] - \theta = \operatorname{E}_{x \mid \theta}[\hat\theta - \theta],$$

where $\operatorname{E}_{x \mid \theta}$ denotes expected value over the distribution $P(x \mid \theta)$, i.e., averaging over all possible observations $x$. The second equation follows since θ is measurable with respect to the conditional distribution $P(x \mid \theta)$.
An estimator is said to be unbiased if its bias is zero for all values of the parameter θ, or equivalently, if the expected value of the estimator matches the true value of the parameter. Unbiasedness is not guaranteed to carry over under transformation. For example, if $\hat\theta$ is an unbiased estimator for parameter θ, it is not guaranteed in general that $g(\hat\theta)$ is an unbiased estimator for $g(\theta)$, unless $g$ is a linear function.
In a simulation experiment concerning the properties of an estimator, the bias of the estimator may be assessed using the mean signed difference.
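As an illustration, the following Python sketch (the function and variable names are illustrative, not from any particular library) approximates the bias of an estimator by the mean signed difference between the estimates and the true parameter over many replications. It is applied here to the sample mean, which, as discussed below, is mean-unbiased for the population mean.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_signed_difference(estimator, sampler, true_value, n_reps=50_000):
    """Monte Carlo approximation of the bias E[estimate] - true_value."""
    estimates = np.array([estimator(sampler()) for _ in range(n_reps)])
    return estimates.mean() - true_value

# Example: the sample mean as an estimator of the population mean mu.
mu, sigma, n = 3.0, 2.0, 10
sampler = lambda: rng.normal(mu, sigma, size=n)   # draw one sample of size n

bias_of_mean = mean_signed_difference(np.mean, sampler, mu)
print(bias_of_mean)   # close to 0: the sample mean is mean-unbiased for mu
```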

Examples

Sample variance

The sample variance highlights two different issues about bias and risk. First, the “naive” estimator that divides by n is biased downward because the sample mean is estimated from the same data. Multiplying by n/(n − 1) yields an unbiased estimator. Second, unbiasedness does not imply minimum mean squared error.
Suppose $X_1, \ldots, X_n$ are independent and identically distributed random variables with expectation μ and variance $\sigma^2$. If the sample mean and uncorrected sample variance are defined as

$$\overline{X} = \frac{1}{n}\sum_{i=1}^n X_i, \qquad S^2 = \frac{1}{n}\sum_{i=1}^n \big(X_i - \overline{X}\big)^2,$$
then $S^2$ is a biased estimator of $\sigma^2$. This follows immediately from the law of total variance, because

$$\operatorname{E}\big[S^2\big] = \sigma^2 - \operatorname{Var}\big(\overline{X}\big) = \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n}\,\sigma^2 < \sigma^2.$$

In other words, the expected value of the uncorrected sample variance does not equal the population variance $\sigma^2$, unless multiplied by a normalization factor. The ratio between the biased and unbiased estimates of the variance is known as Bessel's correction. The sample mean, on the other hand, is an unbiased estimator of the population mean μ. The equality of the second term on the right-hand side in the equation above can be understood in terms of Bienaymé's identity,

$$\operatorname{Var}\big(\overline{X}\big) = \operatorname{Var}\!\Big(\frac{1}{n}\sum_{i=1}^n X_i\Big) = \frac{1}{n^2}\sum_{i=1}^n \operatorname{Var}(X_i) = \frac{\sigma^2}{n}.$$
The reason that the uncorrected sample variance, $S^2$, is biased stems from the fact that the sample mean is an ordinary least squares estimator for μ: $\overline{X}$ is the number that makes the sum $\sum_{i=1}^n (X_i - \overline{X})^2$ as small as possible. That is, when any other number is plugged into this sum, the sum can only increase. In particular, the choice $\mu \ne \overline{X}$ gives

$$\frac{1}{n}\sum_{i=1}^n \big(X_i - \overline{X}\big)^2 < \frac{1}{n}\sum_{i=1}^n \big(X_i - \mu\big)^2,$$

and then

$$\operatorname{E}\big[S^2\big] = \operatorname{E}\!\Big[\frac{1}{n}\sum_{i=1}^n \big(X_i - \overline{X}\big)^2\Big] < \operatorname{E}\!\Big[\frac{1}{n}\sum_{i=1}^n \big(X_i - \mu\big)^2\Big] = \sigma^2.$$
The above discussion can be understood in geometric terms: the vector $\vec{C} = (X_1 - \mu, \ldots, X_n - \mu)$ can be decomposed into the "mean part" and "variance part" by projecting to the direction of $\vec{u} = (1, \ldots, 1)$ and to that direction's orthogonal complement hyperplane. One gets $\vec{A} = (\overline{X} - \mu, \ldots, \overline{X} - \mu)$ for the part along $\vec{u}$ and $\vec{B} = (X_1 - \overline{X}, \ldots, X_n - \overline{X})$ for the complementary part. Since this is an orthogonal decomposition, the Pythagorean theorem says $|\vec{C}|^2 = |\vec{A}|^2 + |\vec{B}|^2$, and taking expectations we get $n\sigma^2 = n\operatorname{E}\big[(\overline{X} - \mu)^2\big] + n\operatorname{E}\big[S^2\big]$, as above.
If the distribution of $\vec{C}$ is rotationally symmetric, as in the case when the $X_i$ are sampled from a Gaussian, then on average the dimension along $\vec{u}$ contributes to $|\vec{C}|^2$ equally as the $n - 1$ directions perpendicular to $\vec{u}$, so that $\operatorname{E}\big[(\overline{X} - \mu)^2\big] = \frac{\sigma^2}{n}$ and $\operatorname{E}\big[S^2\big] = \frac{(n-1)\,\sigma^2}{n}$. This is in fact true in general, as explained above.
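The factor $(n-1)/n$ can be checked numerically. The short Python sketch below (illustrative code, using normally distributed data for concreteness) estimates $\operatorname{E}[S^2]$ by Monte Carlo and compares the uncorrected estimator with the Bessel-corrected one.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 0.0, 1.0, 5, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
s2_uncorrected = samples.var(axis=1, ddof=0)   # divides by n
s2_corrected = samples.var(axis=1, ddof=1)     # divides by n - 1 (Bessel's correction)

print(s2_uncorrected.mean())    # close to (n - 1)/n * sigma**2 = 0.8
print(s2_corrected.mean())      # close to sigma**2 = 1.0
print((n - 1) / n * sigma**2)   # theoretical expectation of S^2
```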

Estimating a Poisson probability

A far more extreme case of a biased estimator being better than any unbiased estimator arises from the Poisson distribution. Suppose that X has a Poisson distribution with expectation λ. Suppose it is desired to estimate

$$\operatorname{P}(X = 0)^2 = e^{-2\lambda}$$

with a sample of size 1.
Since the expectation of an unbiased estimator δ(X) is equal to the estimand, i.e.

$$\operatorname{E}[\delta(X)] = \sum_{x=0}^{\infty} \delta(x)\,\frac{\lambda^x e^{-\lambda}}{x!} = e^{-2\lambda},$$

the only function of the data constituting an unbiased estimator is

$$\delta(X) = (-1)^X.$$

To see this, note that when decomposing $e^{-\lambda}$ from the above expression for the expectation, the sum that is left is a Taylor series expansion of $e^{-\lambda}$ as well, yielding $e^{-\lambda}\,e^{-\lambda} = e^{-2\lambda}$.
If the observed value of X is 100, then the estimate is 1, although the true value of the quantity being estimated is very likely to be near 0, which is the opposite extreme. And, if X is observed to be 101, then the estimate is even more absurd: It is −1, although the quantity being estimated must be positive.
The maximum likelihood estimator,

$$e^{-2X},$$

is far better than this unbiased estimator. Not only is its value always positive, but it is also more accurate in the sense that its mean squared error

$$e^{-4\lambda} - 2e^{\lambda\left(\frac{1}{e^2} - 3\right)} + e^{\lambda\left(\frac{1}{e^4} - 1\right)}$$

is smaller; compare the unbiased estimator's MSE of

$$1 - e^{-4\lambda}.$$

The MSEs are functions of the true value λ. The bias of the maximum-likelihood estimator is

$$e^{\lambda\left(\frac{1}{e^2} - 1\right)} - e^{-2\lambda}.$$
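The comparison can be reproduced numerically. The following Python sketch (illustrative code, not from a referenced source) simulates both the unbiased estimator $(-1)^X$ and the maximum-likelihood estimator $e^{-2X}$ and compares their empirical mean squared errors with the closed-form expressions above.

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 2.0                        # true Poisson mean
target = np.exp(-2 * lam)        # quantity being estimated, e^(-2*lambda)
reps = 1_000_000

x = rng.poisson(lam, size=reps)  # repeated samples of size 1
unbiased = (-1.0) ** x           # the only unbiased estimator, (-1)^X
mle = np.exp(-2.0 * x)           # maximum-likelihood estimator, e^(-2X)

# Empirical MSEs versus the closed-form expressions above.
mse_unbiased_theory = 1 - np.exp(-4 * lam)
mse_mle_theory = (np.exp(-4 * lam)
                  - 2 * np.exp(lam * (np.exp(-2.0) - 3))
                  + np.exp(lam * (np.exp(-4.0) - 1)))

print(np.mean((unbiased - target) ** 2), mse_unbiased_theory)  # ~0.9997 for lam = 2
print(np.mean((mle - target) ** 2), mse_mle_theory)            # ~0.134, much smaller
```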

Maximum of a discrete uniform distribution

The bias of maximum-likelihood estimators can be substantial. Consider a case where n tickets numbered from 1 to n are placed in a box and one is selected at random, giving a value X. If n is unknown, then the maximum-likelihood estimator of n is X, even though the expectation of X given n is only (n + 1)/2; we can be certain only that n is at least X and is probably more. In this case, the natural unbiased estimator is 2X − 1.
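A quick simulation makes the size of the bias concrete. The Python sketch below (illustrative code, not from a referenced source) draws one ticket at a time and compares the maximum-likelihood estimator X with the unbiased estimator 2X − 1.

```python
import numpy as np

rng = np.random.default_rng(3)
n_true = 50                                  # true (unknown) number of tickets
reps = 500_000

x = rng.integers(1, n_true + 1, size=reps)   # one ticket drawn uniformly from 1..n

mle = x                    # maximum-likelihood estimator of n
unbiased = 2 * x - 1       # natural unbiased estimator of n

print(mle.mean())          # close to (n_true + 1)/2 = 25.5, so bias is about -24.5
print(unbiased.mean())     # close to n_true = 50, bias is about 0
```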

Median-unbiased estimators

The theory of median-unbiased estimators was revived by George W. Brown in 1947: an estimate of a one-dimensional parameter θ is said to be median-unbiased if, for fixed θ, the median of the distribution of the estimate is at the value θ; that is, the estimate underestimates just as often as it overestimates.
Further properties of median-unbiased estimators have been noted by Lehmann, Birnbaum, van der Vaart and Pfanzagl. In particular, median-unbiased estimators exist in cases where mean-unbiased and maximum-likelihood estimators do not exist. They are invariant under one-to-one transformations.
There are methods of constructing median-unbiased estimators for probability distributions that have monotone likelihood functions, such as one-parameter exponential families, that ensure the estimators are optimal. One such procedure is an analogue of the Rao–Blackwell procedure for mean-unbiased estimators: the procedure holds for a smaller class of probability distributions than does the Rao–Blackwell procedure for mean-unbiased estimation but for a larger class of loss functions.

Bias with respect to other loss functions

Any minimum-variance mean-unbiased estimator minimizes the risk with respect to the squared-error loss function among mean-unbiased estimators, as observed by Gauss. A minimum-average absolute deviation median-unbiased estimator minimizes the risk with respect to the absolute loss function among median-unbiased estimators, as observed by Laplace. Other loss functions are used in statistics, particularly in robust statistics.

Effect of transformations

For univariate parameters, median-unbiased estimators remain median-unbiased under transformations that preserve order.
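For example, the following Python sketch (illustrative code; it assumes normally distributed data and an odd sample size, for which the sample median is both mean- and median-unbiased for the population center) applies the order-preserving transformation exp: the transformed estimator remains median-unbiased for exp(θ), while mean-unbiasedness is lost.

```python
import numpy as np

rng = np.random.default_rng(6)
theta, n, reps = 1.0, 7, 200_000

samples = rng.normal(theta, 1.0, size=(reps, n))
med = np.median(samples, axis=1)       # median-unbiased for theta (symmetric data, odd n)
transformed = np.exp(med)              # order-preserving (monotone) transformation

print(np.median(transformed), np.exp(theta))   # medians agree: still median-unbiased
print(transformed.mean(), np.exp(theta))       # means differ: mean-unbiasedness is lost
```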
Note that, when a transformation is applied to a mean-unbiased estimator, the result need not be a mean-unbiased estimator of its corresponding population statistic. By Jensen's inequality, a convex function as transformation will introduce positive bias, while a concave function will introduce negative bias, and a function of mixed convexity may introduce bias in either direction, depending on the specific function and distribution. That is, for a non-linear function f and a mean-unbiased estimator U of a parameter p, the composite estimator f(U) need not be a mean-unbiased estimator of f(p). For example, the square root of the unbiased estimator of the population variance is not a mean-unbiased estimator of the population standard deviation: the square root of the unbiased sample variance, the corrected sample standard deviation, is biased. The bias depends both on the sampling distribution of the estimator and on the transform, and can be quite involved to calculate – see unbiased estimation of standard deviation for a discussion in this case.
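A minimal Monte Carlo sketch (illustrative names, assuming normally distributed data) shows this effect: the corrected sample standard deviation, obtained by taking the square root of the unbiased sample variance, systematically underestimates σ, consistent with Jensen's inequality for the concave square-root transformation.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma, n, reps = 2.0, 5, 200_000

samples = rng.normal(0.0, sigma, size=(reps, n))
s2_unbiased = samples.var(axis=1, ddof=1)     # mean-unbiased for sigma**2
s_corrected = np.sqrt(s2_unbiased)            # corrected sample standard deviation

print(s2_unbiased.mean())   # close to sigma**2 = 4.0 (unbiased)
print(s_corrected.mean())   # noticeably below sigma = 2.0 (negative bias)
```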

Bias, variance and mean squared error

While bias quantifies the average difference to be expected between an estimator and an underlying parameter, an estimator based on a finite sample can additionally be expected to differ from the parameter due to the randomness in the sample.
An estimator that minimises the bias will not necessarily minimise the mean square error.
One measure which is used to try to reflect both types of difference is the mean square error,

$$\operatorname{MSE}(\hat\theta) = \operatorname{E}\big[(\hat\theta - \theta)^2\big].$$

This can be shown to be equal to the square of the bias, plus the variance:

$$\operatorname{MSE}(\hat\theta) = \big(\operatorname{E}[\hat\theta] - \theta\big)^2 + \operatorname{E}\big[(\hat\theta - \operatorname{E}[\hat\theta])^2\big] = \operatorname{Bias}(\hat\theta, \theta)^2 + \operatorname{Var}(\hat\theta).$$

When the parameter is a vector, an analogous decomposition applies:

$$\operatorname{MSE}(\hat\theta) = \operatorname{trace}\big(\operatorname{Cov}(\hat\theta)\big) + \big\|\operatorname{Bias}(\hat\theta, \theta)\big\|^2,$$

where $\operatorname{trace}\big(\operatorname{Cov}(\hat\theta)\big)$ is the trace of the covariance matrix of the estimator and $\big\|\operatorname{Bias}(\hat\theta, \theta)\big\|^2$ is the squared vector norm of the bias.
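As a numerical check of the decomposition, and of the earlier remark that unbiasedness does not imply minimum mean squared error, the Python sketch below (illustrative code, assuming normally distributed data) compares the unbiased and uncorrected sample variance estimators: for each, the empirical MSE matches the sum of squared bias and variance, and in this setting the biased estimator attains the smaller MSE.

```python
import numpy as np

rng = np.random.default_rng(5)
sigma, n, reps = 1.0, 5, 500_000
true_var = sigma ** 2

samples = rng.normal(0.0, sigma, size=(reps, n))

for ddof, label in [(1, "unbiased (divide by n-1)"), (0, "biased (divide by n)")]:
    est = samples.var(axis=1, ddof=ddof)
    bias = est.mean() - true_var
    variance = est.var()
    mse = np.mean((est - true_var) ** 2)
    # MSE should equal bias**2 + variance, up to Monte Carlo error.
    print(label, bias, variance, mse, bias**2 + variance)
```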