Bayes estimator
In estimation theory and decision theory, a Bayes estimator or a Bayes action is an estimator or decision rule that minimizes the posterior expected value of a loss function. Equivalently, it maximizes the posterior expectation of a utility function. An alternative way of formulating an estimator within Bayesian statistics is maximum a posteriori estimation.
Definition
Suppose an unknown parameter $\theta$ is known to have a prior distribution $\pi$. Let $\widehat{\theta} = \widehat{\theta}(x)$ be an estimator of $\theta$ (based on some measurements $x$), and let $L(\theta,\widehat{\theta})$ be a loss function, such as squared error. The Bayes risk of $\widehat{\theta}$ is defined as $E_\pi\bigl(L(\theta,\widehat{\theta})\bigr)$, where the expectation is taken over the probability distribution of $\theta$: this defines the risk function as a function of $\widehat{\theta}$. An estimator $\widehat{\theta}$ is said to be a Bayes estimator if it minimizes the Bayes risk among all estimators. Equivalently, the estimator which minimizes the posterior expected loss $E\bigl(L(\theta,\widehat{\theta}) \mid x\bigr)$ for each $x$ also minimizes the Bayes risk and therefore is a Bayes estimator.

If the prior is improper, then an estimator which minimizes the posterior expected loss for each $x$ is called a generalized Bayes estimator.
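As a rough numerical illustration of these definitions, the following sketch compares the Bayes risk of two estimators by Monte Carlo; the model (a standard normal prior with a unit-variance normal likelihood), the choice of estimators, and the variable names are assumptions made only for this example.

```python
# A minimal Monte Carlo sketch of the Bayes-risk definition above, under an
# assumed model (not taken from the article): theta ~ N(0, 1) prior,
# x | theta ~ N(theta, 1), and squared-error loss.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(0.0, 1.0, size=100_000)   # draws from the prior
x = rng.normal(theta, 1.0)                   # one measurement per draw

delta_naive = x                              # estimator that ignores the prior
delta_bayes = x / 2.0                        # posterior mean for this model

# Bayes risk = expected loss over the joint distribution of (theta, x)
risk_naive = np.mean((delta_naive - theta) ** 2)
risk_bayes = np.mean((delta_bayes - theta) ** 2)
print(risk_naive, risk_bayes)                # roughly 1.0 vs 0.5
```

Under this assumed model the posterior mean is $x/2$, which is why the second estimator attains the smaller risk.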
Examples
Minimum mean square error estimation
The most common risk function used for Bayesian estimation is the mean square error (MSE), also called squared error risk. The MSE is defined by $$\mathrm{MSE} = E\left[\bigl(\widehat{\theta}(x) - \theta\bigr)^2\right],$$ where the expectation is taken over the joint distribution of $\theta$ and $x$.
Posterior mean
Using the MSE as risk, the Bayes estimate of the unknown parameter is simply the mean of the posterior distribution, $$\widehat{\theta}(x) = E[\theta \mid x] = \int \theta\, p(\theta \mid x)\, d\theta.$$ This is known as the minimum mean square error (MMSE) estimator.
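The posterior mean can also be computed numerically when no closed form is convenient. The sketch below evaluates it on a grid under an assumed Beta(2, 2) prior and binomial likelihood; the data values are hypothetical.

```python
# A grid-based sketch of the MMSE (posterior-mean) estimate, assuming a
# Beta(2, 2) prior on theta and a binomial likelihood; the data are made up.
import numpy as np
from scipy import stats

k, n = 7, 10                                      # hypothetical observed data
theta = np.linspace(1e-6, 1 - 1e-6, 10_001)       # grid over the parameter

posterior = stats.beta.pdf(theta, 2, 2) * stats.binom.pmf(k, n, theta)
posterior /= posterior.sum()                      # normalize on the grid

mmse = (theta * posterior).sum()                  # posterior mean
print(mmse)                                       # ~ (k + 2)/(n + 4) = 0.643
```

In this conjugate case the grid answer can be checked against the closed-form posterior mean $(k+2)/(n+4)$.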
Bayes estimators for conjugate priors
If there is no inherent reason to prefer one prior probability distribution over another, a conjugate prior is sometimes chosen for simplicity. A conjugate prior is defined as a prior distribution belonging to some parametric family, for which the resulting posterior distribution also belongs to the same family. This is an important property, since the Bayes estimator, as well as its statistical properties, can all be derived from the posterior distribution.

Conjugate priors are especially useful for sequential estimation, where the posterior of the current measurement is used as the prior in the next measurement. In sequential estimation, unless a conjugate prior is used, the posterior distribution typically becomes more complex with each added measurement, and the Bayes estimator cannot usually be calculated without resorting to numerical methods.
Following are some examples of conjugate priors.
- If $x\mid\theta$ is normal, $x\mid\theta \sim N(\theta,\sigma^2)$, and the prior is normal, $\theta \sim N(\mu,\tau^2)$, then the posterior is also normal and the Bayes estimator under MSE is given by $$\widehat{\theta}(x)=\frac{\sigma^{2}}{\sigma^{2}+\tau^{2}}\,\mu+\frac{\tau^{2}}{\sigma^{2}+\tau^{2}}\,x.$$
- If $x_1,\ldots,x_n$ are iid Poisson random variables, $x_i\mid\theta \sim P(\theta)$, and if the prior is Gamma distributed, $\theta \sim G(a,b)$ (with $b$ a rate parameter), then the posterior is also Gamma distributed, and the Bayes estimator under MSE (checked numerically in the sketch following this list) is given by $$\widehat{\theta}(X)=\frac{n\overline{X}+a}{n+b}.$$
- If $x_1,\ldots,x_n$ are iid uniformly distributed, $x_i\mid\theta \sim U(0,\theta)$, and if the prior is Pareto distributed, $\theta \sim Pa(\theta_0,a)$, then the posterior is also Pareto distributed, and the Bayes estimator under MSE is given by $$\widehat{\theta}(X)=\frac{(a+n)\max(\theta_0,x_1,\ldots,x_n)}{a+n-1}.$$
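As a small numerical check of the Poisson-Gamma entry above, the following sketch plugs hypothetical counts and prior parameters into the closed-form posterior mean.

```python
# Sketch of the Poisson-Gamma case: with a Gamma(a, b) prior (b a rate
# parameter) and n iid Poisson(theta) observations, the posterior is
# Gamma(a + sum(x), b + n), and its mean is the Bayes estimate under MSE.
import numpy as np

a, b = 2.0, 1.0                       # hypothetical prior shape and rate
x = np.array([3, 5, 4, 6, 2])         # hypothetical Poisson counts
n = len(x)

bayes_estimate = (a + x.sum()) / (b + n)   # equals (n*xbar + a)/(n + b)
print(bayes_estimate)                      # 22/6 ≈ 3.67
```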
Alternative risk functions
Posterior median and other quantiles
A "linear" loss function, with, which yields the posterior median as the Bayes' estimate:Another "linear" loss function, which assigns different "weights" to over or sub estimation. It yields a quantile from the posterior distribution, and is a generalization of the previous loss function:
Posterior mode
The following loss function is trickier: it yields either the posterior mode, or a point close to it, depending on the curvature and properties of the posterior distribution. Small values of the parameter $K>0$ are recommended, in order to use the mode as an approximation ($L>0$): $$L(\theta,\widehat{\theta}) = \begin{cases} 0 & \text{for } |\theta-\widehat{\theta}| < K, \\ L & \text{for } |\theta-\widehat{\theta}| \ge K. \end{cases}$$
Lp estimators
One can also consider risks for which the loss is given by $$L(\theta,\widehat{\theta}) = |\theta-\widehat{\theta}|^p.$$ While such optimal estimators can be difficult to characterize in closed form, they share many properties with those in the $L^2$ (squared error) case.
Other loss functions can be conceived, although the mean squared error is the most widely used and validated; alternative loss functions are used in statistics, particularly in robust statistics.
Generalized Bayes estimators
The prior distribution $p$ has thus far been assumed to be a true probability distribution, in that $$\int p(\theta)\,d\theta = 1.$$ However, occasionally this can be a restrictive requirement. For example, there is no distribution over the set of all real numbers for which every real number is equally likely. Yet, in some sense, such a "distribution" seems like a natural choice for a non-informative prior, i.e., a prior distribution which does not imply a preference for any particular value of the unknown parameter. One can still define a function $p(\theta) = 1$, but this would not be a proper probability distribution since it has infinite mass, $$\int p(\theta)\,d\theta = \infty.$$
Such measures, which are not probability distributions, are referred to as improper priors.
The use of an improper prior means that the Bayes risk is undefined, since the prior is not a probability distribution and we cannot take an expectation under it. As a consequence, it is no longer meaningful to speak of a Bayes estimator that minimizes the Bayes risk. Nevertheless, in many cases, one can define the posterior distribution $$p(\theta\mid x) = \frac{p(x\mid\theta)\,p(\theta)}{\int p(x\mid\theta)\,p(\theta)\,d\theta}.$$
This is a definition, and not an application of Bayes' theorem, since Bayes' theorem can only be applied when all distributions are proper. However, it is not uncommon for the resulting "posterior" to be a valid probability distribution. In this case, the posterior expected loss $$\int L(\theta, a)\, p(\theta\mid x)\, d\theta$$
is typically well-defined and finite. Recall that, for a proper prior, the Bayes estimator minimizes the posterior expected loss. When the prior is improper, an estimator which minimizes the posterior expected loss is referred to as a generalized Bayes estimator.
Example
A typical example is estimation of a location parameter with a loss function of the type $L(a-\theta)$. Here $\theta$ is a location parameter, i.e., $p(x\mid\theta) = f(x-\theta)$.

It is common to use the improper prior $p(\theta)=1$ in this case, especially when no other more subjective information is available. This yields $$p(\theta\mid x) = \frac{p(x\mid\theta)\,p(\theta)}{p(x)} = \frac{f(x-\theta)}{p(x)},$$
so the posterior expected loss is $$E\bigl[L(a-\theta)\mid x\bigr] = \int L(a-\theta)\,p(\theta\mid x)\,d\theta = \frac{1}{p(x)}\int L(a-\theta)\,f(x-\theta)\,d\theta.$$
The generalized Bayes estimator is the value $a(x)$ that minimizes this expression for a given $x$. This is equivalent to minimizing $$\int L(a-\theta)\,f(x-\theta)\,d\theta \quad \text{for a given } x. \qquad (1)$$
In this case it can be shown that the generalized Bayes estimator has the form $x + a_0$, for some constant $a_0$. To see this, let $a_0$ be the value minimizing (1) when $x=0$. Then, given a different value $x_1$, we must minimize $$\int L(a-\theta)\,f(x_1-\theta)\,d\theta = \int L(a-x_1-\theta')\,f(-\theta')\,d\theta'. \qquad (2)$$
This is identical to (1), except that $a$ has been replaced by $a-x_1$. Thus, the expression is minimized by $a - x_1 = a_0$, so that the optimal estimator has the form $$a(x) = a_0 + x.$$
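As a numeric sketch of this argument, take the noise density $f$ to be Exp(1) (an assumption made only for illustration) together with the flat improper prior; under squared-error loss the generalized Bayes estimate should then equal $x - 1$ for every $x$, i.e., $a_0 = -1$.

```python
# Sketch of the location-parameter example with the improper prior p(theta)=1.
# Assumed noise density f = Exp(1), so x | theta is theta plus Exp(1) noise and
# the "posterior" over theta is proportional to f(x - theta) on theta <= x.
# Under squared error, the generalized Bayes estimate should be x - 1.
import numpy as np
from scipy import stats

def generalized_bayes(x, width=50.0, n=200_001):
    theta = np.linspace(x - width, x, n)      # grid covering the support
    post = stats.expon.pdf(x - theta)         # unnormalized posterior density
    post /= post.sum()
    return (theta * post).sum()               # posterior mean on the grid

for x in (0.0, 3.0, 10.0):
    print(x, generalized_bayes(x))            # each is approximately x - 1
```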
Empirical Bayes estimators
A Bayes estimator derived through the empirical Bayes method is called an empirical Bayes estimator. Empirical Bayes methods enable the use of auxiliary empirical data, from observations of related parameters, in the development of a Bayes estimator. This is done under the assumption that the estimated parameters are obtained from a common prior. For example, if independent observations of different parameters are performed, then the estimation performance of a particular parameter can sometimes be improved by using data from other observations.

There are both parametric and non-parametric approaches to empirical Bayes estimation.
Example
The following is a simple example of parametric empirical Bayes estimation. Given past observations $x_1,\ldots,x_n$ having conditional distribution $f(x_i\mid\theta_i)$, one is interested in estimating $\theta_{n+1}$ based on $x_{n+1}$. Assume that the $\theta_i$'s have a common prior $\pi$ which depends on unknown parameters. For example, suppose that $\pi$ is normal with unknown mean $\mu_\pi$ and variance $\sigma_\pi^2$. We can then use the past observations to determine the mean and variance of $\pi$ in the following way.

First, we estimate the mean $\mu_m$ and variance $\sigma_m^2$ of the marginal distribution of $x_1,\ldots,x_n$ using the maximum likelihood approach: $$\widehat{\mu}_m = \frac{1}{n}\sum_{i} x_i, \qquad \widehat{\sigma}_m^2 = \frac{1}{n}\sum_{i} \bigl(x_i - \widehat{\mu}_m\bigr)^2.$$
Next, we use the law of total expectation to compute $\mu_m$ and the law of total variance to compute $\sigma_m^2$ such that $$\mu_m = E_\pi\bigl[\mu_f(\theta)\bigr], \qquad \sigma_m^2 = E_\pi\bigl[\sigma_f^2(\theta)\bigr] + E_\pi\bigl[(\mu_f(\theta) - \mu_m)^2\bigr],$$
where $\mu_f(\theta)$ and $\sigma_f^2(\theta)$ are the moments of the conditional distribution $f(x_i\mid\theta_i)$, which are assumed to be known. In particular, suppose that $\mu_f(\theta) = \theta$ and that $\sigma_f^2(\theta) = K$; we then have $$\mu_\pi = \mu_m, \qquad \sigma_\pi^2 = \sigma_m^2 - \sigma_f^2 = \sigma_m^2 - K.$$
Finally, we obtain the estimated moments of the prior, $$\widehat{\mu}_\pi = \widehat{\mu}_m, \qquad \widehat{\sigma}_\pi^2 = \widehat{\sigma}_m^2 - K.$$
For example, if $x_i\mid\theta_i \sim N(\theta_i, 1)$, and if we assume a normal prior (which is a conjugate prior in this case), we conclude that $\theta_{n+1} \sim N(\widehat{\mu}_\pi, \widehat{\sigma}_\pi^2)$, from which the Bayes estimator of $\theta_{n+1}$ based on $x_{n+1}$ can be calculated.
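The recipe above translates directly into a short computation. The following sketch simulates a normal-normal setup with assumed numbers, estimates the prior moments from the marginal moments, and forms the resulting empirical Bayes estimates.

```python
# Sketch of parametric empirical Bayes for the normal-normal case described
# above.  The sampling variance K is assumed known; the true prior N(5, 4) is
# used only to simulate data and is treated as unknown by the procedure.
import numpy as np

rng = np.random.default_rng(2)
K = 1.0                                        # known sampling variance
theta = rng.normal(5.0, 2.0, size=5_000)       # latent parameters (unknown)
x = rng.normal(theta, np.sqrt(K))              # one observation per parameter

mu_m = x.mean()                                # marginal mean
var_m = x.var()                                # marginal variance (MLE form)
mu_pi = mu_m                                   # estimated prior mean
var_pi = max(var_m - K, 0.0)                   # estimated prior variance

# Conjugate normal-normal Bayes estimator with the estimated prior plugged in
shrink = var_pi / (var_pi + K)
theta_hat = shrink * x + (1.0 - shrink) * mu_pi

print(mu_pi, var_pi)                           # near 5 and 4
print(np.mean((theta_hat - theta) ** 2),       # empirical Bayes risk ...
      np.mean((x - theta) ** 2))               # ... vs using x directly
```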
Properties
Admissibility
Bayes rules having finite Bayes risk are typically admissible. The following are some specific examples of admissibility theorems.
- If a Bayes rule is unique then it is admissible. For example, as stated above, under mean squared error the Bayes rule is unique and therefore admissible.
- If θ belongs to a discrete set, then all Bayes rules are admissible.
- If θ belongs to a continuous set, and if the risk function R is continuous in θ for every δ, then all Bayes rules are admissible.
Asymptotic efficiency
Let $\theta$ be an unknown random variable, and suppose that $x_1, x_2, \ldots$ are iid samples with density $f(x_i\mid\theta)$. Let $\delta_n = \delta_n(x_1,\ldots,x_n)$ be a sequence of Bayes estimators of $\theta$ based on an increasing number of measurements. We are interested in analyzing the asymptotic performance of this sequence of estimators, i.e., the performance of $\delta_n$ for large $n$.

To this end, it is customary to regard $\theta$ as a deterministic parameter whose true value is $\theta_0$. Under specific conditions, for large samples, the posterior density of $\theta$ is approximately normal. In other words, for large $n$, the effect of the prior probability on the posterior is negligible. Moreover, if $\delta_n$ is the Bayes estimator under MSE risk, then it is asymptotically unbiased and it converges in distribution to the normal distribution: $$\sqrt{n}\,(\delta_n - \theta_0) \to N\!\left(0, \frac{1}{I(\theta_0)}\right),$$
where $I(\theta_0)$ is the Fisher information at $\theta_0$.
It follows that the Bayes estimator $\delta_n$ under MSE is asymptotically efficient.
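A small simulation can illustrate how the prior's influence fades with $n$. In the assumed Beta-Bernoulli setup below, the Bayes estimator under MSE (the posterior mean) approaches the maximum likelihood estimate $k/n$ as $n$ grows; the prior and the true parameter value are chosen arbitrarily.

```python
# Sketch: under an assumed Beta(2, 2) prior and Bernoulli(theta0) data, the
# Bayes estimator under MSE (the posterior mean) approaches the maximum
# likelihood estimate k/n as n grows, so the prior's effect becomes negligible.
import numpy as np

rng = np.random.default_rng(3)
theta0, a, b = 0.3, 2.0, 2.0

for n in (10, 100, 10_000):
    k = rng.binomial(n, theta0)
    mle = k / n
    bayes = (a + k) / (a + b + n)   # posterior mean of Beta(a + k, b + n - k)
    print(n, mle, bayes)            # the two estimates converge as n grows
```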
Another estimator which is asymptotically normal and efficient is the maximum likelihood estimator. The relations between the maximum likelihood and Bayes estimators can be shown in the following simple example.