Confidence interval
According to frequentist inference, a confidence interval is a range of values which is likely to contain the true value of an unknown statistical parameter, such as a population mean. Rather than reporting a single point estimate, a confidence interval provides a range, such as 2 to 4 hours, along with a specified confidence level, typically 95%.
A 95% confidence level does not imply a 95% probability that the true parameter lies within a particular calculated interval; that reading is instead associated with the credible interval in Bayesian inference. The confidence level instead reflects the long-run reliability of the method used to generate the interval. In other words, if the same sampling procedure were repeated 100 times from the same population, approximately 95 of the resulting intervals would be expected to contain the true population mean. The frequentist approach sees the true population mean as a fixed unknown constant, while the confidence interval is calculated using data from a random sample. Because the sample is random, the interval endpoints are random variables.
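The long-run reading can be checked with a small simulation. The sketch below is illustrative only: it assumes a normal population with a known standard deviation (so that a simple z-interval has exact 95% coverage), and the names `z_interval` and `coverage`, as well as the parameter values, are hypothetical choices rather than part of any standard method.

```python
import random
from statistics import NormalDist, fmean


def z_interval(sample, sigma, level=0.95):
    """Interval for the mean, assuming the population sd `sigma` is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    half_width = z * sigma / len(sample) ** 0.5
    centre = fmean(sample)
    return centre - half_width, centre + half_width


def coverage(true_mean=3.0, sigma=1.0, n=30, repetitions=10_000, seed=0):
    """Fraction of repeated-sample intervals that contain `true_mean`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        # Draw a fresh random sample; the interval endpoints change each time.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma)
        hits += low <= true_mean <= high
    return hits / repetitions


print(f"Share of intervals covering the true mean: {coverage():.3f}")
```

The true mean stays fixed throughout; only the intervals vary from sample to sample, and the printed share should be close to the nominal level.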
Definition
Let $X$ be a random sample from a probability distribution with statistical parameter $(\theta, \varphi)$. Here, $\theta$ is the quantity to be estimated, while $\varphi$ includes other parameters that determine the distribution. A confidence interval for the parameter $\theta$, with confidence level or coefficient $\gamma$, is an interval $(u(X), v(X))$ determined by random variables $u(X)$ and $v(X)$ with the property:
$$P\bigl(u(X) < \theta < v(X)\bigr) = \gamma \quad \text{for all } (\theta, \varphi).$$
The number $\gamma$, which is typically large, is sometimes given in the form $1 - \alpha$, where $\alpha$ is a small positive number, often 0.05. It means that the interval $(u(X), v(X))$ has a probability $\gamma$ of covering the value of $\theta$ in repeated sampling.
In many applications, confidence intervals that have exactly the required confidence level are hard to construct, but approximate intervals can be computed. The rule for constructing the interval may be accepted if
$$P\bigl(u(X) < \theta < v(X)\bigr) \approx \gamma$$
to an acceptable level of approximation. Alternatively, some authors simply require that
$$P\bigl(u(X) < \theta < v(X)\bigr) \geq \gamma.$$
When it is known that the coverage probability can be strictly larger than $\gamma$ for some parameter values, the confidence interval is called conservative, i.e., it errs on the safe side, which also means that the interval can be wider than it needs to be.
Methods of derivation
There are many ways of calculating confidence intervals, and the best method depends on the situation. Two widely applicable methods are bootstrapping and the central limit theorem. The latter method works only if the sample is large, since it entails calculating the sample mean $\bar{X}$ and sample standard deviation $S$ and using the asymptotically standard normal quantity
$$\frac{\bar{X} - \mu}{S / \sqrt{n}},$$
where $\mu$ and $n$ are the population mean and the sample size, respectively.
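Both approaches can be sketched in a few lines. The code below is a minimal illustration, not a reference implementation: `clt_interval` uses the normal approximation described above, while `bootstrap_percentile_interval` uses the percentile bootstrap, which is only one of several bootstrap variants. The function names, the number of resamples, and the exponential test data are all assumptions made for this example.

```python
import random
from statistics import NormalDist, fmean, stdev


def clt_interval(sample, level=0.95):
    """Large-sample interval for the mean based on the normal approximation."""
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(sample) / n ** 0.5  # stdev uses the n - 1 divisor
    centre = fmean(sample)
    return centre - half_width, centre + half_width


def bootstrap_percentile_interval(sample, level=0.95, resamples=2000, seed=0):
    """Percentile bootstrap interval for the mean (one of several variants)."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(fmean(rng.choices(sample, k=n)) for _ in range(resamples))
    lo_index = int((1 - level) / 2 * resamples)
    hi_index = int((1 + level) / 2 * resamples) - 1
    return means[lo_index], means[hi_index]


# Hypothetical skewed data: exponential with true mean 2.0.
data_rng = random.Random(1)
data = [data_rng.expovariate(0.5) for _ in range(200)]

print("CLT interval:      ", clt_interval(data))
print("Bootstrap interval:", bootstrap_percentile_interval(data))
```

With a sample of this size the two intervals are usually similar; they can differ more noticeably for small or heavily skewed samples.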
Example
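Suppose $X_1, \ldots, X_n$ is an independent sample from a normally distributed population with unknown parameters mean $\mu$ and variance $\sigma^2$. Define the sample mean $\bar{X}$ and unbiased sample variance $S^2$ as
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2.$$
Then the value
$$T = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$
has a Student's t distribution with $n - 1$ degrees of freedom. This value is useful because its distribution does not depend on the values of the unobservable parameters $\mu$ and $\sigma^2$; i.e., it is a pivotal quantity.
Suppose we wanted to calculate a 95% confidence interval for $\mu$. First, let $c$ be the 97.5th percentile of the distribution of $T$. Then there is a 2.5% chance that $T$ will be less than $-c$ and a 2.5% chance that it will be larger than $+c$. In other words,
$$P_T(-c \leq T \leq c) = 0.95.$$
Consequently, by replacing $T$ with $\frac{\bar{X} - \mu}{S / \sqrt{n}}$ and re-arranging terms,
$$P_X\!\left(\bar{X} - \frac{cS}{\sqrt{n}} \leq \mu \leq \bar{X} + \frac{cS}{\sqrt{n}}\right) = 0.95,$$
where $P_X$ is the probability measure for the sample $X_1, \ldots, X_n$.
It means that there is a 95% probability that the condition $\bar{X} - \frac{cS}{\sqrt{n}} \leq \mu \leq \bar{X} + \frac{cS}{\sqrt{n}}$ occurs in repeated sampling. After observing a sample, we find the value $\bar{x}$ for $\bar{X}$ and $s$ for $S$, from which we compute the interval
$$\left[\bar{x} - \frac{cs}{\sqrt{n}},\ \bar{x} + \frac{cs}{\sqrt{n}}\right],$$
and we say it is a 95% confidence interval for the mean.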
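As a concrete illustration of this recipe, the sketch below computes the interval for a small sample. The ten measurements are invented for the example, and the critical value (the 97.5th percentile of Student's t with 9 degrees of freedom, about 2.262) is taken from a standard table and passed in by hand so that the code needs only the standard library. The function name `t_interval` is a hypothetical choice.

```python
from math import sqrt
from statistics import fmean, stdev


def t_interval(sample, c):
    """Interval [x_bar - c*s/sqrt(n), x_bar + c*s/sqrt(n)] for critical value c."""
    n = len(sample)
    x_bar = fmean(sample)
    s = stdev(sample)  # square root of the unbiased sample variance
    half_width = c * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical measurements (n = 10), invented for illustration only.
sample = [4.9, 5.1, 5.0, 5.3, 4.8, 5.2, 5.0, 4.7, 5.1, 4.9]

# 97.5th percentile of Student's t with n - 1 = 9 degrees of freedom (table value).
C_975_DF9 = 2.262

low, high = t_interval(sample, C_975_DF9)
print(f"95% CI for the mean: [{low:.3f}, {high:.3f}]")
```

If SciPy is available, `scipy.stats.t.ppf(0.975, n - 1)` can supply the critical value for other sample sizes instead of a table lookup.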
Interpretation
Various interpretations of a confidence interval can be given.
- The confidence interval can be expressed in terms of a long-run frequency in repeated samples: "Were this procedure to be repeated on numerous samples, the proportion of calculated 95% confidence intervals that encompassed the true value of the population parameter would tend toward 95%."
- The confidence interval can be expressed in terms of probability with respect to a single theoretical sample: "There is a 95% probability that the 95% confidence interval calculated from a given future sample will cover the true value of the population parameter." This essentially reframes the "repeated samples" interpretation as a probability rather than a frequency.
- The confidence interval can be expressed in terms of statistical significance, e.g.: "The 95% confidence interval represents values that are not statistically significantly different from the point estimate at the 0.05 level."
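The link to statistical significance in the last interpretation can be illustrated numerically. The sketch below assumes a normal population with known standard deviation and uses a two-sided z-test; the summary statistics and the function names are hypothetical. Under these assumptions, a candidate value lies inside the 95% interval exactly when the test does not reject it at the 0.05 level.

```python
from statistics import NormalDist


def z_interval(x_bar, sigma, n, level=0.95):
    """Interval for the mean from summary statistics, with known sd `sigma`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    return x_bar - half_width, x_bar + half_width


def two_sided_p_value(x_bar, mu0, sigma, n):
    """p-value of the two-sided z-test of the hypothesis that the mean is mu0."""
    z = abs(x_bar - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(z))


# Hypothetical summary statistics, chosen only for illustration.
x_bar, sigma, n = 10.0, 2.0, 25
low, high = z_interval(x_bar, sigma, n)

for mu0 in (9.0, 9.5, 10.5, 11.0):
    inside = low <= mu0 <= high
    p = two_sided_p_value(x_bar, mu0, sigma, n)
    print(f"mu0 = {mu0}: inside interval = {inside}, p-value = {p:.3f}")
```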
Common misunderstandings
Contrary to common misconceptions, a 95% confidence level does not mean that:
- for a given realized interval there is a 95% probability that the population parameter lies within the interval;
- 95% of the sample data lie within the confidence interval; or
- there is a 95% probability of the parameter estimate from a repeat of the experiment falling within the confidence interval computed from a given experiment.
For example, suppose a random sample of 25 rods yields a 95% confidence interval of 36.8 to 39.0 mm for the true mean rod length. Then:
- It is incorrect to say that there is a 95% probability that the true population mean lies within this interval: the true mean is fixed, not random. The true mean could be 37 mm, which is within the confidence interval, or 40 mm, which is not; in any case, whether it falls between 36.8 and 39.0 mm is a matter of fact, not probability.
- It is not necessarily true that the lengths of 95% of the sampled rods lie within this interval. In this case, it cannot be true: 95% of 25 is not an integer.
- It is not generally true that there is a 95% probability that the sample mean length in a second sample would fall within this interval. In fact, if the true mean length is far from this specific confidence interval, it could be very unlikely that the next sample mean falls within the interval.
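The last point can be explored with a simulation. The sketch below is a rough illustration under assumed conditions: a normal population with known standard deviation, two independent samples of 25 observations each, and made-up values for the mean and standard deviation. It estimates how often the mean of a second sample lands inside the 95% interval computed from the first.

```python
import random
from statistics import NormalDist, fmean


def replication_rate(true_mean=38.0, sigma=3.0, n=25, trials=10_000, seed=0):
    """Share of trials in which a second sample mean falls inside the
    95% z-interval computed from a first, independent sample."""
    rng = random.Random(seed)
    half_width = NormalDist().inv_cdf(0.975) * sigma / n ** 0.5
    captured = 0
    for _ in range(trials):
        first = fmean([rng.gauss(true_mean, sigma) for _ in range(n)])
        second = fmean([rng.gauss(true_mean, sigma) for _ in range(n)])
        # The interval is centred on the FIRST sample mean, not the true mean.
        captured += abs(second - first) <= half_width
    return captured / trials


print(f"Second mean inside first interval: {replication_rate():.3f}")
```

Because both sample means vary, the estimated share falls noticeably below 95%, even though each interval is a valid 95% confidence interval for the population mean.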
Comparison with prediction intervals
A confidence interval is used to estimate a population parameter, such as the mean. For example, the expected value of a fair six-sided die is 3.5. Based on repeated sampling, after computing many 95% confidence intervals, roughly 95% of them will contain 3.5.
A prediction interval, on the other hand, provides a range within which a future individual observation is expected to fall with a certain probability. In the case of a single roll of a fair six-sided die, an exact 95% prediction interval does not exist. However, there are exact 95% prediction intervals for rolling a twenty-sided die. One such interval is $[1, 19]$, since 95% of the time the roll will result in a 19 or less, and the remaining 5% will result in a 20.
The key distinction is that confidence intervals quantify uncertainty in estimating parameters, while prediction intervals quantify uncertainty in forecasting future observations.
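The die example can be verified by enumeration. The short sketch below uses exact fractions to list every range of consecutive faces whose probability equals the requested level; the function name is a hypothetical choice, and only ranges of consecutive faces are considered.

```python
from fractions import Fraction


def exact_prediction_intervals(sides, level):
    """All ranges [a, b] of consecutive faces of a fair die whose
    probability equals `level` exactly."""
    return [
        (a, b)
        for a in range(1, sides + 1)
        for b in range(a, sides + 1)
        if Fraction(b - a + 1, sides) == level
    ]


level = Fraction(95, 100)
print("Six-sided die:   ", exact_prediction_intervals(6, level))
print("Twenty-sided die:", exact_prediction_intervals(20, level))
```

For the six-sided die every range has probability k/6, which never equals 19/20, so the list is empty; for the twenty-sided die the matching ranges are those covering 19 of the 20 faces.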
Comparison with credible intervals
In many common settings, such as estimating the mean of a normal distribution with known variance, confidence intervals coincide with credible intervals under non-informative priors. In such cases, common misconceptions about confidence intervals may yield practically correct conclusions.
Examples of how naïve interpretation of confidence intervals can be problematic
Confidence procedure for uniform location
Welch presented an example which clearly shows the difference between the theory of confidence intervals and other theories of interval estimation. Robinson called this example "[p]ossibly the best known counterexample for Neyman's version of confidence interval theory." To Welch, it showed the superiority of confidence interval theory; to critics of the theory, it shows a deficiency. Here we present a simplified version.
Suppose that $X_1, X_2$ are independent observations from a uniform $(\theta - 1/2,\ \theta + 1/2)$ distribution. Then the optimal 50% confidence procedure for $\theta$ is
$$\bar{X} \pm \begin{cases} \dfrac{|X_1 - X_2|}{2} & \text{if } |X_1 - X_2| < 1/2, \\[1ex] \dfrac{1 - |X_1 - X_2|}{2} & \text{if } |X_1 - X_2| \geq 1/2. \end{cases}$$
A fiducial or objective Bayesian argument can be used to derive the interval estimate
$$\bar{X} \pm \frac{1 - |X_1 - X_2|}{4},$$
which is also a 50% confidence procedure. Welch showed that the first confidence procedure dominates the second, according to desiderata from confidence interval theory; for every $\theta_1 \neq \theta$, the probability that the first procedure contains $\theta_1$ is less than or equal to the probability that the second procedure contains $\theta_1$. The average width of the intervals from the first procedure is less than that of the second. Hence, the first procedure is preferred under classical confidence interval theory.
However, when $|X_1 - X_2| \geq 1/2$, intervals from the first procedure are guaranteed to contain the true value $\theta$. Therefore, the nominal 50% confidence coefficient is unrelated to the uncertainty we should have that a specific interval contains the true value. The second procedure does not have this property.
Moreover, when the first procedure generates a very short interval, this indicates that $X_1, X_2$ are very close together and hence only offer the information in a single data point. Yet the first interval will exclude almost all reasonable values of the parameter due to its short width. The second procedure does not have this property.
The two counter-intuitive properties of the first procedure (100% coverage when $X_1, X_2$ are far apart and almost 0% coverage when $X_1, X_2$ are close together) balance out to yield 50% coverage on average. However, despite the first procedure being optimal, its intervals offer neither an assessment of the precision of the estimate nor an assessment of the uncertainty one should have that the interval contains the true value.
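These properties can be checked by simulation. The sketch below assumes the usual form of the example: two observations drawn uniformly from (θ − 1/2, θ + 1/2); a first procedure centred on the sample mean with half-width |X1 − X2|/2 when the observations are less than 1/2 apart and (1 − |X1 − X2|)/2 otherwise; and a second procedure with half-width (1 − |X1 − X2|)/4. The function names, the choice θ = 0, and the number of trials are illustrative assumptions.

```python
import random


def first_procedure(x1, x2):
    """Interval from the first (optimal) 50% confidence procedure."""
    centre, d = (x1 + x2) / 2, abs(x1 - x2)
    half = d / 2 if d < 0.5 else (1 - d) / 2
    return centre - half, centre + half


def second_procedure(x1, x2):
    """Interval from the fiducial / objective Bayesian argument."""
    centre, d = (x1 + x2) / 2, abs(x1 - x2)
    half = (1 - d) / 4
    return centre - half, centre + half


def simulate(theta=0.0, trials=100_000, seed=0):
    """Return (coverage of first, coverage of second,
    coverage of first when the observations are at least 1/2 apart)."""
    rng = random.Random(seed)
    hits1 = hits2 = far = far_hits1 = 0
    for _ in range(trials):
        x1 = rng.uniform(theta - 0.5, theta + 0.5)
        x2 = rng.uniform(theta - 0.5, theta + 0.5)
        lo1, hi1 = first_procedure(x1, x2)
        lo2, hi2 = second_procedure(x1, x2)
        in1 = lo1 <= theta <= hi1
        hits1 += in1
        hits2 += lo2 <= theta <= hi2
        if abs(x1 - x2) >= 0.5:
            far += 1
            far_hits1 += in1
    return hits1 / trials, hits2 / trials, far_hits1 / far


overall1, overall2, far_apart1 = simulate()
print(f"First procedure, overall coverage:  {overall1:.3f}")
print(f"Second procedure, overall coverage: {overall2:.3f}")
print(f"First procedure, far-apart cases:   {far_apart1:.3f}")
```

Both procedures should show overall coverage near 50%, while the first procedure covers the true value in essentially every case where the two observations are at least 1/2 apart.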
This example is used to argue against naïve interpretations of confidence intervals. If a confidence procedure is asserted to have properties beyond that of the nominal coverage, those properties must be proved; they do not follow from the fact that a procedure is a confidence procedure.