Type I and type II errors
Type I error, or a false positive, is the incorrect rejection of a true null hypothesis in statistical hypothesis testing. A type II error, or a false negative, is the incorrect failure to reject a false null hypothesis.
An analysis commits a type I error when some baseline assumption is incorrectly rejected because of new, misleading information. Meanwhile, a type II error is made when such an assumption is maintained, due to flawed or insufficient data, when better measurements would have shown it to be untrue. For example, in the context of medical testing, if we consider the null hypothesis to be "This patient does not have the disease," a diagnosis that the disease is present when it is not is a type I error, while a diagnosis that the patient does not have the disease when it is present is a type II error. The manner in which a null hypothesis frames the contextually default expectation determines how type I and type II errors manifest, and this varies by context and application. Generally the risk of such errors cannot be entirely eliminated, only traded off between the two types, for example by changing the significance threshold.
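The medical-testing example can be made concrete with a small simulation. The prevalence and per-test error rates below are invented purely for illustration; only the bookkeeping (which outcome counts as which error) follows the definitions above:

```python
import random

random.seed(0)

PREVALENCE = 0.10           # assumed fraction of patients with the disease
FALSE_POSITIVE_RATE = 0.05  # P(test positive | no disease) -- per-test type I rate
FALSE_NEGATIVE_RATE = 0.20  # P(test negative | disease)    -- per-test type II rate

type_i = type_ii = 0
n = 100_000
for _ in range(n):
    has_disease = random.random() < PREVALENCE
    if has_disease:
        tests_positive = random.random() > FALSE_NEGATIVE_RATE
    else:
        tests_positive = random.random() < FALSE_POSITIVE_RATE
    # Null hypothesis: "this patient does not have the disease"
    if tests_positive and not has_disease:
        type_i += 1   # false positive: null rejected although true
    elif not tests_positive and has_disease:
        type_ii += 1  # false negative: null retained although false

print(f"type I error fraction:  {type_i / n:.3f}")
print(f"type II error fraction: {type_ii / n:.3f}")
```

Over many patients, the observed fractions approach 0.05 × (1 − prevalence) and 0.20 × prevalence respectively, illustrating that the two error types act on disjoint subpopulations.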
Knowledge of type I errors and type II errors is applied widely in fields of medical science, biometrics and computer science. Minimising these errors is an object of study within statistical theory, though complete elimination of either is impossible when relevant outcomes are not determined by known, observable, causal processes.
Definition
Statistical background
In statistical test theory, the notion of a statistical error is an integral part of hypothesis testing. The test involves choosing between two competing propositions: the null hypothesis, denoted by H0, and the alternative hypothesis, denoted by H1. This is conceptually similar to the judgement in a court trial. The null hypothesis corresponds to the position of the defendant: just as he is presumed to be innocent until proven guilty, so is the null hypothesis presumed to be true until the data provide convincing evidence against it. The alternative hypothesis corresponds to the position against the defendant. Typically, the null hypothesis asserts the absence of a difference or the absence of an association; thus, the null hypothesis can never state that there is a difference or an association.
If the result of the test corresponds with reality, then a correct decision has been made. However, if the result of the test does not correspond with reality, then an error has occurred. There are two situations in which the decision is wrong. The null hypothesis may be true, whereas we reject H0. On the other hand, the alternative hypothesis H1 may be true, whereas we do not reject H0. Two types of error are distinguished: type I error and type II error.
Type I error
The first kind of error is the mistaken rejection of a null hypothesis as the result of a test procedure. This kind of error is called a type I error and is sometimes called an error of the first kind. In terms of the courtroom example, a type I error corresponds to convicting an innocent defendant.
Type II error
The second kind of error is the mistaken failure to reject the null hypothesis as the result of a test procedure. This sort of error is called a type II error and is also referred to as an error of the second kind. In terms of the courtroom example, a type II error corresponds to acquitting a criminal.
Crossover error rate
The crossover error rate (CER) is the point at which type I errors and type II errors are equal. A system with a lower CER value provides more accuracy than a system with a higher CER value. With all else being equal, setting the rates of type I and type II errors equal results in the lowest overall error rate.
False positive and false negative
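As a sketch of how a crossover point is located in practice, assume two made-up score distributions for a biometric system (impostor and genuine-user match scores modeled as normals) and bisect for the threshold where the two error rates coincide:

```python
from statistics import NormalDist

# Hypothetical match scores: impostors ~ N(0, 1), genuine users ~ N(3, 1).
impostors = NormalDist(mu=0.0, sigma=1.0)
genuine = NormalDist(mu=3.0, sigma=1.0)

def far(threshold):
    """False accept rate (type I analogue): impostor scores above threshold."""
    return 1.0 - impostors.cdf(threshold)

def frr(threshold):
    """False reject rate (type II analogue): genuine scores below threshold."""
    return genuine.cdf(threshold)

# Bisect for the threshold where FAR == FRR; FAR falls and FRR rises
# monotonically in the threshold, so bisection converges.
lo, hi = 0.0, 3.0
for _ in range(60):
    mid = (lo + hi) / 2
    if far(mid) > frr(mid):
        lo = mid
    else:
        hi = mid

threshold = (lo + hi) / 2
print(f"crossover threshold ≈ {threshold:.2f}, CER ≈ {far(threshold):.4f}")
```

With these symmetric equal-variance distributions the crossover sits midway between the two means, but for asymmetric distributions the numerical search is genuinely needed.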
In terms of false positives and false negatives, a positive result corresponds to rejecting the null hypothesis, while a negative result corresponds to failing to reject the null hypothesis; "false" means the conclusion drawn is incorrect. Thus, a type I error is equivalent to a false positive, and a type II error is equivalent to a false negative.
Table of error types
Tabulated relations between truth/falseness of the null hypothesis and outcomes of the test:

                       H0 is true                          H0 is false
Reject H0              Type I error (false positive)       Correct inference (true positive)
Fail to reject H0      Correct inference (true negative)   Type II error (false negative)

Error rate
A perfect test would have zero false positives and zero false negatives. However, statistical methods are probabilistic, and it cannot be known for certain whether statistical conclusions are correct. Whenever there is uncertainty, there is the possibility of making an error. Considering this, all statistical hypothesis tests have a probability of making type I and type II errors.
- The type I error rate is the probability of rejecting the null hypothesis given that it is true. The test is designed to keep the type I error rate below a prespecified bound called the significance level, usually denoted by the Greek letter α and also called the alpha level. Usually, the significance level is set to 0.05, implying that it is acceptable to have a 5% probability of incorrectly rejecting a true null hypothesis.
- The rate of the type II error is denoted by the Greek letter β and is related to the power of a test, which equals 1−β.
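The relationship α, β, and power can be illustrated for a one-sided z-test. The effect size and sample size below are hypothetical, chosen only to show how β follows from α once a true alternative is fixed:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal

# One-sided z-test of H0: mu = 0 against H1: mu > 0, with known sigma = 1.
# The true effect mu = 0.5 and sample size n = 25 are illustrative assumptions.
alpha = 0.05
mu_true = 0.5
n = 25

critical = z.inv_cdf(1 - alpha)     # reject H0 when the z-statistic >= ~1.645
shift = mu_true * n ** 0.5          # mean of the z-statistic under the true mu
beta = z.cdf(critical - shift)      # P(fail to reject | H1 true): type II rate
power = 1 - beta

print(f"beta ≈ {beta:.3f}, power = 1 - beta ≈ {power:.3f}")
```

Raising n or relaxing α shrinks β, which is exactly the lever discussed in the next section.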
The quality of hypothesis test
The same idea can be expressed in terms of the rate of correct results and therefore used to minimize error rates and improve the quality of a hypothesis test. To reduce the probability of committing a type I error, making the alpha value more stringent is both simple and efficient, for example setting the alpha value at 0.01 instead of 0.05. To decrease the probability of committing a type II error, which is closely associated with the power of an analysis, either increasing the test's sample size or relaxing the alpha level (e.g. setting it to 0.1 instead of 0.05) could increase the power. A test statistic is robust if the type I error rate is controlled.
Varying the threshold value could also be used to make the test either more specific or more sensitive, which in turn elevates the test quality. For example, imagine a medical test in which an experimenter measures the concentration of a certain protein in a blood sample. The experimenter could adjust the threshold, so that a patient is diagnosed as having the disease whenever the measured concentration exceeds it. Changing the threshold changes the rates of false positives and false negatives, corresponding to movement along the trade-off curve.
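The protein-test trade-off can be sketched numerically. The two concentration distributions (and their units) are invented for illustration; the point is only that sweeping the threshold moves error mass from one type to the other:

```python
from statistics import NormalDist

# Hypothetical protein concentrations: healthy ~ N(10, 2), diseased ~ N(16, 2).
healthy = NormalDist(mu=10.0, sigma=2.0)
diseased = NormalDist(mu=16.0, sigma=2.0)

print(f"{'threshold':>9} {'false pos.':>10} {'false neg.':>10}")
for threshold in (11.0, 12.0, 13.0, 14.0, 15.0):
    fp = 1.0 - healthy.cdf(threshold)   # healthy patients flagged as diseased
    fn = diseased.cdf(threshold)        # diseased patients missed
    print(f"{threshold:9.1f} {fp:10.4f} {fn:10.4f}")
```

As the threshold rises, the false-positive rate falls while the false-negative rate grows: each row is one point on the trade-off curve.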
Example
Since in a real experiment it is impossible to avoid all type I and type II errors, it is important to consider the amount of risk one is willing to take to falsely reject H0 or falsely accept H0. The solution to this question would be to report the p-value or significance level α of the statistic. For example, if the p-value of a test statistic result is 0.0596, then there is a probability of 5.96% that we falsely reject H0 given it is true. Or, if we say the statistic is performed at level α, like 0.05, then we allow falsely rejecting H0 with probability 5%. A significance level α of 0.05 is relatively common, but there is no general rule that fits all scenarios.
Vehicle speed measuring
The speed limit of a freeway in the United States is 120 kilometers per hour. A device is set to measure the speed of passing vehicles. Suppose that the device will conduct three measurements of the speed of a passing vehicle, recorded as a random sample X1, X2, X3. The traffic police will or will not fine the drivers depending on the average speed; that is to say, the test statistic is

T = (X1 + X2 + X3) / 3.

In addition, we suppose that the measurements X1, X2, X3 are modeled as a normal distribution N(μ, 4). Then, T should follow N(μ, 4/3), and the parameter μ represents the true speed of the passing vehicle. In this experiment, the null hypothesis H0 and the alternative hypothesis H1 should be
H0: μ=120 against H1: μ>120.
If we perform the test at level α = 0.05, then a critical value c should be calculated to solve

P(T ≥ c | μ = 120) = 0.05.

According to the change-of-units rule for the normal distribution, this is equivalent to

P(Z ≥ (c − 120) / (2/√3)) = 0.05,

where Z = (T − 120) / (2/√3) is standard normal. Referring to the Z-table, we get

(c − 120) / (2/√3) = 1.645, hence c ≈ 121.9.
Here, the critical region is T ≥ 121.9. That is to say, if the recorded average speed of a vehicle is greater than the critical value 121.9, the driver will be fined. However, 5% of drivers are still falsely fined, since the recorded average speed can exceed 121.9 while the true speed does not pass 120; this is a type I error.
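The critical value can be reproduced with Python's standard library, using the model parameters above (σ = 2 per measurement, three measurements averaged):

```python
from statistics import NormalDist
from math import sqrt

alpha = 0.05
mu0 = 120.0   # speed limit: mean under the null hypothesis
sigma = 2.0   # standard deviation of a single measurement
n = 3         # number of measurements averaged

sigma_T = sigma / sqrt(n)            # standard deviation of the average T
z = NormalDist().inv_cdf(1 - alpha)  # one-sided 5% quantile, ≈ 1.645
c = mu0 + z * sigma_T                # critical value

print(f"critical value c ≈ {c:.1f} km/h")  # ≈ 121.9
```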
The type II error corresponds to the case where the true speed of a vehicle is over 120 kilometers per hour but the driver is not fined. For example, if the true speed of a vehicle is μ = 125, the probability that the driver is not fined can be calculated as

P(T < 121.9 | μ = 125) = P(Z < (121.9 − 125) / (2/√3)) ≈ P(Z < −2.68) ≈ 0.0036,
which means that if the true speed of a vehicle is 125, the driver has a 0.36% probability of avoiding the fine when the test is performed at level α = 0.05, since the recorded average speed must be lower than 121.9. If the true speed is closer to 121.9 than to 125, then the probability of avoiding the fine will be higher.
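This type II error probability can be checked directly from the distribution of T under the alternative μ = 125:

```python
from statistics import NormalDist
from math import sqrt

c = 121.9                # critical value from the α = 0.05 test above
mu_true = 125.0          # hypothetical true speed of the vehicle
sigma_T = 2.0 / sqrt(3)  # std. dev. of the average of three measurements

# Type II error: the recorded average falls below c although the true
# speed exceeds the limit, so the driver is not fined.
beta = NormalDist(mu=mu_true, sigma=sigma_T).cdf(c)
print(f"P(not fined | true speed 125) ≈ {beta:.4f}")  # ≈ 0.0036
```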
The tradeoff between type I error and type II error should also be considered. That is, in this case, if the traffic police do not want to falsely fine innocent drivers, the level α can be set to a smaller value, like 0.01. However, drivers whose true speed is over 120 kilometers per hour, like 125, would then be more likely to avoid the fine.
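The tradeoff is visible by recomputing the critical value and the type II error probability at both significance levels, keeping the same hypothetical true speed of 125:

```python
from statistics import NormalDist
from math import sqrt

mu0, mu_true = 120.0, 125.0   # speed limit and hypothetical true speed
sigma_T = 2.0 / sqrt(3)       # std. dev. of the average of three measurements
z = NormalDist()

for alpha in (0.05, 0.01):
    c = mu0 + z.inv_cdf(1 - alpha) * sigma_T    # stricter alpha -> higher c
    beta = NormalDist(mu_true, sigma_T).cdf(c)  # type II error at mu = 125
    print(f"alpha = {alpha:.2f}: c ≈ {c:6.2f}, beta ≈ {beta:.4f}")
```

Tightening α from 0.05 to 0.01 pushes the critical value up (to roughly 122.7 km/h) and multiplies the probability of a speeding driver escaping the fine several-fold: reducing one error type inflates the other.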
Etymology
In 1928, Jerzy Neyman and Egon Pearson, both eminent statisticians, discussed the problems associated with "deciding whether or not a particular sample may be judged as likely to have been randomly drawn from a certain population"; as Florence Nightingale David remarked, "it is necessary to remember the adjective 'random' should apply to the method of drawing the sample and not to the sample itself". They identified "two sources of error", namely:
In 1930, they elaborated on these two sources of error, remarking that
In 1933, they observed that these "problems are rarely presented in such a form that we can discriminate with certainty between the true and false hypothesis". They also noted that, in deciding whether to fail to reject, or reject a particular hypothesis amongst a "set of alternative hypotheses", H1, H2..., it was easy to make an error,
In all of the papers co-written by Neyman and Pearson the expression H0 always signifies "the hypothesis to be tested".
In the same paper they call these two sources of error, errors of type I and errors of type II respectively.