Base rate fallacy
The base rate fallacy, also called base rate neglect or base rate bias, is a type of fallacy in which people tend to ignore the base rate in favor of the information pertaining only to a specific case. Base rate neglect is a specific form of the more general extension neglect.
It is also called the prosecutor's fallacy or defense attorney's fallacy when applied to the results of statistical tests in the context of law proceedings. These terms were introduced by William C. Thompson and Edward Schumann in 1987, although it has been argued that their definition of the prosecutor's fallacy extends to many additional invalid imputations of guilt or liability that are not analyzable as errors in base rates or Bayes's theorem.
False positive paradox
An example of the base rate fallacy is the false positive paradox, which describes situations in which a test produces more false positive results than true positives. For example, if a facial recognition camera can identify wanted criminals with 99% accuracy but analyzes 10,000 people a day, the high accuracy is outweighed by the sheer number of tests: the resulting list of suspects will likely contain far more innocent people than criminals, simply because innocent people vastly outnumber criminals in the scanned population. The probability of a positive test result is therefore determined not only by the accuracy of the test but also by the composition of the sampled population. The fundamental issue is that true negatives are far more prevalent, so even a small fraction of the much larger group produces more positive results than a large fraction of the much smaller group; as a result, the pool of people testing positive is dominated by false positives. When the prevalence, the proportion of those who actually have a given condition, is lower than the test's false positive rate, even a test with a very low risk of giving a false positive in any individual case will produce more false than true positives overall.
It is especially counter-intuitive when interpreting a positive result in a test on a low-prevalence population after having dealt with positive results drawn from a high-prevalence population. If the false positive rate of the test is higher than the proportion of the new population with the condition, then a test administrator whose experience has been drawn from testing in a high-prevalence population may conclude from experience that a positive test result usually indicates a positive subject, when in fact a false positive is far more likely to have occurred.
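For readers who prefer to see the arithmetic, the following minimal Python sketch (illustrative only; the function and parameter names are not from any source) computes the expected share of positive results that are true positives for a given prevalence, sensitivity, and false positive rate:

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Expected fraction of positive test results that are true positives."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# A test with a 1% false positive rate applied to a population in which
# only 0.1% actually have the condition: most positives are false.
print(positive_predictive_value(prevalence=0.001, sensitivity=0.99,
                                false_positive_rate=0.01))  # ~0.09
```

As the example numbers show, a "99% accurate" test can still leave a positive result indicating only about a 9% chance of actually having the condition when the condition is rare.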
Examples
Example 1: Disease
High-prevalence population
Imagine running an infectious disease test on a population A of 1,000 persons, of which 40% are infected. The test has a false positive rate of 5% and a false negative rate of zero. The expected outcome of the 1,000 tests on population A would be:
- Infected and test indicates disease (true positive): 1,000 × 40% = 400 people
- Uninfected and test indicates disease (false positive): 1,000 × 60% × 5% = 30 people
- Uninfected and test indicates no disease (true negative): 1,000 × 60% × 95% = 570 people
So, in population A, a person receiving a positive test could be over 93% confident (400 out of 430 positives) that it correctly indicates infection.
Low-prevalence population
Now consider the same test applied to population B, of which only 2% are infected. The expected outcome of 1,000 tests on population B would be:
- Infected and test indicates disease (true positive): 1,000 × 2% = 20 people
- Uninfected and test indicates disease (false positive): 1,000 × 98% × 5% = 49 people
- Uninfected and test indicates no disease (true negative): 1,000 × 98% × 95% = 931 people
In population B, only 20 of the 69 total people with a positive test result are actually infected. So, the probability of actually being infected after one is told that one is infected is only 29% (20/69) for a test that otherwise appears to be "95% accurate".
A tester with experience of group A might find it a paradox that in group B, a result that had usually correctly indicated infection is now usually a false positive. The confusion of the posterior probability of infection with the prior probability of receiving a false positive is a natural error after receiving a health-threatening test result.
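To make the contrast between the two populations concrete, here is a short illustrative sketch (not part of the original example) that tallies the expected outcomes for populations A and B using the figures above:

```python
def expected_outcomes(population, prevalence, false_positive_rate, false_negative_rate=0.0):
    """Expected true positives, false positives, and positive predictive value."""
    infected = population * prevalence
    uninfected = population - infected
    true_positives = infected * (1 - false_negative_rate)
    false_positives = uninfected * false_positive_rate
    ppv = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, ppv

# Population A: 40% infected -> 400 true positives, 30 false positives, PPV ~93%
print(expected_outcomes(1000, 0.40, 0.05))
# Population B:  2% infected ->  20 true positives, 49 false positives, PPV ~29%
print(expected_outcomes(1000, 0.02, 0.05))
```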
Example 2: Drunk drivers
Imagine that a group of police officers has breathalyzers that display false drunkenness in 5% of the cases in which the driver is sober, but never fail to detect a truly drunk person. One in a thousand drivers is driving drunk. Suppose the police officers then stop a driver at random to administer a breathalyzer test, and it indicates that the driver is drunk. No other information is known about them. Many would estimate the probability that the driver is drunk as high as 95%, but the correct probability is about 2%.
An explanation for this is as follows: on average, for every 1,000 drivers tested,
- 1 driver is drunk, and it is 100% certain that for that driver there is a true positive test result, so there is 1 true positive test result
- 999 drivers are not drunk, and among those drivers there are 5% false positive test results, so there are 49.95 false positive test results
Therefore, the probability that a driver who tests positive really is drunk is 1/(1 + 49.95) ≈ 0.0196, or about 2%.
The validity of this result does, however, hinge on the validity of the initial assumption that the police officer stopped the driver truly at random, and not because of bad driving. If that or another non-arbitrary reason for stopping the driver was present, then the calculation also involves the probability of a drunk driver driving competently and a non-drunk driver driving competently.
More formally, the same probability of roughly 0.02 can be established using Bayes' theorem. The goal is to find the probability that the driver is drunk given that the breathalyzer indicated they are drunk, which can be represented as

p(\mathrm{drunk} \mid D),

where D means that the breathalyzer indicates that the driver is drunk. Using Bayes' theorem,

p(\mathrm{drunk} \mid D) = \frac{p(D \mid \mathrm{drunk})\, p(\mathrm{drunk})}{p(D)}.

The following information is known in this scenario:

- p(\mathrm{drunk}) = 0.001
- p(\mathrm{sober}) = 0.999
- p(D \mid \mathrm{drunk}) = 1.00
- p(D \mid \mathrm{sober}) = 0.05

As can be seen from the formula, one needs p(D) for Bayes' theorem, which can be computed from the preceding values using the law of total probability:

p(D) = p(D \mid \mathrm{drunk})\, p(\mathrm{drunk}) + p(D \mid \mathrm{sober})\, p(\mathrm{sober}),

which gives

p(D) = (1.00 \times 0.001) + (0.05 \times 0.999) = 0.05095.

Plugging these numbers into Bayes' theorem, one finds that

p(\mathrm{drunk} \mid D) = \frac{1.00 \times 0.001}{0.05095} \approx 0.0196,

which is the precision of the test.
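The same figure can be checked numerically; the short sketch below simply restates the Bayes' theorem calculation above (the variable names are illustrative):

```python
# Prior probabilities and test characteristics from the scenario above.
p_drunk = 0.001           # one in a thousand drivers is driving drunk
p_sober = 1 - p_drunk
p_pos_given_drunk = 1.00  # the breathalyzer never misses a truly drunk driver
p_pos_given_sober = 0.05  # 5% false positive rate on sober drivers

# Law of total probability: overall chance of a positive reading.
p_pos = p_pos_given_drunk * p_drunk + p_pos_given_sober * p_sober  # 0.05095

# Bayes' theorem: probability the driver is drunk given a positive reading.
p_drunk_given_pos = p_pos_given_drunk * p_drunk / p_pos
print(p_drunk_given_pos)  # ~0.0196, i.e. about 2%
```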
Example 3: Terrorist identification
In a city of 1 million inhabitants, let there be 100 terrorists and 999,900 non-terrorists. To simplify the example, it is assumed that all people present in the city are inhabitants. Thus, the base rate probability of a randomly selected inhabitant of the city being a terrorist is 0.0001, and the base rate probability of that same inhabitant being a non-terrorist is 0.9999. In an attempt to catch the terrorists, the city installs an alarm system with a surveillance camera and automatic facial recognition software. The software has two failure rates of 1%:
- The false negative rate: If the camera scans a terrorist, a bell will ring 99% of the time, and it will fail to ring 1% of the time.
- The false positive rate: If the camera scans a non-terrorist, a bell will not ring 99% of the time, but it will ring 1% of the time.
The fallacy arises from confusing the natures of two different failure rates. The 'number of non-bells per 100 terrorists' and the 'number of non-terrorists per 100 bells' are unrelated quantities; one is not necessarily equal, or even close, to the other. To show this, consider what happens if an identical alarm system were set up in a second city with no terrorists at all. As in the first city, the alarm sounds for 1 out of every 100 non-terrorist inhabitants detected, but unlike in the first city, the alarm never sounds for a terrorist. Therefore, 100% of all occasions of the alarm sounding are for non-terrorists, and a false negative rate cannot even be calculated. The 'number of non-terrorists per 100 bells' in that city is 100, yet P(terrorist | bell) = 0%: there is zero chance that a terrorist has been detected given the ringing of the bell.
Imagine that the first city's entire population of one million people pass in front of the camera. About 99 of the 100 terrorists will trigger the alarm, and so will about 9,999 of the 999,900 non-terrorists. Therefore, about 10,098 people will trigger the alarm, among whom about 99 will be terrorists. The probability that a person triggering the alarm actually is a terrorist is only about 99 in 10,098, which is less than 1% and far below the naive guess of 99%.
The base rate fallacy is so misleading in this example because there are many more non-terrorists than terrorists, and the number of false positives is so much larger than the true positives.
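A brief illustrative sketch of the same count-based reasoning (the function name is hypothetical) reproduces the roughly 99-in-10,098 figure for the first city and shows why the second city's precision is 0% even though the alarm hardware is identical:

```python
def alarm_precision(population, terrorists, false_negative_rate=0.01, false_positive_rate=0.01):
    """Probability that a person who triggers the alarm actually is a terrorist."""
    non_terrorists = population - terrorists
    true_alarms = terrorists * (1 - false_negative_rate)  # terrorists who ring the bell
    false_alarms = non_terrorists * false_positive_rate   # non-terrorists who ring it
    return true_alarms / (true_alarms + false_alarms)

# First city: 100 terrorists among 1,000,000 inhabitants -> 99 / 10,098, about 0.98%
print(alarm_precision(1_000_000, 100))
# Second city: no terrorists at all -> every alarm is a false alarm, precision 0%
print(alarm_precision(1_000_000, 0))
```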
Multiple practitioners have argued that, because the base rate of terrorism is extremely low, using data mining and predictive algorithms to identify terrorists cannot feasibly work due to the false positive paradox. Estimates of the number of false positives for each accurate result vary from over ten thousand to one billion; consequently, investigating every lead would be cost- and time-prohibitive. The level of accuracy required to make these models viable is likely unachievable. The low base rate of terrorism also means there is little data from which to build an accurate algorithm. Further, in the context of detecting terrorism, false negatives are highly undesirable and must be minimised as much as possible; this, however, requires increasing sensitivity at the cost of specificity, which increases false positives. It is also questionable whether the use of such models by law enforcement would meet the requisite burden of proof, given that over 99% of results would be false positives.
A distinct mechanism amplifies this effect in multi-attribute screening. While matching 15 specific pre-determined attributes has probability 10⁻³⁵, systems that flag individuals matching any 15 of 1,000 attributes have per-person false alert probabilities around 10⁻⁴, a difference of 31 orders of magnitude that arises from the combinatorics of threshold rules, not from low base rates alone. In a city of one million, this produces approximately 226 false alerts; the probability of zero false alerts is roughly 10⁻⁹⁹. Such systems exhibit sharp phase transitions at critical population sizes, beyond which failure becomes certain and cannot be prevented through threshold adjustment.
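To illustrate the threshold effect described above, the following sketch compares the probability of matching 15 specific attributes with the binomial tail probability of matching any 15 of 1,000 attributes, then scales the latter to a city of one million. The per-attribute match probability of 0.005 is an assumption chosen only to make the arithmetic visible, not a figure taken from the source:

```python
from math import comb

p_attr = 0.005          # assumed chance that a given attribute matches an innocent person
n_attrs = 1000          # attributes the system checks
threshold = 15          # matches needed to raise an alert
population = 1_000_000

# Probability of matching 15 specific, pre-determined attributes.
p_specific = p_attr ** threshold

# Probability of matching any 15 or more of the 1,000 attributes
# (binomial upper tail): the per-person false alert probability.
p_any = sum(comb(n_attrs, k) * p_attr**k * (1 - p_attr)**(n_attrs - k)
            for k in range(threshold, n_attrs + 1))

expected_false_alerts = population * p_any
p_zero_false_alerts = (1 - p_any) ** population

print(p_specific)             # ~3e-35 under this assumption
print(p_any)                  # ~2e-4 per person screened
print(expected_false_alerts)  # a couple hundred false alerts city-wide
print(p_zero_false_alerts)    # vanishingly small
```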