Correlation does not imply causation
The phrase "correlation does not imply causation" refers to the inability to legitimately deduce a cause-and-effect relationship between two events or variables solely on the basis of an observed association or correlation between them. The idea that "correlation implies causation" is an example of a questionable-cause logical fallacy, in which two events occurring together are taken to have established a cause-and-effect relationship. This fallacy is also known by the Latin phrase cum hoc ergo propter hoc. This differs from the fallacy known as post hoc ergo propter hoc, in which an event following another is seen as a necessary consequence of the former event, and from conflation, the errant merging of two events, ideas, databases, etc., into one.
As with any logical fallacy, identifying that the reasoning behind an argument is flawed does not necessarily imply that the resulting conclusion is false. Statistical methods have been proposed that use correlation as the basis for hypothesis tests for causality, including the Granger causality test and convergent cross mapping. The Bradford Hill criteria, also known as Hill's criteria for causation, are a group of nine principles that can be useful in considering the epidemiologic evidence of a causal relationship. Ultimately, assumptions are always required to draw causal conclusions, and modern causal inference frameworks focus on interrogating the strength of these assumptions.
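The idea behind a Granger-style test can be sketched in a few lines: ask whether the past of one series improves the prediction of another series beyond what that series' own past already provides. The following is a minimal illustration, not the full test. The simulated data, the single lag, the absence of an intercept, and the helper names (`rss_one`, `rss_two`, `granger_f`) are all assumptions made for this example; a real analysis would use an established statistics library and check lag choice and stationarity.

```python
import random

def rss_one(a, y):
    """Residual sum of squares for y ~ b*a (least squares, no intercept)."""
    b = sum(ai * yi for ai, yi in zip(a, y)) / sum(ai * ai for ai in a)
    return sum((yi - b * ai) ** 2 for ai, yi in zip(a, y))

def rss_two(a, c, y):
    """Residual sum of squares for y ~ b1*a + b2*c (least squares, no intercept)."""
    saa = sum(ai * ai for ai in a)
    scc = sum(ci * ci for ci in c)
    sac = sum(ai * ci for ai, ci in zip(a, c))
    say = sum(ai * yi for ai, yi in zip(a, y))
    scy = sum(ci * yi for ci, yi in zip(c, y))
    det = saa * scc - sac * sac
    # Solve the 2x2 normal equations by Cramer's rule.
    b1 = (say * scc - sac * scy) / det
    b2 = (saa * scy - sac * say) / det
    return sum((yi - b1 * ai - b2 * ci) ** 2 for ai, ci, yi in zip(a, c, y))

def granger_f(cause, effect):
    """F statistic: does cause[t-1] improve the prediction of effect[t]
    beyond effect[t-1] alone? One lag only."""
    target = effect[1:]
    own_lag = effect[:-1]
    other_lag = cause[:-1]
    rss_restricted = rss_one(own_lag, target)
    rss_full = rss_two(own_lag, other_lag, target)
    dof = len(target) - 2
    return (rss_restricted - rss_full) / (rss_full / dof)

# Simulated data (an assumption for illustration): x drives y with a one-step delay.
rng = random.Random(0)
x, y = [0.0], [0.0]
for _ in range(2000):
    x_next = 0.5 * x[-1] + rng.gauss(0, 1)
    y_next = 0.3 * y[-1] + 0.8 * x[-1] + rng.gauss(0, 1)
    x.append(x_next)
    y.append(y_next)

f_x_to_y = granger_f(x, y)  # expected to be large: past x helps predict y
f_y_to_x = granger_f(y, x)  # expected to be small: past y adds little about x
print(f"F(x -> y) = {f_x_to_y:.1f}, F(y -> x) = {f_y_to_x:.1f}")
```

Even a large statistic here only shows that one series helps forecast the other. As the paragraph above notes, further assumptions are needed before reading that as causation.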
Usage and meaning of terms
"Imply"
In casual use, the word "implies" loosely means suggests, rather than requires. However, in logic, the technical use of the word "implies" means "is a sufficient condition for." That is the meaning intended by statisticians when they say causation is not certain. Indeed, p implies q has the technical meaning of the material conditional: if p then q, symbolized as p → q. That is, "if circumstance p is true, then q follows." In that sense, it is always correct to say "Correlation does not imply causation."
"Cause"
The word "cause" has multiple meanings in English. In philosophical terminology, "cause" can refer to necessary, sufficient, or contributing causes. In examining correlation, "cause" is most often used to mean "one contributing cause".
Causal analysis
Causal inference
Examples of illogically inferring causation from correlation
B causes A (reverse causation or reverse causality)
Reverse causation (also called reverse causality or wrong direction) is an informal fallacy of questionable cause in which cause and effect are reversed: the cause is said to be the effect and vice versa.
Example 1
In this example, the correlation between windmill activity and wind velocity does not imply that wind is caused by windmills. It is rather the other way around, as suggested by the fact that wind does not need windmills to exist, while windmills need wind to rotate. Wind can be observed in places where there are no windmills or where the windmills are not rotating, and there are good reasons to believe that wind existed before the invention of windmills.
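The reasoning in the windmill example can be sketched as a small simulation. Correlation is symmetric, so observational data alone look the same whichever way the arrow points; only an intervention separates the two readings. Everything in the sketch is hypothetical: the `rotor_speed` mechanism, its coefficients, and the idea of driving the rotors with a motor are assumptions chosen for illustration, and the causal direction is built into the simulation by construction.

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(1)
n = 5000

def rotor_speed(wind_speed):
    """Hypothetical mechanism: rotation follows wind, plus a little noise."""
    return 3.0 * wind_speed + rng.gauss(0, 1)

# Observational data: the wind blows and the windmills respond.
wind = [rng.uniform(0, 10) for _ in range(n)]
rotation = [rotor_speed(w) for w in wind]
r_observed = pearson(wind, rotation)
r_swapped = pearson(rotation, wind)  # same value: correlation has no direction

# Intervention: drive the rotors with a motor at random speeds.
# The wind is generated exactly as before and never looks at the rotors.
wind_during_test = [rng.uniform(0, 10) for _ in range(n)]
forced_rotation = [rng.uniform(0, 30) for _ in range(n)]
r_forced = pearson(wind_during_test, forced_rotation)

print(f"observed: {r_observed:.3f}, swapped: {r_swapped:.3f}, forced: {r_forced:.3f}")
```

In this toy setup the observed correlation is strong in both orderings, while forcing the rotors leaves the wind unchanged and the correlation near zero. That asymmetry under intervention, not the correlation itself, carries the information about direction.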
Example 2
Causality is actually the other way around, since some diseases, such as cancer, cause low cholesterol due to a myriad of factors, such as weight loss, and they also cause an increase in mortality. This can also be seen in alcoholics. As alcoholics become diagnosed with cirrhosis of the liver, many quit drinking. However, they also experience an increased risk of mortality. In these instances, it is the diseases that cause an increased risk of mortality, but the increased mortality is attributed to the beneficial effects that follow the diagnosis, making healthy changes look unhealthy.
Example 3
In other cases it may simply be unclear which is the cause and which is the effect. For example:
This could easily be the other way round; that is, violent children like watching more TV than less violent ones.
Example 4
A correlation between recreational drug use and psychiatric disorders might run in either direction: perhaps the drugs cause the disorders, or perhaps people use drugs to self-medicate for preexisting conditions. Gateway drug theory may argue that marijuana usage leads to usage of harder drugs, but hard drug usage may lead to marijuana usage. Indeed, in the social sciences, where controlled experiments often cannot be used to discern the direction of causation, this fallacy can fuel long-standing scientific arguments. One such example can be found in education economics, between the screening/signaling and human capital models: it could either be that having innate ability enables one to complete an education, or that completing an education builds one's ability.
Example 5
A historical example of this is that Europeans in the Middle Ages believed that lice were beneficial to health, since there would rarely be any lice on sick people. The reasoning was that the people got sick because the lice left. The real reason, however, is that lice are extremely sensitive to body temperature. A small increase in body temperature, such as in a fever, makes the lice look for another host. The medical thermometer had not yet been invented, so that increase in temperature was rarely noticed. Noticeable symptoms came later, which gave the impression that the lice had left before the person became sick.
In other cases, two phenomena can each be a partial cause of the other; consider poverty and lack of education, or procrastination and poor self-esteem. Anyone making an argument based on two such phenomena must, however, be careful to avoid the fallacy of circular cause and consequence. Poverty is a cause of lack of education, but it is not the sole cause, and vice versa.
Third factor C (the common-causal variable) causes both A and B
The third-cause fallacy is a logical fallacy in which a spurious relationship is confused for causation. It asserts that X causes Y when in reality both X and Y are caused by Z. It is a variation on the post hoc ergo propter hoc fallacy and a member of the questionable cause group of fallacies.
The examples below all deal with a lurking variable, which is simply a hidden third variable that affects both of the variables observed to be correlated. That third variable is also known as a confounding variable, with the slight difference that confounding variables need not be hidden and may thus be corrected for in an analysis. A difficulty often also arises where the third factor, though fundamentally different from A and B, is so closely related to A and/or B as to be confused with them or very difficult to scientifically disentangle from them.
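A small simulation can show how a common cause produces a correlation and how correcting for a measured confounder removes it. In this sketch Z drives both X and Y, and X has no effect on Y. The coefficients, sample size, and the helper names `pearson` and `residuals` are assumptions chosen for illustration, and the correction used here (correlating what remains of X and Y after a straight-line fit on Z) is only one simple way to adjust for a confounder.

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

def residuals(values, predictor):
    """What is left of `values` after removing its best straight-line fit on `predictor`."""
    n = len(values)
    mv, mp = sum(values) / n, sum(predictor) / n
    slope = (sum((p - mp) * (v - mv) for p, v in zip(predictor, values))
             / sum((p - mp) ** 2 for p in predictor))
    return [(v - mv) - slope * (p - mp) for p, v in zip(predictor, values)]

rng = random.Random(42)
z = [rng.gauss(0, 1) for _ in range(5000)]   # common cause Z
x = [2.0 * zi + rng.gauss(0, 1) for zi in z]  # X depends on Z only
y = [1.5 * zi + rng.gauss(0, 1) for zi in z]  # Y depends on Z only

r_xy = pearson(x, y)
r_xy_given_z = pearson(residuals(x, z), residuals(y, z))

print(f"corr(X, Y) = {r_xy:.3f}")
print(f"corr(X, Y) after removing Z = {r_xy_given_z:.3f}")
```

The raw correlation is substantial even though neither variable influences the other, and it largely disappears once Z is accounted for. This correction is possible only because Z was measured; a truly hidden lurking variable offers no such handle.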
Example 1
The above example commits the correlation-implies-causation fallacy, as it prematurely concludes that sleeping with one's shoes on causes headache. A more plausible explanation is that both are caused by a third factor, in this case going to bed drunk, which thereby gives rise to a correlation. So the conclusion is false.
Example 2
This is a scientific example that resulted from a study at the University of Pennsylvania Medical Center. Published in the May 13, 1999, issue of Nature, the study received much coverage at the time in the popular press. However, a later study at Ohio State University did not find that infants sleeping with the light on caused the development of myopia. It did find a strong link between parental myopia and the development of child myopia, also noting that myopic parents were more likely to leave a light on in their children's bedroom. In this case, the cause of both conditions is parental myopia, and the above-stated conclusion is false.
Example 3
This example fails to recognize the importance of time of year and temperature to ice cream sales. Ice cream is sold during the hot summer months at a much greater rate than during colder times, and it is during these hot summer months that people are more likely to engage in activities involving water, such as swimming. The increased drowning deaths are simply caused by more exposure to water-based activities, not ice cream. The stated conclusion is false.
Example 4
However, as encountered in many psychological studies, another variable, a "self-consciousness score", is discovered that has a sharper correlation with shyness. This suggests a possible "third variable" problem; however, when three such closely related measures are found, it further suggests that each may have bidirectional tendencies, forming a cluster of correlated values that each influence one another to some extent. Therefore, the simple conclusion above may be false.
Example 5
Richer populations tend to eat more food and produce more CO2.
Example 6
Further research has called this conclusion into question. Instead, it may be that other underlying factors, like genes, diet and exercise, affect both HDL levels and the likelihood of having a heart attack; it is possible that medicines may affect the directly measurable factor, HDL levels, without affecting the chance of heart attack.
Bidirectional causation: A causes B, and B causes A
Causality is not necessarily one-way; in a predator-prey relationship, predator numbers affect prey numbers, but prey numbers, i.e. food supply, also affect predator numbers. Another well-known example is that cyclists have a lower body mass index (BMI) than people who do not cycle. This is often explained by assuming that cycling increases physical activity levels and therefore decreases BMI. Because results from prospective studies on people who increase their bicycle use show a smaller effect on BMI than cross-sectional studies, there may be some reverse causality as well. For example, people with a lower BMI may be more likely to want to cycle in the first place.
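The predator-prey case can be sketched with a simple coupled model in which each population appears in the other's growth equation. The sketch assumes the classic Lotka-Volterra equations, which the text above does not name, and integrates them with a basic Euler step. The function name `simulate`, the parameter values, and the starting populations are illustrative assumptions only.

```python
def simulate(prey0, pred0, steps=20000, dt=0.001,
             growth=1.0, predation=0.1, death=1.5, conversion=0.075):
    """Euler integration of the Lotka-Volterra equations (assumed model):
    d(prey)/dt = growth*prey - predation*prey*pred
    d(pred)/dt = conversion*prey*pred - death*pred
    """
    prey, pred = [prey0], [pred0]
    for _ in range(steps):
        x, p = prey[-1], pred[-1]
        prey.append(x + dt * (growth * x - predation * x * p))
        pred.append(p + dt * (conversion * x * p - death * p))
    return prey, pred

# Baseline run.
prey, pred = simulate(10.0, 5.0)

# Predators affect prey: same prey start, more predators.
prey_more_pred, _ = simulate(10.0, 8.0)

# Prey affect predators: same predator start, more prey.
_, pred_more_prey = simulate(15.0, 5.0)

t = 1000  # index of the sample at time t*dt = 1.0
print(f"prey at t=1: baseline {prey[t]:.2f}, with more predators {prey_more_pred[t]:.2f}")
print(f"predators at t=1: baseline {pred[t]:.2f}, with more prey {pred_more_prey[t]:.2f}")
print(f"prey range {min(prey):.1f} to {max(prey):.1f}, "
      f"predator range {min(pred):.1f} to {max(pred):.1f}")
```

Changing either starting population alters the trajectory of the other, which is the sense in which the causation is bidirectional. In a system like this, a single cross-sectional correlation between predator and prey counts says little about which one drives the other.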