Odds ratio
An odds ratio (OR) is a statistic that quantifies the strength of the association between two events, A and B. The odds ratio is defined as the ratio of the odds of event A taking place in the presence of B to the odds of A in the absence of B. By symmetry, the odds ratio equally gives the ratio of the odds of B occurring in the presence of A to the odds of B in the absence of A. Two events are independent if and only if the OR equals 1, i.e., the odds of one event are the same in either the presence or absence of the other event. If the OR is greater than 1, then A and B are associated in the sense that, compared to the absence of B, the presence of B raises the odds of A, and symmetrically the presence of A raises the odds of B. Conversely, if the OR is less than 1, then A and B are negatively correlated, and the presence of one event reduces the odds of the other event occurring.
Note that the odds ratio is symmetric in the two events, and no causal direction is implied: an OR greater than 1 does not establish that B causes A, or that A causes B.
Two similar statistics that are often used to quantify associations are the relative risk and the absolute risk reduction. Often, the parameter of greatest interest is actually the RR, which is the ratio of the probabilities analogous to the odds used in the OR. However, available data frequently do not allow for the computation of the RR or the ARR, but do allow for the computation of the OR, as in case-control studies, as explained below. On the other hand, if one of the properties is sufficiently rare, then the OR is approximately equal to the corresponding RR.
The OR plays an important role in the logistic model.
Definition and basic properties
Intuition from an example for laypeople
If we flip an unbiased coin, the probability of getting heads and the probability of getting tails are equal: both are 50%. Imagine we get a biased coin such that, if one flips it, one is twice as likely to get heads as tails. The new probabilities would be 66.666...% for heads and 33.333...% for tails.

A motivating example, in the context of the rare disease assumption
Suppose a radiation leak in a village of 1,000 people increased the incidence of a rare disease. The total number of people exposed to the radiation was 400, out of which 20 developed the disease and 380 stayed healthy. The total number of people not exposed was 600, out of which 6 developed the disease and 594 stayed healthy. We can organize this in a contingency table:

                  Diseased   Healthy   Total
    Exposed           20        380      400
    Not exposed        6        594      600

The risk of developing the disease given exposure is 20/400 = 5%, and the risk of developing the disease given non-exposure is 6/600 = 1%. One obvious way to compare the risks is to use the ratio of the two, the relative risk:

    RR = (20/400) / (6/600) = 5
The odds ratio is different. The odds of getting the disease if exposed are 20/380 ≈ 0.053, and the odds if not exposed are 6/594 ≈ 0.010. The odds ratio is the ratio of the two:

    OR = (20/380) / (6/594) = (20 × 594) / (380 × 6) ≈ 5.2
As illustrated by this example, in a rare-disease case like this the relative risk and the odds ratio are almost the same. By definition, a rare disease implies that 380 ≈ 400 and 594 ≈ 600. Thus, the denominators in the relative risk and the odds ratio are almost the same:

    RR = (20/400) / (6/600) ≈ (20/380) / (6/594) = OR
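The near-equality of RR and OR for a rare disease can be checked directly. A sketch with illustrative counts for a rare-disease setting (the specific group sizes here are assumptions: 20 of 400 exposed and 6 of 600 unexposed fall ill):

```python
# Assumed illustrative counts for a rare disease:
# exposed: 20 diseased, 380 healthy; non-exposed: 6 diseased, 594 healthy
de, he = 20, 380   # diseased, healthy among the exposed
dn, hn = 6, 594    # diseased, healthy among the non-exposed

rr = (de / (de + he)) / (dn / (dn + hn))   # relative risk: (20/400)/(6/600)
or_ = (de / he) / (dn / hn)                # odds ratio: (20/380)/(6/594)

print(rr)    # ≈ 5.0
print(or_)   # ≈ 5.21 — close to the RR because the disease is rare
```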
Suppose now that, instead of surveying the whole village, the disease is studied with a case-control design: the villagers who developed the disease are sampled, along with an equal number of healthy villagers. The odds in this sample of getting the disease given that someone is exposed is 20/10, and the odds given that someone is not exposed is 6/16. The odds ratio is thus (20/10) / (6/16) = 16/3 ≈ 5.3, quite close to the odds ratio calculated for the entire village. The relative risk, however, cannot be calculated, because it is the ratio of the risks of getting the disease, and computing those would require the total numbers of exposed and non-exposed villagers. Because the study selected for people with the disease, half the people in the sample have the disease, and it is known that this is more than the population-wide prevalence. Therefore, the counts of diseased and healthy within the exposed group, and of diseased and healthy within the non-exposed group, cannot simply be added up to estimate the risks.
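The case-control arithmetic can be mirrored in a few lines; the counts are those of the sample described above, and the final comment notes why a risk ratio computed from this sample would be biased:

```python
# Case-control sample counts:
# exposed: 20 diseased, 10 healthy; non-exposed: 6 diseased, 16 healthy.
or_sample = (20 / 10) / (6 / 16)
print(or_sample)   # ≈ 5.33, close to the odds ratio in the whole village

# A "risk ratio" from this sample, (20/30) / (6/22), would reflect the
# deliberate oversampling of cases, not the population risks, so the RR
# cannot be recovered from case-control data alone.
```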
It is standard in the medical literature to calculate the odds ratio and then use the rare-disease assumption to claim that the relative risk is approximately equal to it. This not only allows for the use of case-control studies, but also makes it easier to control for confounding variables such as weight or age using regression analysis, and it yields the desirable properties, discussed in other sections of this article, of invariance and insensitivity to the type of sampling.
Definition in terms of group-wise odds
The odds ratio is the ratio of the odds of an event occurring in one group to the odds of it occurring in another group. The term is also used to refer to sample-based estimates of this ratio. These groups might be men and women, an experimental group and a control group, or any other dichotomous classification. If the probabilities of the event in each of the groups are p1 and p2, then the odds ratio is:

    OR = (p1/q1) / (p2/q2) = (p1 q2) / (p2 q1)

where qx = 1 − px. An odds ratio of 1 indicates that the condition or event under study is equally likely to occur in both groups. An odds ratio greater than 1 indicates that the condition or event is more likely to occur in the first group, and an odds ratio less than 1 indicates that it is less likely to occur in the first group. The odds ratio must be nonnegative if it is defined. It is undefined if p2q1 equals zero, i.e., if p2 equals zero or q1 equals zero.
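The group-wise definition, including the case where the OR is undefined, translates directly into code. A sketch (the function name is my own):

```python
def odds_ratio(p1, p2):
    """OR = (p1/q1) / (p2/q2) = p1*q2 / (p2*q1), with q = 1 - p.

    Returns None where the OR is undefined, i.e. when p2*q1 == 0
    (p2 == 0, or q1 == 0 because p1 == 1).
    """
    q1, q2 = 1 - p1, 1 - p2
    if p2 * q1 == 0:
        return None
    return (p1 * q2) / (p2 * q1)

print(odds_ratio(0.5, 0.5))   # 1.0  -> event equally likely in both groups
print(odds_ratio(0.9, 0.5))   # ≈ 9  -> more likely in the first group
print(odds_ratio(1.0, 0.5))   # None -> undefined, since q1 = 0
```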
Definition in terms of joint and conditional probabilities
The odds ratio can also be defined in terms of the joint probability distribution of two binary random variables. The joint distribution of binary random variables X and Y can be written

              Y = 1    Y = 0
    X = 1      p11      p10
    X = 0      p01      p00

where p11, p10, p01 and p00 are non-negative "cell probabilities" that sum to one. The odds for Y within the two subpopulations defined by X = 1 and X = 0 are defined in terms of the conditional probabilities given X, i.e., P(Y | X):

    odds(Y = 1 | X = 1) = p11 / p10,    odds(Y = 1 | X = 0) = p01 / p00
Thus, the odds ratio is:

    OR = (p11/p10) / (p01/p00) = (p11 p00) / (p10 p01)
Note that the odds ratio is also the product of the probabilities of the "concordant cells" divided by the product of the probabilities of the "discordant cells". However, in some applications the labelling of categories as zero and one is arbitrary, so there is nothing special about concordant versus discordant values in these applications.
Symmetry
If we had calculated the odds ratio based on the conditional probabilities given Y, we would have obtained the same result:

    (p11/p01) / (p10/p00) = (p11 p00) / (p01 p10) = OR
Other measures of effect size for binary data such as the relative risk do not have this symmetry property.
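The symmetry of the OR, and the lack of it for the relative risk, can be verified on any joint distribution. A sketch with arbitrarily chosen cell probabilities:

```python
# Arbitrary cell probabilities p_xy = P(X = x, Y = y), summing to one.
p11, p10, p01, p00 = 0.4, 0.1, 0.2, 0.3

# Odds ratio from the odds of Y = 1 conditional on X ...
or_given_x = (p11 / p10) / (p01 / p00)
# ... and from the odds of X = 1 conditional on Y:
or_given_y = (p11 / p01) / (p10 / p00)
print(or_given_x, or_given_y)   # both ≈ 6.0

# The relative risk depends on which variable conditions which:
rr_given_x = (p11 / (p11 + p10)) / (p01 / (p01 + p00))
rr_given_y = (p11 / (p11 + p01)) / (p10 / (p10 + p00))
print(rr_given_x, rr_given_y)   # ≈ 2.0 vs ≈ 2.67
```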
Relation to statistical independence
If X and Y are independent, their joint probabilities can be expressed in terms of their marginal probabilities px = P(X = 1) and py = P(Y = 1), as follows:

    p11 = px py,    p10 = px (1 − py),    p01 = (1 − px) py,    p00 = (1 − px)(1 − py)

In this case, the odds ratio equals one, and conversely the odds ratio can only equal one if the joint probabilities can be factored in this way. Thus the odds ratio equals one if and only if X and Y are independent.
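A quick numerical check of this characterization, with arbitrarily chosen marginals:

```python
px, py = 0.3, 0.7   # illustrative marginal probabilities

# Under independence the cell probabilities factor into the marginals:
p11 = px * py
p10 = px * (1 - py)
p01 = (1 - px) * py
p00 = (1 - px) * (1 - py)

print((p11 * p00) / (p10 * p01))   # ≈ 1.0: independence gives OR = 1
```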
Recovering the cell probabilities from the odds ratio and marginal probabilities
The odds ratio is a function of the cell probabilities, and conversely, the cell probabilities can be recovered given knowledge of the odds ratio and the marginal probabilities px = p11 + p10 and py = p11 + p01. If the odds ratio R differs from 1, then

    p11 = (1 + (px + py)(R − 1) − S) / (2(R − 1))

where

    S = sqrt( (1 + (px + py)(R − 1))² − 4R(R − 1) px py )
In the case where R = 1, we have independence, so p11 = px py, the product of the marginal probabilities.
Once we have p11, the other three cell probabilities can easily be recovered from the marginal probabilities:

    p10 = px − p11,    p01 = py − p11,    p00 = 1 − px − py + p11
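Recovering p11 amounts to solving a quadratic equation in p11, obtained by fixing the margins in OR = p11 p00 / (p10 p01). A sketch (the function name is illustrative):

```python
from math import sqrt

def cells_from_or_and_margins(R, px, py):
    """Recover (p11, p10, p01, p00) from the odds ratio R and the
    marginals px = P(X = 1), py = P(Y = 1).

    For R != 1, p11 is the root (b - s) / (2*(R - 1)) of
        (R - 1)*p11**2 - (1 + (px + py)*(R - 1))*p11 + R*px*py = 0,
    the one lying in [0, min(px, py)].
    """
    if R == 1:                      # independence
        p11 = px * py
    else:
        b = 1 + (px + py) * (R - 1)
        s = sqrt(b * b - 4 * R * (R - 1) * px * py)
        p11 = (b - s) / (2 * (R - 1))
    return p11, px - p11, py - p11, 1 - px - py + p11

# Round trip: start from known cells, recover them from R and the margins.
p11, p10, p01, p00 = 0.4, 0.1, 0.2, 0.3
R = (p11 * p00) / (p10 * p01)                # = 6
px, py = p11 + p10, p11 + p01                # 0.5, 0.6
print(cells_from_or_and_margins(R, px, py))  # ≈ (0.4, 0.1, 0.2, 0.3)
```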
Example
Suppose that in a sample of 100 men, 90 drank wine in the previous week, while in a sample of 80 women only 20 drank wine in the same period. This forms the contingency table:

             Drank wine   Did not drink wine   Total
    Men          90               10            100
    Women        20               60             80

The odds ratio can be directly calculated from this table as:

    OR = (90 × 60) / (10 × 20) = 27
Alternatively, the odds of a man drinking wine are 90 to 10, or 9:1, while the odds of a woman drinking wine are only 20 to 60, or 1:3 ≈ 0.33. The odds ratio is thus 9/0.33 ≈ 27, showing that men are much more likely to drink wine than women. The detailed calculation is:

    OR = (90/10) / (20/60) = 9 / (1/3) = 27
This example also shows how odds ratios are sometimes sensitive in stating relative positions: in this sample men are 0.9/0.25 = 3.6 times as likely as women to have drunk wine, but have 27 times the odds. The logarithm of the odds ratio, the difference of the logits of the probabilities, tempers this effect, and also makes the measure symmetric with respect to the ordering of groups. For example, using natural logarithms, an odds ratio of 27/1 maps to 3.296, and an odds ratio of 1/27 maps to −3.296.
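The sign-flip symmetry of the log odds ratio is easy to confirm:

```python
from math import log

print(log(27.0))       # ≈ 3.296
print(log(1 / 27.0))   # ≈ -3.296: reversing the groups only flips the sign
```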
Statistical inference
Several approaches to statistical inference for odds ratios have been developed. One approach uses large-sample approximations to the sampling distribution of the log odds ratio (the natural logarithm of the odds ratio). If we use the joint probability notation defined above, the population log odds ratio is

    log OR = log( (p11 p00) / (p10 p01) )
If we observe data in the form of a contingency table

              Y = 1    Y = 0
    X = 1      n11      n10
    X = 0      n01      n00

then the probabilities in the joint distribution can be estimated as

    p̂ij = nij / n

where n = n11 + n10 + n01 + n00 is the sum of all four cell counts. The sample log odds ratio is

    L = log( (p̂11 p̂00) / (p̂10 p̂01) ) = log( (n11 n00) / (n10 n01) )
The distribution of the log odds ratio is approximately normal, with mean equal to the population log odds ratio. The standard error for the log odds ratio is approximately

    SE = sqrt( 1/n11 + 1/n10 + 1/n01 + 1/n00 )
This is an asymptotic approximation, and will not give a meaningful result if any of the cell counts are very small. If L is the sample log odds ratio, an approximate 95% confidence interval for the population log odds ratio is L ± 1.96 SE. This can be mapped to (exp(L − 1.96 SE), exp(L + 1.96 SE)) to obtain a 95% confidence interval for the odds ratio. If we wish to test the hypothesis that the population odds ratio equals one, the two-sided p-value is 2P(Z < −|L|/SE), where P denotes a probability and Z denotes a standard normal random variable.
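The whole large-sample recipe can be run on the wine-drinking table from the earlier example; a sketch using only the standard library (the normal CDF is expressed through erf):

```python
from math import erf, exp, log, sqrt

# Wine-drinking table: men 90 drank / 10 did not, women 20 / 60.
n11, n10, n01, n00 = 90, 10, 20, 60

L = log((n11 * n00) / (n10 * n01))           # sample log odds ratio: log 27
se = sqrt(1/n11 + 1/n10 + 1/n01 + 1/n00)     # approximate standard error

lo, hi = L - 1.96 * se, L + 1.96 * se        # 95% CI for the log odds ratio
print(exp(lo), exp(hi))                      # 95% CI for the OR itself

# Two-sided p-value for H0: OR = 1, i.e. 2*P(Z < -|L|/SE):
z = abs(L) / se
p_value = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))
print(p_value)                               # tiny: OR = 1 is firmly rejected
```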
An alternative approach to inference for odds ratios looks at the distribution of the data conditionally on the marginal frequencies of X and Y. An advantage of this approach is that the sampling distribution of the odds ratio can be expressed exactly.