Regression toward the mean
In statistics, regression toward the mean is the phenomenon where if one sample of a random variable is extreme, the next sampling of the same random variable is likely to be closer to its mean. Furthermore, when many random variables are sampled and the most extreme results are intentionally picked out, it refers to the fact that a second sampling of these picked-out variables will result in "less extreme" results, closer to the initial mean of all of the variables.
Mathematically, the strength of this "regression" effect depends on whether all of the random variables are drawn from the same distribution, or whether there are genuine differences in the underlying distributions for each random variable. In the first case, the "regression" effect is statistically likely to occur, but in the second case, it may occur less strongly or not at all.
Regression toward the mean is thus a useful concept to consider when designing any scientific experiment, data analysis, or test that intentionally selects the most extreme events: it indicates that follow-up checks may be needed to avoid jumping to false conclusions about these events, which may be genuine extreme events, a completely meaningless selection due to statistical noise, or a mix of the two.
Conceptual examples
Simple example: students taking a test
Consider a class of students taking a 100-item true/false test on a subject. Suppose that all students choose randomly on all questions. Then, each student's score would be a realization of one of a set of independent and identically distributed random variables, with an expected mean of 50. Naturally, some students will score substantially above 50 and some substantially below 50 just by chance. If one selects only the top-scoring 10% of the students and gives them a second test on which they again choose randomly on all items, the mean score would again be expected to be close to 50. Thus the mean of these students would "regress" all the way back to the mean of all students who took the original test. No matter what a student scores on the original test, the best prediction of their score on the second test is 50.

If choosing answers to the test questions was not random – i.e. if there were no luck or random guessing involved in the answers supplied by the students – then all students would be expected to score the same on the second test as they scored on the original test, and there would be no regression toward the mean.
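The pure-chance case is easy to check by simulation. The following sketch (an illustration, not part of the original example; the cohort size and random seed are arbitrary choices) scores 1,000 students on 100 coin-flip answers, selects the top 10%, and retests them:

```python
import numpy as np

rng = np.random.default_rng(0)
n_students, n_items = 1000, 100

# First test: every answer is a coin flip, so scores are Binomial(100, 0.5).
first = rng.binomial(n_items, 0.5, size=n_students)

# Pick out the top-scoring 10% of students.
top = np.argsort(first)[-n_students // 10:]
print("Top 10%, first test:", first[top].mean())    # well above 50

# Retest the selected students under the same purely random conditions.
second = rng.binomial(n_items, 0.5, size=top.size)
print("Top 10%, second test:", second.mean())       # back near 50
```

The selected students' first-test mean is far above 50 by construction, but their second-test mean regresses all the way back, because selection was the only thing that made them "extreme".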
Most realistic situations fall between these two extremes: for example, one might consider exam scores as a combination of skill and luck. In this case, the subset of students scoring above average would be composed of those who were skilled and did not have especially bad luck, together with those who were unskilled but extremely lucky. On a retest of this subset, the unskilled will be unlikely to repeat their lucky break, while the skilled will have a second chance to have bad luck. Hence, those who did well previously are unlikely to do quite as well on the second test even though their underlying skill is unchanged.
The following is an example of this second kind of regression toward the mean. A class of students takes two editions of the same test on two successive days. It has frequently been observed that the worst performers on the first day will tend to improve their scores on the second day, and the best performers on the first day will tend to do worse on the second day. The phenomenon occurs because student scores are determined in part by underlying ability and in part by chance. For the first test, some will be lucky, and score more than their ability, and some will be unlucky and score less than their ability. Some of the lucky students on the first test will be lucky again on the second test, but more of them will have average or below average scores. Therefore, a student who was lucky and over-performed their ability on the first test is more likely to have a worse score on the second test than a better score. Similarly, students who unluckily score less than their ability on the first test will tend to see their scores increase on the second test. The larger the influence of luck in producing an extreme event, the less likely the luck will repeat itself in multiple events.
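The partial regression in this mixed case can also be illustrated with a short simulation. In the sketch below (a hypothetical model; the means, standard deviations, and group sizes are assumptions chosen for illustration), each score is a fixed "skill" component plus an independent "luck" component redrawn on each test:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

skill = rng.normal(50, 10, size=n)            # stable between the two days
day1 = skill + rng.normal(0, 10, size=n)      # skill plus luck on day one
day2 = skill + rng.normal(0, 10, size=n)      # same skill, fresh luck

best = day1 >= np.quantile(day1, 0.9)         # best performers on day one
worst = day1 <= np.quantile(day1, 0.1)        # worst performers on day one

print("Best on day 1:", day1[best].mean(), "-> day 2:", day2[best].mean())
print("Worst on day 1:", day1[worst].mean(), "-> day 2:", day2[worst].mean())
```

The best performers fall back only partway toward the overall mean, and the worst performers rise partway, because part of their extremity reflected genuine skill and part reflected luck that does not repeat.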
Other examples
If your favourite sports team won the championship last year, what does that mean for their chances of winning next season? To the extent this result is due to skill, their win signals that it is more likely they will win again next year. But the greater the extent this is due to luck, the less likely it is they will win again next year.

If a business organisation has a highly profitable quarter, despite the underlying reasons for its performance being unchanged, it is likely to do less well the next quarter.
Baseball players who hit well in their rookie season are likely to do worse in their second: the "sophomore slump". Similarly, regression toward the mean is an explanation for the Sports Illustrated cover jinx: a period of exceptional performance that results in a cover feature is likely to be followed by a period of more mediocre performance, giving the impression that appearing on the cover causes an athlete's decline.
History
Discovery
The concept of regression comes from genetics and was popularized by Sir Francis Galton during the late 19th century with the publication of Regression towards mediocrity in hereditary stature. Galton observed that extreme characteristics in parents are not passed on completely to their offspring. Rather, the characteristics in the offspring regress toward a mediocre point. By measuring the heights of hundreds of people, he was able to quantify regression to the mean and estimate the size of the effect. Galton wrote that "the average regression of the offspring is a constant fraction of their respective mid-parental deviations". This means that the difference between a child and its parents for some characteristic is proportional to its parents' deviation from typical people in the population. If its parents are each two inches taller than the averages for men and women, then, on average, the offspring will be shorter than its parents by some factor times two inches. For height, Galton estimated this coefficient to be about 2/3: the height of an individual will measure around a midpoint that is two thirds of the parents' deviation from the population average.

Galton also published these results using the simpler example of pellets falling through a Galton board to form a normal distribution centred directly under their entrance point. These pellets might then be released down into a second gallery corresponding to a second measurement. Galton then asked the reverse question: "From where did these pellets come?"
The answer was not, on average, directly above. Rather, it was on average closer to the middle, for the simple reason that there were more pellets towards the middle that could wander left than there were at the left extreme that could wander right.
Evolving usage of the term
Galton coined the term "regression" to describe an observable fact in the inheritance of multi-factorial quantitative genetic traits: namely, that traits of the offspring of parents who lie at the tails of the distribution often tend to lie closer to the centre, the mean, of the distribution. He quantified this trend, and in doing so invented linear regression analysis, thus laying the groundwork for much of modern statistical modeling. Since then, the term "regression" has been used in other contexts, and it may be used by modern statisticians to describe phenomena such as sampling bias which have little to do with Galton's original observations in the field of genetics.

Galton's explanation for the regression phenomenon he observed in biology was stated as follows: "A child inherits partly from his parents, partly from his ancestors. Speaking generally, the further his genealogy goes back, the more numerous and varied will his ancestry become, until they cease to differ from any equally numerous sample taken at haphazard from the race at large." Galton's statement requires some clarification in light of knowledge of genetics: children receive genetic material from their parents, but hereditary information from earlier ancestors can be passed through their parents. The mean for the trait may be nonrandom and determined by selection pressure, but the distribution of values around the mean reflects a normal statistical distribution.
Two parents who are both heterozygous at a locus can produce children with either the recessive or the dominant phenotype. When this effect is multiplied over many loci, it creates a normal distribution for a given quantitative trait, assuming all alleles in the population have a similar proportional impact. In other words, regression to the mean is partly due to heterozygosity and partly due to panmixia and environmental factors. Through selection, it would be possible for the genes contributing to a quantitative trait to be only homozygous dominant or homozygous recessive; regression to the mean would then be eliminated, provided environmental factors were controlled for, since there would be only one phenotype for that trait.
The population-genetic phenomenon studied by Galton is a special case of "regression to the mean"; the term is often used to describe many statistical phenomena in which data exhibit a normal distribution around a mean.
Importance
Regression toward the mean is a significant consideration in the design of experiments.

Take a hypothetical example of 1,000 individuals of a similar age who were examined and scored on the risk of experiencing a heart attack. Statistics could be used to measure the success of an intervention on the 50 who were rated at the greatest risk, as measured by a test with a degree of uncertainty. The intervention could be a change in diet, exercise, or a drug treatment. Even if the interventions are worthless, the test group would be expected to show an improvement on their next physical exam, because of regression toward the mean. The best way to combat this effect is to divide the group randomly into a treatment group that receives the treatment, and a group that does not. The treatment would then be judged effective only if the treatment group improves more than the untreated group.
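A brief simulation can show why the randomised split matters. In the sketch below (all numbers are invented for illustration, and the "treatment" deliberately does nothing), both halves of the high-risk group improve on the second exam purely through regression toward the mean, so only the treated-minus-control difference is informative:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# A stable underlying risk plus measurement noise on each exam.
true_risk = rng.normal(50, 10, size=n)
exam1 = true_risk + rng.normal(0, 10, size=n)

# Take the 50 highest scorers and split them at random into two halves.
worst = np.argsort(exam1)[-50:]
rng.shuffle(worst)
treated, control = worst[:25], worst[25:]

# A worthless intervention: the second exam is just risk plus new noise.
exam2 = true_risk + rng.normal(0, 10, size=n)

for name, group in [("treated", treated), ("control", control)]:
    print(name, "exam 1: %.1f  exam 2: %.1f" %
          (exam1[group].mean(), exam2[group].mean()))
```

Both groups' scores fall by a similar amount even though nothing was done, which is exactly the spurious "improvement" a control group guards against.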
Alternatively, a group of disadvantaged children could be tested to identify the ones with the most college potential. The top 1% could be identified and supplied with special enrichment courses, tutoring, counseling, and computers. Even if the program is effective, their average scores may well be lower when the test is repeated a year later. However, in these circumstances it may be considered unethical to have a control group of disadvantaged children whose special needs are ignored. A mathematical calculation for shrinkage can adjust for this effect, although it will not be as reliable as the control-group method.
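One common form of such a shrinkage adjustment comes from classical test theory (a sketch only; whether this is the specific calculation intended above is an assumption, and the reliability value is invented): each observed score is pulled toward the group mean in proportion to the test's reliability, as in Kelley's formula:

```python
import numpy as np

def shrink(observed, group_mean, reliability):
    """Estimate true scores by shrinking observed scores toward the
    group mean; reliability near 1 means little luck, near 0 mostly luck."""
    observed = np.asarray(observed, dtype=float)
    return group_mean + reliability * (observed - group_mean)

# Hypothetical numbers: three selected children scoring 95-99 on a test
# with group mean 60 and an assumed reliability of 0.8.
print(shrink([95, 97, 99], group_mean=60.0, reliability=0.8))
# [88.  89.6 91.2] -- the retest scores expected from regression alone
```

Retest scores above these shrunken predictions would then be evidence that the program itself had an effect.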
The effect can also be exploited for general inference and estimation. The hottest place in the country today is more likely to be cooler tomorrow than hotter. The best-performing mutual fund over the last three years is more likely to see its relative performance decline than improve over the next three years. The most successful Hollywood actor of this year is likely to gross less, rather than more, with their next movie. The baseball player with the highest batting average partway through the season is more likely to have a lower average than a higher one over the remainder of the season.