Analysis of variance
Analysis of variance is a family of statistical methods used to compare the means of two or more groups by analyzing variance. Specifically, ANOVA compares the amount of variation between the group means to the amount of variation within each group. If the between-group variation is substantially larger than the within-group variation, it suggests that the group means are likely different. This comparison is done using an F-test. The underlying principle of ANOVA is based on the law of total variance, which states that the total variance in a dataset can be broken down into components attributable to different sources. In the case of ANOVA, these sources are the variation between groups and the variation within groups.
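The decomposition behind the F-test can be made concrete with a short numerical sketch. The three groups of measurements below are invented for illustration; the point is only the arithmetic of the sum-of-squares partition and the resulting F statistic.

```python
import numpy as np

# Hypothetical measurements for three groups (made-up data).
groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 8.0, 9.0]),
          np.array([1.0, 2.0, 3.0])]

n_total = sum(len(g) for g in groups)
k = len(groups)
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: variation of the group means around the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: variation of observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Law of total variance: SS_total = SS_between + SS_within.
ss_total = ((np.concatenate(groups) - grand_mean) ** 2).sum()
assert np.isclose(ss_total, ss_between + ss_within)

# F statistic: mean square between divided by mean square within.
f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
print(f_stat)
```

With these made-up numbers the between-group mean square is far larger than the within-group mean square, so the F statistic is large and the group means look genuinely different.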
ANOVA was developed by the statistician Ronald Fisher. In its simplest form, it provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means.
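As a quick check of the claim that ANOVA generalizes the t-test, with exactly two groups the one-way ANOVA F statistic equals the square of the pooled-variance two-sample t statistic and the p-values coincide. The samples below are arbitrary, and SciPy is assumed to be available.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(10.0, 2.0, size=20)   # arbitrary sample for group A
b = rng.normal(12.0, 2.0, size=20)   # arbitrary sample for group B

f_stat, f_p = stats.f_oneway(a, b)                   # one-way ANOVA with two groups
t_stat, t_p = stats.ttest_ind(a, b, equal_var=True)  # pooled-variance t-test

assert np.isclose(f_stat, t_stat ** 2)  # F equals t squared
assert np.isclose(f_p, t_p)             # identical p-values
```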
History
While the analysis of variance reached fruition in the 20th century, antecedents extend centuries into the past according to Stigler. These include hypothesis testing, the partitioning of sums of squares, experimental techniques and the additive model. Laplace was performing hypothesis testing in the 1770s. Around 1800, Laplace and Gauss developed the least-squares method for combining observations, which improved upon methods then used in astronomy and geodesy. It also initiated much study of the contributions to sums of squares. Laplace knew how to estimate a variance from a residual sum of squares. By 1827, Laplace was using least squares methods to address ANOVA problems regarding measurements of atmospheric tides. Before 1800, astronomers had isolated observational errors resulting from reaction times and had developed methods of reducing the errors. The experimental methods used in the study of the personal equation were later accepted by the emerging field of psychology, which developed strong experimental methods to which randomization and blinding were soon added. An eloquent non-mathematical explanation of the additive effects model was available in 1885.
Ronald Fisher introduced the term variance and proposed its formal analysis in a 1918 article on theoretical population genetics, The Correlation Between Relatives on the Supposition of Mendelian Inheritance. His first application of the analysis of variance to data analysis was published in 1921, Studies in Crop Variation I. This divided the variation of a time series into components representing annual causes and slow deterioration. Fisher's next piece, Studies in Crop Variation II, written with Winifred Mackenzie and published in 1923, studied the variation in yield across plots sown with different varieties and subjected to different fertiliser treatments. Analysis of variance became widely known after being included in Fisher's 1925 book Statistical Methods for Research Workers.
Randomization models were developed by several researchers. The first was published in Polish by Jerzy Neyman in 1923.
Example
The analysis of variance can be used to describe otherwise complex relations among variables. A dog show provides an example. A dog show is not a random sampling of the breed: it is typically limited to dogs that are adult, pure-bred, and exemplary. A histogram of dog weights from a show is likely to be rather complicated, like the yellow-orange distribution shown in the illustrations. Suppose we wanted to predict the weight of a dog based on a certain set of characteristics of each dog. One way to do that is to explain the distribution of weights by dividing the dog population into groups based on those characteristics. A successful grouping will split dogs such that each group has a low variance of dog weights and the mean of each group is distinct.
In the illustrations to the right, groups are identified as X1, X2, etc. In the first illustration, the dogs are divided according to the product of two binary groupings: young vs old, and short-haired vs long-haired. Since the distributions of dog weight within each of the groups have a relatively large variance, and since the means are very similar across groups, grouping dogs by these characteristics does not produce an effective way to explain the variation in dog weights: knowing which group a dog is in doesn't allow us to predict its weight much better than simply knowing the dog is in a dog show. Thus, this grouping fails to explain the variation in the overall distribution.
An attempt to explain the weight distribution by grouping dogs as pet vs working breed and less athletic vs more athletic would probably be somewhat more successful. The heaviest show dogs are likely to be big, strong, working breeds, while breeds kept as pets tend to be smaller and thus lighter. As shown by the second illustration, the distributions have variances that are considerably smaller than in the first case, and the means are more distinguishable. However, the distributions still overlap significantly, so that, for example, X1 and X2 cannot be distinguished reliably. Grouping dogs according to a coin flip might produce distributions that look similar.
An attempt to explain weight by breed is likely to produce a very good fit. All Chihuahuas are light and all St Bernards are heavy. The difference in weights between Setters and Pointers does not justify separate breeds. The analysis of variance provides the formal tools to justify these intuitive judgments. A common use of the method is the analysis of experimental data or the development of models. The method has some advantages over correlation: not all of the data must be numeric and one result of the method is a judgment in the confidence in an explanatory relationship.
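The intuition of the dog-show example can be mimicked with a small simulation. The weights and groupings below are invented: an uninformative split, such as a coin flip, leaves nearly all of the variation within groups, while a split by a weight-related trait such as breed moves most of it between groups and yields a far larger F statistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Invented weights (kg) for two hypothetical breeds of show dogs.
light_breed = rng.normal(8.0, 1.5, size=50)    # e.g. a small pet breed
heavy_breed = rng.normal(60.0, 6.0, size=50)   # e.g. a large working breed
weights = np.concatenate([light_breed, heavy_breed])

# Uninformative grouping: a coin flip per dog.
coin = rng.integers(0, 2, size=weights.size)
f_coin, _ = stats.f_oneway(weights[coin == 0], weights[coin == 1])

# Informative grouping: by breed.
f_breed, _ = stats.f_oneway(light_breed, heavy_breed)

print(f_coin)   # small: group means similar, variation stays within groups
print(f_breed)  # very large: group means far apart relative to within-group spread
```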
Classes of models
There are three classes of models used in the analysis of variance, and these are outlined here.
Fixed-effects models
The fixed-effects model of analysis of variance applies to situations in which the experimenter applies one or more treatments to the subjects of the experiment to see whether the response variable values change. This allows the experimenter to estimate the ranges of response variable values that the treatment would generate in the population as a whole.
Random-effects models
A random-effects model is used when the treatments are not fixed. This occurs when the various factor levels are sampled from a larger population. Because the levels themselves are random variables, some assumptions and the method of contrasting the treatments differ from the fixed-effects model.
Mixed-effects models
A mixed-effects model contains experimental factors of both fixed and random-effects types, with appropriately different interpretations and analysis for the two types.
Example
Teaching experiments could be performed by a college or university department to find a good introductory textbook, with each text considered a treatment. The fixed-effects model would compare a list of candidate texts. The random-effects model would determine whether important differences exist among a list of randomly selected texts. The mixed-effects model would compare the incumbent texts to randomly selected alternatives.
Defining fixed and random effects has proven elusive, with multiple competing definitions.
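A rough sketch of the textbook example, under assumed data: the data frame, column names, scores, and the use of statsmodels are illustrative choices, not part of any actual experiment. The fixed-effects fit treats the named texts themselves as the treatments, while the mixed-effects fit treats incumbent-vs-alternative as a fixed effect and the individual texts as a random effect.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)

# Invented data: 20 students' exam scores for each of 6 textbooks,
# 2 incumbent texts plus 4 randomly selected alternatives.
books = [f"Text{i}" for i in range(6)]
rows = []
for i, book in enumerate(books):
    incumbent = 1 if i < 2 else 0
    book_effect = rng.normal(0, 3)  # random per-text effect
    for score in rng.normal(70 + 4 * incumbent + book_effect, 8, size=20):
        rows.append({"textbook": book, "incumbent": incumbent, "score": score})
df = pd.DataFrame(rows)

# Fixed-effects model: the listed texts themselves are the treatments.
fixed = smf.ols("score ~ C(textbook)", data=df).fit()
print(sm.stats.anova_lm(fixed, typ=2))

# Mixed-effects model: incumbent-vs-alternative is a fixed effect,
# the individual texts enter as a random intercept.
mixed = smf.mixedlm("score ~ incumbent", data=df, groups=df["textbook"]).fit()
print(mixed.summary())
```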
Assumptions
The analysis of variance has been studied from several approaches, the most common of which uses a linear model that relates the response to the treatments and blocks. Note that the model is linear in parameters but may be nonlinear across factor levels. Interpretation is easy when data is balanced across factors but much deeper understanding is needed for unbalanced data.
Textbook analysis using a normal distribution
The analysis of variance can be presented in terms of a linear model, which makes the following assumptions about the probability distribution of the responses (simple diagnostic checks for these assumptions are sketched after the list):
- Independence of observations – this is an assumption of the model that simplifies the statistical analysis.
- Normality – the distributions of the residuals are normal.
- Equality of variances, called homoscedasticity—the variance of data in groups should be the same.
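These assumptions are usually assessed on the residuals. The sketch below, on made-up group data, uses a Shapiro-Wilk test for normality of the residuals and Levene's test for equality of variances; the groups and values are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Made-up response values for three treatment groups.
groups = [rng.normal(10, 2, size=25),
          rng.normal(12, 2, size=25),
          rng.normal(11, 2, size=25)]

# Residuals: deviations of each observation from its own group mean.
residuals = np.concatenate([g - g.mean() for g in groups])

# Normality of the residuals (Shapiro-Wilk test).
print(stats.shapiro(residuals))

# Equality of variances across groups (Levene's test).
print(stats.levene(*groups))
```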
Randomization-based analysis
In a randomized controlled experiment, the treatments are randomly assigned to experimental units, following the experimental protocol. This randomization is objective and declared before the experiment is carried out. The objective random-assignment is used to test the significance of the null hypothesis, following the ideas of C. S. Peirce and Ronald Fisher. This design-based analysis was discussed and developed by Francis J. Anscombe at Rothamsted Experimental Station and by Oscar Kempthorne at Iowa State University. Kempthorne and his students make an assumption of unit treatment additivity, which is discussed in the books of Kempthorne and David R. Cox.
Unit-treatment additivity
In its simplest form, the assumption of unit-treatment additivity states that the observed response $y_{i,j}$ from experimental unit $i$ when receiving treatment $j$ can be written as the sum of the unit's response $y_i$ and the treatment effect $t_j$, that is $y_{i,j} = y_i + t_j$. The assumption of unit-treatment additivity implies that, for every treatment $j$, the $j$th treatment has exactly the same effect $t_j$ on every experimental unit.
The assumption of unit-treatment additivity usually cannot be directly falsified, according to Cox and Kempthorne. However, many consequences of unit-treatment additivity can be falsified. For a randomized experiment, the assumption of unit-treatment additivity implies that the variance is constant for all treatments. Therefore, by contraposition, a necessary condition for unit-treatment additivity is that the variance is constant.
The use of unit treatment additivity and randomization is similar to the design-based inference that is standard in finite-population survey sampling.
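A randomization-based analysis can be sketched as a permutation test: the treatment labels are reshuffled many times, the F statistic is recomputed under each reshuffling, and the observed F is compared with that reference distribution. The data, labels, and number of permutations below are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Arbitrary responses and treatment labels for three treatments of 10 units each.
y = rng.normal(0, 1, size=30) + np.repeat([0.0, 0.5, 1.0], 10)
labels = np.repeat([0, 1, 2], 10)

def f_stat(y, labels):
    # One-way ANOVA F statistic for the given assignment of labels.
    return stats.f_oneway(*(y[labels == g] for g in np.unique(labels))).statistic

observed = f_stat(y, labels)

# Re-randomize the treatment labels and recompute F under each shuffle.
n_perm = 5000
perm_f = np.array([f_stat(y, rng.permutation(labels)) for _ in range(n_perm)])

# Randomization p-value: fraction of shuffles with an F at least as large as observed.
p_value = (np.sum(perm_f >= observed) + 1) / (n_perm + 1)
print(observed, p_value)
```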