# Anscombe's quartet

Anscombe's quartet comprises four datasets that have nearly identical simple descriptive statistics, yet have very different distributions and appear very different when graphed. Each dataset consists of eleven (x, y) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analyzing it and the effect of outliers and other influential observations on statistical properties. He described the article as being intended to counter the impression among statisticians that "numerical calculations are exact, but graphs are rough."

## Data

For all four datasets:

| Property | Value | Accuracy |
| --- | --- | --- |
| Mean of x | 9 | exact |
| Sample variance of x | 11 | exact |
| Mean of y | 7.50 | to 2 decimal places |
| Sample variance of y | 4.125 | ±0.003 |
| Correlation between x and y | 0.816 | to 3 decimal places |
| Linear regression line | y = 3.00 + 0.500x | to 2 and 3 decimal places, respectively |
| Coefficient of determination of the linear regression | 0.67 | to 2 decimal places |

The quartet is still often used to illustrate the importance of looking at a dataset graphically before analyzing it according to a particular type of relationship, and the inadequacy of basic statistical properties for describing realistic datasets.
The datasets are as follows. The x values are the same for the first three datasets.
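
The shared summary statistics can be checked numerically. The following is a minimal Python sketch, assuming the values as they are commonly reproduced from Anscombe's 1973 table (they are not listed in the text here). The names `QUARTET`, `summarize`, and `SUMMARIES` are illustrative choices, not part of any library.

```python
from statistics import mean, variance

# Assumption: values as commonly reproduced from Anscombe's 1973 table.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by datasets I to III
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics from the table for one dataset."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx                 # least-squares slope
    intercept = my - slope * mx       # least-squares intercept
    r = sxy / (sxx * syy) ** 0.5      # Pearson correlation
    return {
        "mean_x": mx, "var_x": variance(x),
        "mean_y": my, "var_y": variance(y),
        "r": r, "slope": slope, "intercept": intercept, "r2": r * r,
    }


SUMMARIES = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARIES.items():
        print(name, {k: round(v, 3) for k, v in stats.items()})
```

The four summaries agree to the accuracy stated in the table, even though scatter plots of the same points look nothing alike.
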
It is not known how Anscombe created his datasets. Since its publication, several methods to generate similar datasets with identical statistics and dissimilar graphs have been developed.
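
One simple way to build such a look-alike dataset is sketched below in Python. This is an illustrative construction only: it is not Anscombe's method (which is unknown) and is not claimed to be any of the published generators. It assumes the x values commonly reproduced for the first three datasets, and the helper names `fit_line` and `make_lookalike` are hypothetical. The idea is to take an arbitrary residual pattern, remove whatever part of it a straight line in x would explain, and rescale the remainder so that the variance of y reaches the target.

```python
from statistics import mean, variance


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def make_lookalike(x, shape, intercept=3.0, slope=0.5, var_y=4.125):
    """Build y values whose fitted line and sample variance hit the targets.

    `shape` is any residual pattern that is not an exact linear function
    of x (hypothetical input, chosen freely by the caller).
    """
    # Strip the linear trend from `shape`; the remainder has zero mean and
    # zero covariance with x, so adding it leaves the fitted line unchanged.
    a, b = fit_line(x, shape)
    resid = [s - (a + b * xi) for xi, s in zip(x, shape)]
    # var(y) = slope^2 * var(x) + var(scaled residual); solve for the scale.
    target = var_y - slope ** 2 * variance(x)
    scale = (target / variance(resid)) ** 0.5
    return [intercept + slope * xi + scale * r for xi, r in zip(x, resid)]


# Assumption: x values as commonly reproduced for datasets I to III.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
zigzag = [(-1) ** i for i in range(len(x))]  # arbitrary example pattern
y_new = make_lookalike(x, zigzag)
```

Because the residual is uncorrelated with x by construction, `y_new` reproduces the regression line and variance exactly rather than approximately. Swapping `zigzag` for another pattern gives a differently shaped plot with the same summary statistics.
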