Statistical dispersion
In statistics, dispersion is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile range. For instance, when the variance of data in a set is large, the data is widely scattered. On the other hand, when the variance is small, the data in the set is clustered.
Dispersion is contrasted with location or central tendency, and together they are the most used properties of distributions.
Measures of statistical dispersion
A measure of statistical dispersion is a nonnegative real number that is zero if all the data are the same and increases as the data become more diverse. Most measures of dispersion have the same units as the quantity being measured. In other words, if the measurements are in metres or seconds, so is the measure of dispersion. Examples of dispersion measures include:
- Standard deviation
- Interquartile range
- Range
- Mean absolute difference
- Median absolute deviation
- Average absolute deviation
- Distance standard deviation
All the above measures of statistical dispersion have the useful property that they are location-invariant and linear in scale. This means that if a random variable $X$ has a dispersion of $S_X$, then a linear transformation $Y = aX + b$ for real $a$ and $b$ should have dispersion $S_Y = |a| S_X$, where $|a|$ is the absolute value of $a$, that is, it ignores a preceding negative sign.
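The location-invariance and scale-linearity described above can be checked numerically. The following is a minimal sketch using only the Python standard library; the sample data, the constants `a` and `b`, and the helper function names are illustrative assumptions, and the quartiles use the "inclusive" interpolation rule, which is only one of several conventions.

```python
import math
import statistics


def value_range(xs):
    # Range: largest value minus smallest value.
    return max(xs) - min(xs)


def interquartile_range(xs):
    # Difference between the third and first quartiles (inclusive method).
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return q3 - q1


def median_absolute_deviation(xs):
    # Median of the absolute deviations from the median.
    med = statistics.median(xs)
    return statistics.median([abs(x - med) for x in xs])


def mean_absolute_difference(xs):
    # Average absolute difference over all ordered pairs of distinct items.
    n = len(xs)
    return sum(abs(x - y) for x in xs for y in xs) / (n * (n - 1))


measures = {
    "standard deviation": statistics.pstdev,
    "interquartile range": interquartile_range,
    "range": value_range,
    "mean absolute difference": mean_absolute_difference,
    "median absolute deviation": median_absolute_deviation,
}

data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]     # hypothetical sample
a, b = -3.0, 10.0                         # Y = aX + b with a negative scale
transformed = [a * x + b for x in data]

ratios = {}
for name, measure in measures.items():
    s_x = measure(data)
    s_y = measure(transformed)
    ratios[name] = s_y / s_x              # expected to be close to |a|
    print(f"{name}: S_X = {s_x:.4f}, S_Y = {s_y:.4f}, ratio = {ratios[name]:.4f}")
```

For each measure in this sketch the ratio should come out as $|a| = 3$ up to floating-point rounding: the shift $b$ has no effect, and the sign of $a$ is ignored.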
Other measures of dispersion are dimensionless. In other words, they have no units even if the variable itself has units. These include:
- Coefficient of variation
- Quartile coefficient of dispersion
- Relative mean difference, equal to twice the Gini coefficient
- Entropy: While the entropy of a discrete variable is location-invariant and scale-independent, and therefore not a measure of dispersion in the above sense, the entropy of a continuous variable is location-invariant and additive in scale: if $H(x)$ is the entropy of a continuous variable $x$ and $y = ax + b$, then $H(y) = H(x) + \log|a|$.
There are other measures of dispersion:
- Variance (the square of the standard deviation) – location-invariant but not linear in scale.
- Variance-to-mean ratio – mostly used for count data, where it is also called the coefficient of dispersion. The ratio is dimensionless only in that case, because count data are themselves dimensionless.
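The unit-free behaviour of the ratio-based measures, and the contrast with the variance, can be illustrated with a change of units. This is a rough sketch with made-up height data, hypothetical helper names, and population (not sample) formulas; the coefficient of variation is taken here as standard deviation divided by mean, which assumes positive-valued data.

```python
import math
import statistics


def coefficient_of_variation(xs):
    # Standard deviation divided by the mean (population form).
    return statistics.pstdev(xs) / statistics.fmean(xs)


def quartile_coefficient_of_dispersion(xs):
    # (Q3 - Q1) / (Q3 + Q1), using the inclusive quartile convention.
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return (q3 - q1) / (q3 + q1)


heights_m = [1.52, 1.60, 1.68, 1.75, 1.81, 1.93]   # hypothetical data, metres
heights_cm = [100 * h for h in heights_m]          # same data, centimetres

cv_m = coefficient_of_variation(heights_m)
cv_cm = coefficient_of_variation(heights_cm)
qcd_m = quartile_coefficient_of_dispersion(heights_m)
qcd_cm = quartile_coefficient_of_dispersion(heights_cm)

# The variance carries squared units, so it scales with the square of the factor.
var_m = statistics.pvariance(heights_m)
var_cm = statistics.pvariance(heights_cm)

print(f"coefficient of variation: {cv_m:.6f} (m) vs {cv_cm:.6f} (cm)")
print(f"quartile coefficient:     {qcd_m:.6f} (m) vs {qcd_cm:.6f} (cm)")
print(f"variance ratio cm^2/m^2:  {var_cm / var_m:.1f}")
```

Under these assumptions the two dimensionless coefficients agree across units up to rounding, while the variance changes by a factor of $100^2$, which is why it is location-invariant but not linear in scale.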
For categorical variables, it is less common to measure dispersion by a single number; see qualitative variation. One measure that does so is the discrete entropy.
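As an illustration of the discrete entropy used as a single-number dispersion measure for categorical data, the sketch below computes the Shannon entropy of the observed category frequencies. The function name, the choice of base 2 (bits), and the example labels are assumptions made for this example only.

```python
import math
from collections import Counter


def discrete_entropy(labels, base=2):
    # Shannon entropy of the empirical category frequencies.
    counts = Counter(labels)
    n = len(labels)
    return sum((c / n) * math.log(n / c, base) for c in counts.values())


constant = ["red", "red", "red", "red"]        # no variation at all
skewed = ["red", "red", "red", "blue"]         # mostly one category
uniform = ["red", "blue", "green", "yellow"]   # evenly spread over four categories
relabelled = ["a", "b", "c", "d"]              # same frequencies, different names

h_constant = discrete_entropy(constant)
h_skewed = discrete_entropy(skewed)
h_uniform = discrete_entropy(uniform)
h_relabelled = discrete_entropy(relabelled)

print(h_constant, h_skewed, h_uniform, h_relabelled)
```

The entropy is zero when every observation falls in the same category and grows as the observations spread more evenly across categories. It depends only on the frequencies, not on the category names, so relabelling the categories leaves it unchanged.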