Sanov's theorem
In mathematics and information theory, Sanov's theorem gives a bound on the probability of observing an atypical sequence of samples from a given probability distribution. In the language of large deviations theory, Sanov's theorem identifies the rate function for large deviations of the empirical measure of a sequence of i.i.d. random variables.
Let $A$ be a set of probability distributions over an alphabet $X$, and let $q$ be an arbitrary distribution over $X$. Suppose we draw $n$ i.i.d. samples from $q$, represented by the vector $x^n = (x_1, x_2, \ldots, x_n)$. Then, we have the following bound on the probability that the empirical measure $\hat{p}_{x^n}$ of the samples falls within the set $A$:

$$q^n(\hat{p}_{x^n} \in A) \le (n+1)^{|X|} 2^{-n D_{\mathrm{KL}}(p^* \| q)},$$

where
- $q^n$ is the joint probability distribution on $X^n$, and
- $p^* = \arg\min_{p \in A} D_{\mathrm{KL}}(p \| q)$ is the information projection of $q$ onto $A$.
- $D_{\mathrm{KL}}(p^* \| q)$, the KL divergence, is given by:
$$D_{\mathrm{KL}}(p^* \| q) = \sum_{x \in X} p^*(x) \log \frac{p^*(x)}{q(x)}.$$

Furthermore, if $A$ is a closed set, then
$$\lim_{n \to \infty} \frac{1}{n} \log q^n(\hat{p}_{x^n} \in A) = -D_{\mathrm{KL}}(p^* \| q).$$
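The bound above can be checked numerically. The following sketch is my own illustration, not from the source: the alphabet $X = \{0, 1\}$, the fair-coin distribution $q$, the set $A = \{p : p(1) \ge 0.6\}$, and the sample size are all assumed for the example. It computes the information projection, the KL divergence, the exact probability that the empirical measure lands in $A$, and the Sanov bound:

```python
from math import ceil, comb, log2

# Illustrative example (assumed setup, not from the article):
# alphabet X = {0, 1}, source q(1) = 0.5, atypical event A = {p : p(1) >= 0.6}.
q = 0.5          # q(1) under the true distribution
t = 0.6          # A = {p : p(1) >= t}
n = 100          # number of i.i.d. samples

def kl_bits(p, q):
    """D_KL(Bernoulli(p) || Bernoulli(q)) in bits."""
    return sum(pi * log2(pi / qi)
               for pi, qi in ((p, q), (1 - p, 1 - q)) if pi > 0)

# Information projection: q lies outside A, so the minimiser of D_KL(p || q)
# over A sits on the boundary p(1) = t.
p_star = t
D = kl_bits(p_star, q)

# Exact probability that the empirical measure falls in A:
# Pr(#ones >= t*n) for Binomial(n, q).
exact = sum(comb(n, k) for k in range(ceil(t * n), n + 1)) * q ** n

# Sanov bound: (n+1)^{|X|} * 2^{-n * D_KL(p* || q)}, with |X| = 2.
bound = (n + 1) ** 2 * 2 ** (-n * D)

print(f"D_KL(p*||q) = {D:.4f} bits")
print(f"exact Pr    = {exact:.4f}")
print(f"Sanov bound = {bound:.1f}")
assert exact <= bound
```

At $n = 100$ the polynomial prefactor $(n+1)^{|X|}$ makes the bound very loose; the point of the theorem is the exponent $2^{-nD}$, which governs the decay as $n$ grows.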
Technical statement
Define:
- $\Sigma$ is a finite set with size $|\Sigma| \ge 2$. Understood as "alphabet".
- $\Delta(\Sigma)$ is the simplex spanned by the alphabet. It is a subset of $\mathbb{R}^{|\Sigma|}$.
- $L_n$ is a random variable taking values in $\Delta(\Sigma)$. Take $n$ samples $x_1, \ldots, x_n$ from the distribution $\mu$, then $L_n = \frac{1}{n} \sum_{k=1}^n \delta_{x_k}$ is the frequency probability vector for the sample.
- $\Delta_n(\Sigma)$ is the space of values that $L_n$ can take. In other words, it is
$$\left\{ \left( \frac{a_1}{n}, \ldots, \frac{a_{|\Sigma|}}{n} \right) : \sum_i a_i = n,\ a_i \in \mathbb{N} \right\}.$$

Then, Sanov's theorem states:
- For every measurable subset $S \subseteq \Delta(\Sigma)$,
$$-\inf_{\nu \in \mathring{S}} D(\nu \| \mu) \le \liminf_n \frac{1}{n} \ln \Pr(L_n \in S) \le \limsup_n \frac{1}{n} \ln \Pr(L_n \in S) \le -\inf_{\nu \in \bar{S}} D(\nu \| \mu).$$
- For every open subset $U \subseteq \Delta(\Sigma)$,
$$-\lim_n \inf_{\nu \in U \cap \Delta_n(\Sigma)} D(\nu \| \mu) = \lim_n \frac{1}{n} \ln \Pr(L_n \in U) = -\inf_{\nu \in U} D(\nu \| \mu).$$
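The open-set statement can be watched converging in a small experiment. The sketch below is an assumed illustration (the source $\mu = \mathrm{Bernoulli}(0.5)$, the open set $U = \{\nu : \nu(1) > 0.6\}$, and the sample sizes are my choices): it computes $-\frac{1}{n}\ln \Pr(L_n \in U)$ exactly in log-space and compares it with the limiting rate $\inf_{\nu \in U} D(\nu \| \mu)$:

```python
from math import exp, lgamma, log

# Illustrative example (assumed setup, not from the article):
# Sigma = {0, 1}, mu = Bernoulli(0.5), open set U = {nu : nu(1) > 0.6}.
mu = 0.5
t = 0.6

def D_nats(p, q):
    """Relative entropy D(Bernoulli(p) || Bernoulli(q)) in nats."""
    return sum(pi * log(pi / qi)
               for pi, qi in ((p, q), (1 - p, 1 - q)) if pi > 0)

def log_pr_Ln_in_U(n):
    """ln Pr(L_n in U) = ln Pr(#ones/n > t) for Binomial(n, mu), in log-space."""
    k_min = int(round(t * n)) + 1   # smallest k with k/n > t (t*n is an integer here)
    logs = [lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(mu) + (n - k) * log(1 - mu)
            for k in range(k_min, n + 1)]
    m = max(logs)                   # log-sum-exp for numerical stability
    return m + log(sum(exp(x - m) for x in logs))

# inf over U of D(nu || mu) is approached as nu(1) -> t from above.
D_star = D_nats(t, mu)
for n in (100, 1000, 10000):
    rate = -log_pr_Ln_in_U(n) / n
    print(f"n = {n:>5}:  -(1/n) ln Pr(L_n in U) = {rate:.4f}   (limit {D_star:.4f})")
```

The printed rates decrease toward $D_\star$ as $n$ grows; the gap reflects the sub-exponential (polynomial-in-$n$) corrections that the theorem does not control.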