Inclusion–exclusion principle

In combinatorics, the inclusion–exclusion principle is a counting technique that generalizes the familiar method of obtaining the number of elements in the union of two finite sets, symbolically expressed as
$$|A \cup B| = |A| + |B| - |A \cap B|,$$
where A and B are two finite sets and |S| indicates the cardinality of a set S. The formula expresses the fact that the sum of the sizes of the two sets may be too large since some elements may be counted twice. The double-counted elements are those in the intersection of the two sets and the count is corrected by subtracting the size of the intersection.
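The inclusion–exclusion principle, being a generalization of the two-set case, is perhaps more clearly seen in the case of three sets, which for the sets A, B and C is given by
$$|A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C|.$$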
This formula can be verified by counting how many times each region in the Venn diagram figure is included in the right-hand side of the formula. In this case, when removing the contributions of over-counted elements, the number of elements in the mutual intersection of the three sets has been subtracted too often, so must be added back in to get the correct total.
Figure: Inclusion–exclusion illustrated by a Venn diagram for three sets.
Generalizing the results of these examples gives the principle of inclusion–exclusion. To find the cardinality of the union of n sets:

Include the cardinalities of the sets.
Exclude the cardinalities of the pairwise intersections.
Include the cardinalities of the triple-wise intersections.
Exclude the cardinalities of the quadruple-wise intersections.
Include the cardinalities of the quintuple-wise intersections.
Continue until the cardinality of the n-tuple-wise intersection, that of all n sets, is included or excluded.
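
The include/exclude steps above can be sketched in a few lines of Python. This is only an illustration: the function name `union_size` and the sample sets (multiples of 2, 3 and 5 among 1 to 30) are assumptions made for the example, not part of the principle itself.

```python
from itertools import combinations


def union_size(sets):
    """Cardinality of the union of finite sets, computed by inclusion-exclusion."""
    total = 0
    for k in range(1, len(sets) + 1):
        # Include the k-wise intersections for odd k, exclude them for even k.
        sign = 1 if k % 2 == 1 else -1
        for group in combinations(sets, k):
            total += sign * len(set.intersection(*group))
    return total


# Illustrative data (an assumption for this sketch).
A = set(range(2, 31, 2))  # multiples of 2 up to 30
B = set(range(3, 31, 3))  # multiples of 3 up to 30
C = set(range(5, 31, 5))  # multiples of 5 up to 30

print(union_size([A, B, C]))  # 15 + 10 + 6 - 5 - 3 - 2 + 1 = 22
print(len(A | B | C))         # direct count, also 22
```

The loop visits every non-empty subfamily of the sets, so the work grows as 2^n; the sketch is meant to mirror the steps, not to be efficient.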

The name comes from the idea that the principle is based on over-generous inclusion, followed by compensating exclusion.
This concept is attributed to Abraham de Moivre, although it first appears in a paper of Daniel da Silva and later in a paper by J. J. Sylvester. Sometimes the principle is referred to as the formula of Da Silva or Sylvester, due to these publications. The principle can be viewed as an example of the sieve method extensively used in number theory and is sometimes referred to as the sieve formula.
As finite probabilities are computed as counts relative to the cardinality of the probability space, the formulas for the principle of inclusion–exclusion remain valid when the cardinalities of the sets are replaced by finite probabilities. More generally, both versions of the principle can be put under the common umbrella of measure theory.
In a very abstract setting, the principle of inclusion–exclusion can be expressed as the calculation of the inverse of a certain matrix. This inverse has a special structure, making the principle an extremely valuable technique in combinatorics and related areas of mathematics. As Gian-Carlo Rota put it:

"One of the most useful principles of enumeration in discrete probability and combinatorial theory is the celebrated principle of inclusion–exclusion. When skillfully applied, this principle has yielded the solution to many a combinatorial problem."

Formula

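In its general form, the principle of inclusion–exclusion states that for finite sets A₁, ..., A_n, one has the identity
$$\left|\bigcup_{i=1}^n A_i\right| = \sum_{i=1}^n |A_i| - \sum_{1 \le i < j \le n} |A_i \cap A_j| + \sum_{1 \le i < j < k \le n} |A_i \cap A_j \cap A_k| - \cdots + (-1)^{n+1} \left|A_1 \cap \cdots \cap A_n\right|.$$
Figure: Each term of the inclusion–exclusion formula gradually corrects the count until finally each portion of the Venn diagram is counted exactly once.
This can be compactly written as
$$\left|\bigcup_{i=1}^n A_i\right| = \sum_{k=1}^n (-1)^{k+1} \left( \sum_{1 \le i_1 < \cdots < i_k \le n} \left|A_{i_1} \cap \cdots \cap A_{i_k}\right| \right)$$
or
$$\left|\bigcup_{i=1}^n A_i\right| = \sum_{\emptyset \neq J \subseteq \{1, \ldots, n\}} (-1)^{|J|+1} \left|\bigcap_{j \in J} A_j\right|.$$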
In words, to count the number of elements in a finite union of finite sets, first sum the cardinalities of the individual sets, then subtract the cardinalities of all pairwise intersections, then add back the cardinalities of all intersections of three sets, then subtract the cardinalities of all intersections of four sets, and so on. This process always ends, since no intersection can involve more sets than there are in the union.
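In applications it is common to see the principle expressed in its complementary form. That is, letting S be a finite universal set containing all of the A_i and letting Ā_i denote the complement of A_i in S, by De Morgan's laws we have
$$\left|\bigcap_{i=1}^n \bar{A_i}\right| = \left|S \setminus \bigcup_{i=1}^n A_i\right| = |S| - \sum_{i=1}^n |A_i| + \sum_{1 \le i < j \le n} |A_i \cap A_j| - \cdots + (-1)^n \left|A_1 \cap \cdots \cap A_n\right|.$$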
As another variant of the statement, let P₁, ..., P_n be a list of properties that elements of a set S may or may not have. The principle of inclusion–exclusion then provides a way to calculate the number of elements of S that have none of the properties: let A_i be the subset of elements of S that have the property P_i and use the principle in its complementary form. This variant is due to J. J. Sylvester.
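
As an illustration of the complementary form, the following sketch counts the integers from 1 to n that are divisible by none of a list of divisors. The choice of property ("divisible by d"), the function name `count_with_no_property`, and the requirement that the divisors be pairwise coprime are all assumptions made for this example; coprimality is what allows the size of each intersection to be computed as a single floor division.

```python
from itertools import combinations
from math import prod


def count_with_no_property(n, divisors):
    """Count integers in 1..n divisible by none of the given divisors.

    Assumes the divisors are pairwise coprime, so that being divisible by
    every member of a group is the same as being divisible by their product.
    """
    total = 0
    for k in range(len(divisors) + 1):
        for group in combinations(divisors, k):
            # Size of the intersection of the sets A_d, d in group.
            # The empty group has product 1 and contributes |S| = n.
            total += (-1) ** k * (n // prod(group))
    return total


print(count_with_no_property(30, [2, 3, 5]))  # 30 - 31 + 10 - 1 = 8
```

The eight integers counted here are 1, 7, 11, 13, 17, 19, 23 and 29.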
Notice that if only the first m < n sums on the right-hand side of the general formula are taken into account, the result is an overestimate if m is odd and an underestimate if m is even.
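
The alternating behaviour of the truncated sums can be observed numerically. The sketch below is illustrative only; `partial_sums` and the sample sets are assumptions introduced for the example.

```python
from itertools import combinations


def partial_sums(sets):
    """Running totals after taking the first 1, 2, ..., n alternating sums."""
    totals, running = [], 0
    for k in range(1, len(sets) + 1):
        # Sum of the cardinalities of all k-wise intersections.
        s_k = sum(len(set.intersection(*g)) for g in combinations(sets, k))
        running += (-1) ** (k + 1) * s_k
        totals.append(running)
    return totals


# Illustrative data: multiples of 2, 3 and 5 among 1..30.
sets = [set(range(d, 31, d)) for d in (2, 3, 5)]
print(partial_sums(sets))     # [31, 21, 22]
print(len(set().union(*sets)))  # 22, the true size of the union
```

Here the first sum (31) exceeds the true value 22, the first two sums (21) fall short of it, and all three sums together give it exactly.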

Examples

Counting derangements

A classic application is the following.
Suppose there is a deck of n cards numbered from 1 to n. A card numbered m is said to be in the correct position if it is the m-th card in the deck. In how many ways, W, can the cards be shuffled so that at least one card is in the correct position?
Begin by defining the set A_m as the set of all orderings of the cards in which the m-th card is correct. Then the number of orderings, W, with at least one card in the correct position is
$$W = \left|\bigcup_{m=1}^n A_m\right|.$$
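Apply the principle of inclusion–exclusion,
$$W = \sum_{m_1=1}^n |A_{m_1}| - \sum_{1 \le m_1 < m_2 \le n} |A_{m_1} \cap A_{m_2}| + \cdots + (-1)^{p-1} \sum_{1 \le m_1 < \cdots < m_p \le n} |A_{m_1} \cap \cdots \cap A_{m_p}| + \cdots$$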
Each intersection A_{m₁} ∩ ⋯ ∩ A_{m_p} is the set of shuffles having at least the p values m₁, ..., m_p in the correct position. Note that the number of such shuffles depends only on p, not on the particular values of m₁, ..., m_p. For example, the number of shuffles having the 1st, 3rd, and 17th cards in the correct position is the same as the number of shuffles having the 2nd, 5th, and 13th cards in the correct positions. It only matters that of the n cards, 3 were chosen to be in the correct position. Thus there are $\binom{n}{p}$ equal terms in the p-th summation.
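|A₁ ∩ ⋯ ∩ A_p| is the number of orderings having the first p cards in the correct position, which is equal to the number of ways of ordering the remaining n − p cards, or (n − p)!. Thus we finally get:
$$W = \binom{n}{1}(n-1)! - \binom{n}{2}(n-2)! + \cdots + (-1)^{p-1}\binom{n}{p}(n-p)! + \cdots = \sum_{p=1}^n (-1)^{p-1} \binom{n}{p} (n-p)! = \sum_{p=1}^n (-1)^{p-1} \frac{n!}{p!}.$$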
A permutation where no card is in the correct position is called a derangement. Taking n! to be the total number of permutations, the probability Q that a random shuffle produces a derangement is given by
$$Q = 1 - \frac{W}{n!} = \sum_{p=0}^n \frac{(-1)^p}{p!},$$
a truncation to n + 1 terms of the Taylor expansion of e⁻¹. Thus the probability of guessing an order for a shuffled deck of cards and being incorrect about every card is approximately e⁻¹ or 37%.
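
The derivation can be checked numerically for small decks. The following is a minimal sketch under assumed names (`at_least_one_fixed`, `brute_force_w`, `derangement_probability`); positions are numbered from 0 in the brute-force check purely for programming convenience.

```python
from itertools import permutations
from math import comb, exp, factorial


def at_least_one_fixed(n):
    """W: number of orderings of n cards with at least one card in its correct position."""
    return sum((-1) ** (p - 1) * comb(n, p) * factorial(n - p)
               for p in range(1, n + 1))


def brute_force_w(n):
    """The same count obtained by enumerating all n! orderings (small n only)."""
    return sum(
        any(card == pos for pos, card in enumerate(perm))
        for perm in permutations(range(n))
    )


def derangement_probability(n):
    """Q = 1 - W / n!, the probability that no card is in its correct position."""
    return 1 - at_least_one_fixed(n) / factorial(n)


print(at_least_one_fixed(4), brute_force_w(4))  # both give 15
print(derangement_probability(10), exp(-1))     # the two values agree to several decimal places
```

For n = 4 this gives W = 24 − 12 + 4 − 1 = 15, leaving 9 derangements out of 24 orderings.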

A special case

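The situation that appears in the derangement example above occurs often enough to merit special attention: namely, the sizes of the intersections appearing in the formulas for the principle of inclusion–exclusion depend only on the number of sets in the intersections and not on which sets appear. More formally, if the intersection
$$A_J := \bigcap_{j \in J} A_j$$
has the same cardinality, say α_k = |A_J|, for every k-element subset J of {1, ..., n}, then
$$\left|\bigcup_{i=1}^n A_i\right| = \sum_{k=1}^n (-1)^{k-1} \binom{n}{k} \alpha_k.$$
Or, in the complementary form, where the universal set S has cardinality α₀,
$$\left|S \setminus \bigcup_{i=1}^n A_i\right| = \alpha_0 - \sum_{k=1}^n (-1)^{k-1} \binom{n}{k} \alpha_k = \sum_{k=0}^n (-1)^k \binom{n}{k} \alpha_k.$$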

Formula generalization

Given a family of subsets A₁, A₂,..., A_n of a universal set S, the principle of inclusion–exclusion calculates the number of elements of S in none of these subsets. A generalization of this concept would calculate the number of elements of S which appear in exactly some fixed m of these sets.
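Let N = [n] = {1, 2, ..., n}. If we define A_∅ = S, then, using the notation of the previous section, the principle of inclusion–exclusion can be written as follows: the number of elements of S contained in none of the A_i is
$$\sum_{J \subseteq N} (-1)^{|J|} |A_J|.$$
If I is a fixed subset of the index set N, then the number of elements which belong to A_i for all i in I and for no other values is
$$\sum_{I \subseteq J \subseteq N} (-1)^{|J| - |I|} |A_J|.$$
To see this, define the sets
$$B_k = A_{I \cup \{k\}} \quad \text{for } k \in N \setminus I.$$
We seek the number of elements in none of the B_k which, by the principle of inclusion–exclusion (with B_∅ = A_I), is
$$\sum_{K \subseteq N \setminus I} (-1)^{|K|} |B_K|.$$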
The correspondence K ↔ J = I ∪ K between subsets of N \ I and subsets of N containing I is a bijection and if J and K correspond under this map then B_K = A_J, showing that the result is valid.
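
The generalized count can be sketched as follows. The function name `exactly_in`, the 0-based indexing of the sets, and the sample data are assumptions made for this illustration; the loop follows the sum over K ⊆ N \ I with J = I ∪ K described above.

```python
from itertools import combinations


def exactly_in(universe, sets, I):
    """Count elements that lie in sets[i] for every i in I and in no other set.

    `sets` is a list of subsets of `universe`; indices are 0-based here.
    """
    rest = sorted(set(range(len(sets))) - set(I))  # the index set N \ I
    total = 0
    for size in range(len(rest) + 1):
        for K in combinations(rest, size):
            J = set(I) | set(K)
            # A_J is the intersection of the sets indexed by J; A_empty is the universe.
            A_J = set(universe)
            for j in J:
                A_J &= sets[j]
            total += (-1) ** size * len(A_J)
    return total


# Illustrative data: multiples of 2, 3 and 5 among 1..30.
universe = set(range(1, 31))
sets = [{x for x in universe if x % d == 0} for d in (2, 3, 5)]

print(exactly_in(universe, sets, (0,)))    # multiples of 2 but not of 3 or 5: 8
print(exactly_in(universe, sets, (0, 1)))  # multiples of 6 but not of 5: 4
```

With I empty, the function reduces to the complementary form and counts the elements in none of the sets.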

In probability

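In probability, for events A₁, ..., A_n in a probability space, the inclusion–exclusion principle becomes for n = 2
$$\mathbb{P}(A_1 \cup A_2) = \mathbb{P}(A_1) + \mathbb{P}(A_2) - \mathbb{P}(A_1 \cap A_2),$$
for n = 3
$$\mathbb{P}(A_1 \cup A_2 \cup A_3) = \mathbb{P}(A_1) + \mathbb{P}(A_2) + \mathbb{P}(A_3) - \mathbb{P}(A_1 \cap A_2) - \mathbb{P}(A_1 \cap A_3) - \mathbb{P}(A_2 \cap A_3) + \mathbb{P}(A_1 \cap A_2 \cap A_3),$$
and in general
$$\mathbb{P}\left(\bigcup_{i=1}^n A_i\right) = \sum_{i=1}^n \mathbb{P}(A_i) - \sum_{i<j} \mathbb{P}(A_i \cap A_j) + \sum_{i<j<k} \mathbb{P}(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n-1}\, \mathbb{P}\left(\bigcap_{i=1}^n A_i\right),$$
which can be written in closed form as
$$\mathbb{P}\left(\bigcup_{i=1}^n A_i\right) = \sum_{k=1}^n \left( (-1)^{k-1} \sum_{\substack{I \subseteq \{1, \ldots, n\} \\ |I| = k}} \mathbb{P}(A_I) \right),$$
where the last sum runs over all subsets I of the indices 1, ..., n which contain exactly k elements, and
$$A_I := \bigcap_{i \in I} A_i$$
denotes the intersection of all those A_i with index in I.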
According to the Bonferroni inequalities, the partial sums of the formula, obtained by keeping only the terms with k = 1, then k ≤ 2, then k ≤ 3, and so on, are alternately an upper bound and a lower bound for the left-hand side. This can be used in cases where the full formula is too cumbersome.
For a general measure space and measurable subsets A₁, ..., A_n of finite measure, the above identities also hold when the probability measure ℙ is replaced by the measure μ.
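
For a finite probability space with equally likely outcomes, the probabilistic form can be evaluated exactly with rational arithmetic. The sketch below is an illustration under assumptions: the sample space (two fair dice), the three events, and the helper names `prob` and `union_probability` are chosen only for the example.

```python
from fractions import Fraction
from itertools import combinations, product


def prob(event, space):
    """Probability of an event (a set of outcomes) under the uniform measure on `space`."""
    return Fraction(len(event & space), len(space))


def union_probability(events, space):
    """P(A_1 u ... u A_n) computed by inclusion-exclusion."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        for group in combinations(events, k):
            total += (-1) ** (k - 1) * prob(set.intersection(*group), space)
    return total


# Assumed example: two fair dice, outcomes are ordered pairs.
space = set(product(range(1, 7), repeat=2))
A1 = {w for w in space if w[0] == 6}    # first die shows 6
A2 = {w for w in space if w[1] == 6}    # second die shows 6
A3 = {w for w in space if sum(w) == 7}  # total is 7

print(union_probability([A1, A2, A3], space))  # 5/12
print(prob(A1 | A2 | A3, space))               # direct computation, also 5/12
```

Each single event has probability 1/6, each pairwise intersection 1/36, and the triple intersection is empty, giving 1/2 − 1/12 = 5/12.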

Special case

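If, in the probabilistic version of the inclusion–exclusion principle, the probability of the intersection A_I only depends on the cardinality of I, meaning that for every k in {1, ..., n} there is an a_k such that
$$a_k = \mathbb{P}(A_I) \quad \text{for every } I \subseteq \{1, \ldots, n\} \text{ with } |I| = k,$$
then the above formula simplifies to
$$\mathbb{P}\left(\bigcup_{i=1}^n A_i\right) = \sum_{k=1}^n (-1)^{k-1} \binom{n}{k} a_k$$
due to the combinatorial interpretation of the binomial coefficient $\binom{n}{k}$. For example, if the events A_i are independent and identically distributed, then ℙ(A_i) = p for all i, and we have a_k = p^k, in which case the expression above simplifies to
$$\mathbb{P}\left(\bigcup_{i=1}^n A_i\right) = 1 - (1 - p)^n.$$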
An analogous simplification is possible in the case of a general measure space and measurable subsets of finite measure.
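
The simplification for independent, identically distributed events can be checked with exact arithmetic. In this sketch the name `union_prob_symmetric` and the values p = 1/6 and n = 4 are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import comb


def union_prob_symmetric(a):
    """P(union of n events) when every k-fold intersection has probability a[k].

    `a` is a list of length n + 1; a[0] is unused.
    """
    n = len(a) - 1
    return sum((-1) ** (k - 1) * comb(n, k) * a[k] for k in range(1, n + 1))


p, n = Fraction(1, 6), 4
a = [p ** k for k in range(n + 1)]  # independence gives a_k = p^k

print(union_prob_symmetric(a))  # alternating binomial sum
print(1 - (1 - p) ** n)         # closed form; the two values coincide
```

The agreement is an instance of the binomial theorem applied to (1 − p)^n.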
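There is another formula used in point processes. Let S be a finite set and P be a random subset of S. Let A be any subset of S, then
$$\mathbb{P}(P = A) = \sum_{A \subseteq J \subseteq S} (-1)^{|J| - |A|}\, \mathbb{P}(P \supseteq J).$$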