Pumping lemma for context-free languages
In computer science, in particular in formal language theory, the pumping lemma for context-free languages, also known as the Bar-Hillel lemma, is a lemma that gives a property shared by all context-free languages and generalizes the pumping lemma for regular languages.
The pumping lemma can be used to construct a proof by contradiction that a specific language is not context-free. Conversely, the pumping lemma does not suffice to guarantee that a language is context-free; there are other necessary conditions, such as Ogden's lemma or the interchange lemma.
Formal statement
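If a language $L$ is context-free, then there exists some integer $p \ge 1$ such that every string $s$ in $L$ that has a length of $p$ or more symbols can be written as $s = uvwxy$ with substrings $u$, $v$, $w$, $x$ and $y$, such that
- $|vx| \ge 1$,
- $|vwx| \le p$, and
- $uv^n w x^n y \in L$ for all $n \ge 0$.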
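The three conditions can be checked mechanically for a proposed split. The following is a minimal sketch, not a decision procedure: the names `satisfies_conditions`, `is_anbn` and `max_n` are illustrative, and the third condition is only tested for $n = 0, \dots,$ `max_n`, so a positive answer is evidence rather than proof.

```python
def satisfies_conditions(parts, p, in_language, max_n=5):
    """Check a proposed split (u, v, w, x, y) against the three conditions.

    Condition 3 quantifies over all n >= 0; this sketch only tests
    n = 0..max_n (an assumption made to keep the check finite).
    """
    u, v, w, x, y = parts
    if len(v + x) < 1:        # condition 1: |vx| >= 1
        return False
    if len(v + w + x) > p:    # condition 2: |vwx| <= p
        return False
    # condition 3 (bounded): u v^n w x^n y must stay in the language
    return all(in_language(u + v * n + w + x * n + y) for n in range(max_n + 1))


def is_anbn(s):
    """Membership test for the context-free language {a^n b^n | n >= 0}."""
    half, rem = divmod(len(s), 2)
    return rem == 0 and s == "a" * half + "b" * half
```

For instance, with the string `aaabbb` and $p = 2$, the split `("aa", "a", "", "b", "bb")` passes the check, while `("", "aa", "", "", "abbb")` fails it because removing `aa` leaves `abbb`.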
Informal statement and explanation
The pumping lemma for context-free languages describes a property that all context-free languages are guaranteed to have. The property holds for all strings in the language that are of length at least $p$, where $p$ is a constant, called the pumping length, that varies between context-free languages.
Say $s$ is a string of length at least $p$ that is in the language.
The pumping lemma states that $s$ can be split into five substrings, $s = uvwxy$, where $vx$ is non-empty and the length of $vwx$ is at most $p$, such that repeating $v$ and $x$ the same number of times ($n$) in $s$ produces a string that is still in the language. It is often useful to repeat zero times, which removes $v$ and $x$ from the string. This process of "pumping up" $s$ with additional copies of $v$ and $x$ is what gives the pumping lemma its name.
Finite languages (which are regular and hence context-free) obey the pumping lemma trivially by having $p$ equal to the maximum string length in $L$ plus one. As there are no strings of this length or longer, the pumping lemma holds vacuously.
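The pumping operation itself is easy to sketch in code. The example below assumes the language of balanced parentheses (a standard context-free language, chosen here only for illustration) and a hand-picked split of the string `(()())`; the helper names `pump` and `is_balanced` are hypothetical.

```python
def pump(u, v, w, x, y, n):
    """Return u v^n w x^n y: v and x are each repeated n times."""
    return u + v * n + w + x * n + y


def is_balanced(s):
    """Membership test for balanced parentheses, a context-free language."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        else:
            return False
        if depth < 0:
            return False
    return depth == 0


# s = "(()())" split as u = "(", v = "(", w = "", x = ")", y = "())"
parts = ("(", "(", "", ")", "())")
pumped = [pump(*parts, n) for n in range(4)]
```

Here `pumped[1]` is the original string, `pumped[0]` is the "pumped down" string with $v$ and $x$ removed, and the later entries are pumped up; for this split each of them is again balanced.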
Usage of the lemma
The pumping lemma is often used to prove that a given language $L$ is non-context-free, by showing that arbitrarily long strings $s$ are in $L$ that cannot be "pumped" without producing strings outside $L$. For example, if $S \subset \mathbb{N}$ is infinite but does not contain an infinite arithmetic progression, then $L = \{a^n : n \in S\}$ is not context-free. In particular, neither the set of prime numbers nor the set of square numbers yields a context-free language in this unary encoding.
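For the square numbers the argument can be checked numerically. In a unary language, pumping a string $a^m$ with $|vx| = k$ produces $a^{m + (n-1)k}$, wherever $v$ and $x$ sit. Taking $m = p^2$ and $n = 2$ gives length $p^2 + k$ with $1 \le k \le p$, which lies strictly between $p^2$ and $(p+1)^2$. The sketch below (function names are illustrative) confirms this for a range of candidate pumping lengths; it does not replace the proof.

```python
from math import isqrt


def is_square(m):
    """Return True if m is a perfect square."""
    r = isqrt(m)
    return r * r == m


def squares_escape_pumping(p):
    """For s = a^(p*p), every admissible k = |vx| with 1 <= k <= p gives,
    at n = 2, a string of length p*p + k. Return True if none of these
    lengths is a perfect square."""
    return all(not is_square(p * p + k) for k in range(1, p + 1))
```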
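For example, the language $L = \{a^n b^n c^n \mid n > 0\}$ can be shown to be non-context-free by using the pumping lemma in a proof by contradiction. First, assume that $L$ is context-free. By the pumping lemma, there exists an integer $p$ which is the pumping length of language $L$. Consider the string $s = a^p b^p c^p$ in $L$. The pumping lemma tells us that $s$ can be written in the form $s = uvwxy$, where $u$, $v$, $w$, $x$, and $y$ are substrings, such that $|vx| \ge 1$, $|vwx| \le p$, and $uv^i w x^i y \in L$ for every integer $i \ge 0$. By the choice of $s$ and the fact that $|vwx| \le p$, it is easily seen that the substring $vwx$ can contain no more than two distinct symbols. That is, we have one of five possibilities for $vwx$:
- $vwx = a^j$ for some $j \le p$.
- $vwx = a^j b^k$ for some $j$ and $k$ with $j + k \le p$.
- $vwx = b^j$ for some $j \le p$.
- $vwx = b^j c^k$ for some $j$ and $k$ with $j + k \le p$.
- $vwx = c^j$ for some $j \le p$.
In each case, pumping changes the number of occurrences of at most two of the three symbols and, because $|vx| \ge 1$, of at least one. Hence $uv^i w x^i y$ does not contain equal numbers of each letter for any $i \ne 1$; in particular $uv^2 w x^2 y$ is not in $L$. This contradicts the pumping lemma, so the assumption that $L$ is context-free must be false.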
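The case analysis can be cross-checked by brute force for small values of $p$. The sketch below takes the language to be $\{a^n b^n c^n \mid n > 0\}$ and the string to be $a^p b^p c^p$, enumerates every split $uvwxy$ with $|vx| \ge 1$ and $|vwx| \le p$, and keeps those whose pumped strings for $n = 0$ and $n = 2$ both remain in the language. The function names are illustrative, and an empty result for a few small $p$ only illustrates the argument; it is not a proof for all $p$.

```python
def in_language(s):
    """Membership test for {a^n b^n c^n | n > 0}."""
    k, rem = divmod(len(s), 3)
    return k > 0 and rem == 0 and s == "a" * k + "b" * k + "c" * k


def surviving_splits(p):
    """Return every split of s = a^p b^p c^p with |vx| >= 1 and |vwx| <= p
    whose pumped strings for n = 0 and n = 2 both stay in the language."""
    s = "a" * p + "b" * p + "c" * p
    survivors = []
    for i in range(len(s) + 1):                         # u = s[:i]
        for l in range(i + 1, min(i + p, len(s)) + 1):  # vwx = s[i:l]
            for j in range(i, l + 1):                   # v = s[i:j]
                for k in range(j, l + 1):               # w = s[j:k], x = s[k:l]
                    u, v, w, x, y = s[:i], s[i:j], s[j:k], s[k:l], s[l:]
                    if not (v or x):
                        continue                        # enforce |vx| >= 1
                    if all(in_language(u + v * n + w + x * n + y)
                           for n in (0, 2)):
                        survivors.append((u, v, w, x, y))
    return survivors
```

Calling `surviving_splits(p)` for small `p` returns an empty list, matching the conclusion that no admissible split of $a^p b^p c^p$ can be pumped.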
In 1960, Scheinberg proved that $L = \{a^n b^n a^n \mid n > 0\}$ is not context-free using a precursor of the pumping lemma.
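While the pumping lemma is often a useful tool to prove that a given language is not context-free, there are languages that are not context-free but still satisfy the condition given by the pumping lemma, for example
$L = \{ b^j c^k d^l \mid j, k, l \in \mathbb{N} \} \cup \{ a^i b^j c^j d^j \mid i, j \in \mathbb{N}, i \ge 1 \}$:
for $s = b^j c^k d^l$ with e.g. $j \ge 1$, choose $vwx$ to consist only of $b$s; for $s = a^i b^j c^j d^j$, choose $vwx$ to consist only of $a$s; in both cases all pumped strings are still in $L$.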
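The choice of split described above can be tried out on sample strings. This sketch assumes the example language is $L = \{ b^j c^k d^l \} \cup \{ a^i b^j c^j d^j \mid i \ge 1 \}$ (with $j, k, l \ge 0$) and always pumps the first symbol of the string, i.e. $u = w = x = \varepsilon$ and $v$ equal to the leading $a$ or $b$. The names `in_L` and `pump_first_symbol` are hypothetical, and only $n = 0, \dots, 5$ is tested.

```python
import re


def in_L(s):
    """Membership test for {b^j c^k d^l} union {a^i b^j c^j d^j | i >= 1}."""
    if re.fullmatch(r"b*c*d*", s):
        return True
    m = re.fullmatch(r"a+(b*)(c*)(d*)", s)
    return bool(m) and len(m.group(1)) == len(m.group(2)) == len(m.group(3))


def pump_first_symbol(s, n):
    """Split s as u = '', v = s[0], w = '', x = '', y = s[1:]
    and return u v^n w x^n y."""
    return s[0] * n + s[1:]


def stays_in_L(s, max_n=5):
    """Check (for n = 0..max_n only) that pumping the first symbol stays in L."""
    return all(in_L(pump_first_symbol(s, n)) for n in range(max_n + 1))
```

Removing the only $a$ from $a b^j c^j d^j$ yields $b^j c^j d^j$, which falls into the first part of the union; this is why pumping down does not leave the language.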
To prove that a given language is context-free, it is sufficient to construct a pushdown automaton that accepts exactly that language, or a context-free grammar that generates it.
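As an illustration of the pushdown idea, the following sketch recognizes $\{a^n b^n \mid n \ge 0\}$ with a single stack. It is a hand-written simulation, not a formal automaton definition; the function name `accepts_anbn` and the two phase labels are assumptions of this example.

```python
def accepts_anbn(s):
    """Stack-based recognizer for {a^n b^n | n >= 0}.

    Phase "push": each 'a' pushes a marker onto the stack.
    Phase "pop":  each 'b' pops one marker; no 'a' may follow a 'b'.
    The input is accepted if the stack is empty at the end.
    """
    stack = []
    phase = "push"
    for ch in s:
        if ch == "a" and phase == "push":
            stack.append("A")
        elif ch == "b" and stack:
            phase = "pop"
            stack.pop()
        else:
            return False
    return not stack
```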