Van Wijngaarden grammar

In computer science, a Van Wijngaarden grammar is a formalism for defining formal languages. The name derives from the formalism invented by Adriaan van Wijngaarden
for the purpose of defining the ALGOL 68 programming language.
The resulting specification remains its most notable application.
Van Wijngaarden grammars address the problem that context-free grammars cannot express agreement or reference, where two different parts of the sentence must agree with each other in some way. For example, the sentence "The birds was eating" is not Standard English because it fails to agree on number. A context-free grammar would parse "The birds was eating" and "The birds were eating" and "The bird was eating" in the same way. However, context-free grammars have the benefit of simplicity whereas van Wijngaarden grammars are considered highly complex.

Two levels

W-grammars are two-level grammars: they are defined by a pair of grammars, that operate on different levels:

the hypergrammar is an attribute grammar, i.e. a set of context-free grammar rules in which the nonterminals may have attributes; and
the metagrammar is a context-free grammar defining possible values for these attributes.

The set of strings generated by a W-grammar is defined by a two-stage process:

within each hyperrule, for each attribute that occurs in it, pick a value for it generated by the metagrammar; the result is a normal context-free grammar rule; do this in every possible way;
use the resulting context-free grammar to generate strings in the normal way.

The consistent substitution used in the first step is the same as substitution in predicate logic, and actually supports logic programming; it corresponds to unification in Prolog, as noted by Alain Colmerauer.
W-grammars are Turing complete;
hence, all decision problems regarding the languages they generate, such as

whether a W-grammar generates a given string
whether a W-grammar generates no strings at all

are undecidable.
Curtailed variants, known as affix grammars, were developed, and applied in compiler construction and to the description of natural languages.
Definite logic programs, that is, logic programs that make no use of negation, can be viewed as a subclass of W-grammars.

Motivation and history

In the 1950s, attempts started to apply computers to the recognition, interpretation and translation of natural languages, such as English and Russian. This requires a machine-readable description of the phrase structure of sentences, that can be used to parse and interpret them, and to generate them. Context-free grammars, a concept from structural linguistics, were adopted for this purpose; their rules can express how sentences are recursively built out of parts of speech, such as noun phrases and verb phrases, and ultimately, words, such as nouns, verbs, and pronouns.
This work influenced the design and implementation of programming languages, most notably, of ALGOL 60, which introduced a syntax description in Backus–Naur form.
However, context-free rules cannot express agreement or reference, where two different parts of the sentence must agree with each other in some way.
These can be readily expressed in W-grammars.
Programming languages have the analogous notions of typing and scoping.
A compiler or interpreter for the language must recognize which uses of a variable belong together. This is typically subject to constraints such as:

A variable must be initialized before its value is used.
In strongly typed languages, each variable is assigned a type, and all uses of the variable must respect its type.
Often, its type must be declared explicitly, before use.

W-grammars are based on the idea of providing the nonterminal symbols of context-free grammars with attributes that pass information between the nodes of the parse tree, used to constrain the syntax and to specify the semantics.
This idea was well known at the time; e.g. Donald Knuth visited the ALGOL 68 design committee while developing his own version of it, attribute grammars.
By augmenting the syntax description with attributes, constraints like the above can be checked, ruling many invalid programs out at compile time.
As Van Wijngaarden wrote in his preface:
Quite peculiar to W-grammars was their strict treatment of attributes as strings, defined by a context-free grammar, on which concatenation is the only possible operation; complex data structures and operations can be defined by pattern matching.
After their introduction in the 1968 ALGOL 68 "Final Report", W-grammars were widely considered as too powerful and unconstrained to be practical.
This was partly a consequence of the way in which they had been applied; the 1973 ALGOL 68 "Revised Report" contains a much more readable grammar, without modifying the W-grammar formalism itself.
Meanwhile, it became clear that W-grammars, when used in their full generality, are indeed too powerful for such practical purposes as serving as the input for a parser generator.
They describe precisely all recursively enumerable languages, which makes parsing impossible in general: it is an undecidable problem to decide whether a given string can be generated by a given W-grammar.
Hence, their use must be seriously constrained when used for automatic parsing or translation. Restricted and modified variants of W-grammars were developed to address this, e.g.

Extended Affix Grammars ;
Q-systems, also applied to natural language processing;
the CDL series of languages, applied as compiler construction languages for programming languages.

After the 1970s, interest in the approach waned; occasionally, new studies are published.

Examples

Agreement in English grammar

In English, nouns, pronouns and verbs have attributes such as grammatical number, gender, and person, which must agree between subject, main verb, and pronouns referring to the subject:

I wash myself.
She washes herself.
We wash ourselves.

are valid sentences; invalid are, for instance:

*I washes ourselves.
*She wash himself.
*We wash herself.

Here, agreement serves to stress that both pronouns refer to the same person.
A context-free grammar to generate all such sentences:

::=

Van Wijngaarden grammar

Two levels

Motivation and history

Examples

Agreement in English grammar

A standard non-context-free language

Requiring valid use of variables in ALGOL

ALGOL 68 examples

ALGOL 68 as in the 1968 Final Report §2.1

ALGOL 68 as in the 1973 Revised Report §2.2.1, §10.1.1

Implementations

Applications outside of ALGOL 68