Regular tree grammar

In theoretical computer science and language theory, a regular tree grammar is a formal grammar that describes a set of directed trees, or terms. A regular word grammar can be seen as a special kind of regular tree grammar, describing a set of single-path trees.

Definition

A regular tree grammar G is defined by the tuple G =, where:N is a finite set of nonterminals,

Σ is a ranked alphabet disjoint from N,Z is the starting nonterminal, with, andP is a finite set of productions of the form A → t, with, and, where T_Σ is the associated term algebra, i.e. the set of all trees composed from symbols in according to their arities, where nonterminals are considered nullary.

Derivation of trees

The grammar G implicitly defines a set of trees: any tree that can be derived from Z using the rule set P is said to be described by G.
This set of trees is known as the language of G.
Formally, the relation ⇒_G on the set T_Σ is defined as follows:
A tree can be derived in a single step into a tree
, if there is a context S and a production such that:t₁ = S, andt₂ = S.
Here, a context means a tree with exactly one hole in it; if S is such a context, S denotes the result of filling the tree t into the hole of S.
The tree language generated by G is the language.
Here, T_Σ denotes the set of all trees composed from symbols of Σ, while ⇒_G* denotes successive applications of ⇒_G.
A language generated by some regular tree grammar is called a regular tree language.

Examples

Let G₁ =, whereN₁ = is our set of nonterminals,

Σ₁ = is our ranked alphabet, arities indicated by dummy arguments,Z₁ = BList is our starting nonterminal, and
the set P₁ consists of the following productions:
* Bool → false
* Bool → true
* BList → nil
* BList → cons

An example derivation from the grammar G₁ is
BList
⇒ cons
⇒ cons
⇒ cons.
The image shows the corresponding derivation tree; it is a tree of trees, whereas a derivation tree in word grammars is a tree of strings.
The tree language generated by G₁ is the set of all finite lists of boolean values, that is, L happens to equal T_Σ1.
The grammar G₁ corresponds to the algebraic data type declarations :

datatype Bool
= false
| true
datatype BList
= nil
| cons of Bool * BList

Every member of L corresponds to a Standard-ML value of type BList.
For another example, let, using the nonterminal set and the alphabet from above, but extending the production set by P₂, consisting of the following productions:BList → consBList → cons
The language L is the set of all finite lists of boolean values that contain true at least once. The set L has no datatype counterpart in Standard ML, nor in any other functional language.
It is a proper subset of L.
The above example term happens to be in L, too, as the following derivation shows:
BList
⇒ cons
⇒ cons
⇒ cons.

Language properties

If L₁, L₂ both are regular tree languages, then the tree sets, and L₁ \ L₂ are also regular tree languages, and it is decidable whether, and whether L₁ = L₂.

Alternative characterizations and relation to other formal languages

Regular tree grammars are a generalization of regular word grammars.
The regular tree languages are also the languages recognized by bottom-up tree automata and nondeterministic top-down tree automata.
Rajeev Alur and Parthasarathy Madhusudan related a subclass of regular binary tree languages to nested words and visibly pushdown languages.

Applications

Applications of regular tree grammars include:

Instruction selection in compiler code generation
A decision procedure for the first-order logic theory of formulas over equality and set membership as the only predicates
Solving constraints about mathematical sets
The set of all truths expressible in first-order logic about a finite algebra
Graph-search