Tree (abstract data type)


In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children, but must be connected to exactly one parent, except for the root node, which has no parent. These constraints mean there are no cycles or "loops", and also that each child can be treated like the root node of its own subtree, making recursion a useful technique for tree traversal. In contrast to linear data structures, many trees cannot be represented by relationships between neighboring nodes in a single straight line.
Binary trees are a commonly used type, which constrain the number of children for each parent to at most two. When the order of the children is specified, this data structure corresponds to an ordered tree in graph theory. A value or pointer to other data may be associated with every node in the tree, or sometimes only with the leaf nodes, which have no children nodes.
The abstract data type can be represented in a number of ways, including a list of parents with pointers to children, a list of children with pointers to parents, or a list of nodes and a separate list of parent-child relations. Representations might also be more complicated, for example using indexes or ancestor lists for performance.
Trees as used in computing are similar to but can be different from mathematical constructs of trees in graph theory, trees in set theory, and trees in descriptive set theory.

Terminology

A node is a structure which may contain data and connections to other nodes, sometimes called edges or links. Each node in a tree has zero or more child nodes, which are below it in the tree. A node that has a child is called the child's parent node. All nodes have exactly one parent, except the topmost root node, which has none. A node might have many ancestor nodes, such as the parent's parent. Child nodes with the same parent are sibling nodes. Typically siblings have an order, with the first one conventionally drawn on the left. Some definitions allow a tree to have no nodes at all, in which case it is called empty.
An internal node is any node of a tree that has child nodes. Similarly, an external node is any node that does not have child nodes.
The height of a node is the length of the longest downward path to a leaf from that node. The height of the root is the height of the tree. The depth of a node is the length of the path to its root. Thus the root node has depth zero, leaf nodes have height zero, and a tree with only a single node has depth and height zero. Conventionally, an empty tree has height −1.
Each non-root node can be treated as the root node of its own subtree, which includes that node and all its descendants.
Other terms used with trees:

Common operations

  • Enumerating all the items
  • Enumerating a section of a tree
  • Searching for an item
  • Adding a new item at a certain position on the tree
  • Deleting an item
  • Pruning: Removing a whole section of a tree
  • Grafting: Adding a whole section to a tree
  • Finding the root for any node
  • Finding the lowest common ancestor of two nodes

    Traversal and search methods

Stepping through the items of a tree, by means of the connections between parents and children, is called walking the tree, and the action is a walk of the tree. Often, an operation might be performed when a pointer arrives at a particular node. A walk in which each parent node is traversed before its children is called a pre-order walk; a walk in which the children are traversed before their respective parents are traversed is called a post-order walk; a walk in which a node's left subtree, then the node itself, and finally its right subtree are traversed is called an in-order traversal. A level-order walk effectively performs a breadth-first search over the entirety of a tree; nodes are traversed level by level, where the root node is visited first, followed by its direct child nodes and their siblings, followed by its grandchild nodes and their siblings, etc., until all nodes in the tree have been traversed.

Representations

There are many different ways to represent trees. In working memory, nodes are typically dynamically allocated records with pointers to their children, their parents, or both, as well as any associated data. If of a fixed size, the nodes might be stored in a list. Nodes and relationships between nodes might be stored in a separate special type of adjacency list. In relational databases, nodes are typically represented as table rows, with indexed row IDs facilitating pointers between parents and children.
Nodes can also be stored as items in an array, with relationships between them determined by their positions in the array.
A binary tree can be implemented as a list of lists: the head of a list is the left child, while the tail is the right child. This can be modified to allow values as well, as in Lisp S-expressions, where the head is the value of the node, the head of the tail is the left child, and the tail of the tail is the right child.
Ordered trees can be naturally encoded by finite sequences, for example with natural numbers.

Examples of trees and non-trees

Type theory

As an abstract data type, the abstract tree type with values of some type is defined, using the abstract forest type , by the functions:
with the axioms:
In terms of type theory, a tree is an inductive type defined by the constructors and .

Mathematical terminology

Viewed as a whole, a tree data structure is an ordered tree, generally with values attached to each node. Concretely, it is :
  • A rooted tree with the "away from root" direction, meaning:
  • * A directed graph,
  • * whose underlying undirected graph is a tree,
  • * with a distinguished root,
  • * which determines the direction on the edges, together with:
  • an ordering on the child nodes of a given node, and
  • a value at each node.
Often trees have a fixed branching factor, particularly always having two child nodes, hence a "binary tree".
Allowing empty trees makes some definitions simpler, some more complicated: a rooted tree must be non-empty, hence if empty trees are allowed the above definition instead becomes "an empty tree or a rooted tree such that...". On the other hand, empty trees simplify defining fixed branching factor: with empty trees allowed, a binary tree is a tree such that every node has exactly two children, each of which is a tree.

Applications

Trees are commonly used to represent or manipulate hierarchical data in applications such as:
Trees can be used to represent and manipulate various mathematical structures, such as:
Tree structures are often used for mapping the relationships between things, such as:
  • Components and subcomponents which can be visualized in an exploded-view drawing
  • Subroutine calls used to identify which subroutines in a program call other subroutines non recursively
  • Inheritance of DNA among species by evolution, of source code by software projects, of designs in various types of cars, etc.
  • The contents of hierarchical namespaces
JSON and YAML documents can be thought of as trees, but are typically represented by nested lists and dictionaries.