Ukkonen's algorithm
In computer science, Ukkonen's algorithm is a linear-time, online algorithm for constructing suffix trees, proposed by Esko Ukkonen in 1995. The algorithm begins with an implicit suffix tree containing the first character of the string. Then it steps through the string, adding successive characters until the tree is complete. This order addition of characters gives Ukkonen's algorithm its "on-line" property. The original algorithm presented by Peter Weiner in 1973 proceeded backward from the last character to the first one from the shortest to the longest suffix. A simpler algorithm was found by Edward M. McCreight in 1976, going from the longest to the shortest suffix.
Implicit suffix tree
While generating suffix tree using Ukkonen's algorithm, we will see implicit suffix tree in intermediate steps depending on characters in string S. In implicit suffix trees, there will be no edge with $ label and no internal node with only one edge going out of it.High level description of Ukkonen's algorithm
Ukkonen's algorithm constructs an implicit suffix tree T for each prefix S of S. It first builds T using the 1 character, then T using the 2 character, then T using the 3 character,..., T using the n character. You can find the following characteristics in a suffix tree that uses Ukkonen's algorithm:- Implicit suffix tree T is built on top of implicit suffix tree T.
- At any given time, Ukkonen's algorithm builds the suffix tree for the characters seen so far and so it has on-line property, allowing the algorithm to have an execution time of O.
- Ukkonen's algorithm is divided into n phases.
- Each phase i+1 is further divided into i+1 extensions, one for each of the i+1 suffixes of S.
- If the path from the root labelled S ends at a leaf edge, then character S is just added to the end of the label on that leaf edge.
- if the path from the root labelled S ends at a non-leaf edge and next character is not S, then a new leaf edge with label S and number j is created starting from character S. A new internal node will also be created if S ends inside a non-leaf edge.
- If the path from the root labelled S ends at a non-leaf edge and next character is S, do nothing.
Run time
The naive implementation for generating a suffix tree going forward requires or even time complexity in big O notation, where is the length of the string. By exploiting a number of algorithmic techniques, Ukkonen reduced this to time, for constant-size alphabets, and in general, matching the runtime performance of the earlier two algorithms.Ukkonen's algorithm example
To better illustrate how a suffix tree is constructed using Ukkonen's algorithm, we can consider the stringS = xabxac.- Start with an empty root node.
- Construct for
Sby adding the first character of the string. Rule 2 applies, which creates a new leaf node. - Construct for
Sby adding suffixes ofxa. Rule 1 applies, which extends the path label in existing leaf edge. Rule 2 applies, which creates a new leaf node. - Construct for
Sby adding suffixes ofxab. Rule 1 applies, which extends the path label in existing leaf edge. Rule 2 applies, which creates a new leaf node. - Construct for
Sby adding suffixes ofxabx. Rule 1 applies, which extends the path label in existing leaf edge. Rule 3 applies, do nothing. - Constructs for
Sby adding suffixes ofxabxa. Rule 1 applies, which extends the path label in existing leaf edge. Rule 3 applies, do nothing. - Constructs for
Sby adding suffixes ofxabxac. Rule 1 applies, which extends the path label in existing leaf edge. Rule 2 applies, which creates a new leaf node.