De Bruijn graph
In graph theory, an -dimensional De Bruijn graph of symbols is a directed graph representing overlaps between sequences of symbols. It has vertices, consisting of all possible sequences of the given symbols; the same symbol may appear multiple times in a sequence. For a set of symbols the set of vertices is:
If one of the vertices can be expressed as another vertex by shifting all its symbols by one place to the left and adding a new symbol at the end of this vertex, then the latter has a directed edge to the former vertex. Thus the set of arcs is
Although De Bruijn graphs are named after Nicolaas Govert de Bruijn, they were invented independently by both de Bruijn and I. J. Good. Much earlier, Camille Flye Sainte-Marie implicitly used their properties.
Properties
- If, then the condition for any two vertices forming an edge holds vacuously, and hence all the vertices are connected, forming a total of edges.
- Each vertex has exactly incoming and outgoing edges.
- Each -dimensional De Bruijn graph is the line digraph of the De Bruijn graph with the same set of symbols.
- Each De Bruijn graph is Eulerian and Hamiltonian. The Euler cycles and Hamiltonian cycles of these graphs are De Bruijn sequences.
Dynamical systems
Binary De Bruijn graphs can be drawn in such a way that they resemble objects from the theory of dynamical systems, such as the Lorenz attractor:This analogy can be made rigorous: the -dimensional -symbol De Bruijn graph is a model of the Bernoulli map
The Bernoulli map is an ergodic dynamical system, which can be understood to be a single shift of a -adic number. The trajectories of this dynamical system correspond to walks in the De Bruijn graph, where the correspondence is given by mapping each real in the interval to the vertex corresponding to the first digits in the base- representation of. Equivalently, walks in the De Bruijn graph correspond to trajectories in a one-sided subshift of finite type.
Embeddings resembling this one can be used to show that the binary De Bruijn graphs have queue number 2 and that they have book thickness at most 5.
Uses
- Some grid network topologies are De Bruijn graphs.
- The distributed hash table protocol Koorde uses a De Bruijn graph.
- In bioinformatics, De Bruijn graphs are used for de novo assembly of sequencing reads into a genome. Instead of the complete De Bruijn graphs described above that contain all possible k-mers, de novo sequence assemblers make use of De Bruijn subgraphs that contain only the k-mers observed in a sequencing dataset.
- In time series forecasting, De Bruijn graphs have been adapted to encode temporal patterns by mapping discrete subsequences of observations to graph nodes. This enables the modeling of sequential dependencies in symbolic or discretized time series data. Multivariate De Bruijn graphs extend this idea by jointly encoding patterns across multiple correlated variables, allowing for the representation of complex inter-variable temporal dynamics in multivariate time series.