E-graph
In computer science, an e-graph is a data structure that stores an equivalence relation over terms of some language.
Definition and operations
Let be a set of uninterpreted functions, where is the subset of consisting of functions of arity. Let be a countable set of opaque identifiers that may be compared for equality, called e-class IDs. The application of to e-class IDs is denoted and called an e-node.The e-graph then represents equivalence classes of e-nodes, using the following data structures:
- A union-find structure representing equivalence classes of e-class IDs, with the usual operations, and. An e-class ID is canonical if ; an e-node is canonical if each is canonical.
- An association of e-class IDs with sets of e-nodes, called e-classes. This consists of
- * a hashcons from e-nodes to e-class IDs, and
- * an e-class map that maps e-class IDs to e-classes, such that maps equivalent IDs to the same set of e-nodes:
Invariants
Operations
E-graphs expose wrappers around the,, and operations from the union-find that preserve the e-graph invariants. The last operation, e-matching, is described below.Equivalent formulations
An e-graph can also be formulated as a bipartite graph where- is the set of e-class IDs,
- is the set of e-nodes, and
- is a set of directed edges.
E-matching
Let be a set of variables and let be the smallest set that includes the 0-arity function symbols, includes the variables, and is closed under application of the function symbols. In other words, is the smallest set such that,, and when and, then. A term containing variables is called a pattern, a term without variables is called ground.An e-graph represents a ground term if one of its e-classes represents. An e-class represents if some e-node does. An e-node represents a term if and each e-class represents the term .
e-matching is an operation that takes a pattern and an e-graph, and yields all pairs where is a substitution mapping the variables in to e-class IDs and is an e-class ID such that the term is represented by. There are several known algorithms for e-matching, the relational e-matching algorithm is based on worst-case optimal joins and is worst-case optimal.
Extraction
Given an e-class and a cost function that maps each function symbol in to a natural number, the extraction problem is to find a ground term with minimal total cost that is represented by the given e-class. This problem is NP-hard. There is also no constant-factor approximation algorithm for this problem, which can be shown by reduction from the set cover problem. However, for graphs with bounded treewidth, there is a linear-time, fixed-parameter tractable algorithm.Complexity
- An e-graph with n equalities can be constructed in O time.
Equality saturation
Applications
E-graphs are used in automated theorem proving. They are a crucial part of modern SMT solvers such as Z3 and CVC4, where they are used to decide the empty theory by computing the congruence closure of a set of equalities, and e-matching is used to instantiate quantifiers. In DPLL-based solvers that use conflict-driven clause learning, e-graphs are extended to produce proof certificates. E-graphs are also used in the Simplify theorem prover of ESC/Java.Equality saturation is used in specialized optimizing compilers, e.g. for deep learning and linear algebra. Equality saturation has also been used for translation validation applied to the LLVM toolchain.
E-graphs have been applied to several problems in program analysis, including fuzzing, abstract interpretation, and library learning.