Static single-assignment form


In compiler design, static single assignment form (often abbreviated as SSA form or simply SSA) is a type of intermediate representation (IR) where each variable is assigned exactly once. SSA is used in most high-quality optimizing compilers for imperative languages, including LLVM, the GNU Compiler Collection, and many commercial compilers.
There are efficient algorithms for converting programs into SSA form. To convert to SSA, existing variables in the original IR are split into versions, with new variables typically indicated by the original name plus a subscript, so that every definition gets its own version. Additional statements that assign to new versions of variables may also need to be introduced at the join point of two control flow paths. Converting from SSA form back to machine code is also efficient.
SSA makes numerous analyses needed for optimizations easier to perform, such as determining use-define chains, because when looking at a use of a variable there is only one place where that variable may have received a value. Most optimizations can be adapted to preserve SSA form, so that one optimization can be performed after another with no additional analysis. SSA-based optimizations are usually more efficient and more powerful than their non-SSA equivalents.
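As an illustration of why use-define chains become trivial, consider the following Python sketch; the triple-based IR is hypothetical, invented for this example. Because each SSA name has exactly one definition, the entire use-define analysis collapses into building a single dictionary:

# Hedged sketch (hypothetical triple IR, not any real compiler's API):
# every SSA name is defined exactly once, so complete use-define
# information is a single dictionary built in one pass.
code = [
    ("a1", "const", 7),
    ("b1", "add", "a1", 1),   # b1 := a1 + 1
    ("c1", "mul", "b1", 2),   # c1 := b1 * 2
]
definition = {instr[0]: instr for instr in code}

# The (unique) definition reaching any use is a constant-time lookup.
print(definition["b1"])       # ('b1', 'add', 'a1', 1)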
In functional language compilers, such as those for Scheme and ML, continuation-passing style is generally used. SSA is formally equivalent to a well-behaved subset of CPS excluding non-local control flow, so optimizations and transformations formulated in terms of one generally apply to the other. Using CPS as the intermediate representation is more natural for higher-order functions and interprocedural analysis. CPS also easily encodes call/cc, whereas SSA does not.

History

SSA was developed in the 1980s by several researchers at IBM. Kenneth Zadeck, a key member of the team, moved to Brown University as development continued. A 1986 paper introduced birthpoints, identity assignments, and variable renaming such that variables had a single static assignment. A subsequent 1987 paper by Jeanne Ferrante and Ronald Cytron proved that the renaming done in the previous paper removes all false dependencies for scalars. In 1988, Barry Rosen, Mark N. Wegman, and Kenneth Zadeck replaced the identity assignments with Φ-functions, introduced the name "static single-assignment form", and demonstrated a now-common SSA optimization. The name Φ-function was chosen by Rosen to be a more publishable version of "phony function". Alpern, Wegman, and Zadeck presented another optimization, but using the name "static single assignment". Finally, in 1989, Rosen, Wegman, Zadeck, Cytron, and Ferrante found an efficient means of converting programs to SSA form.

Benefits

The primary usefulness of SSA comes from how it simultaneously simplifies and improves the results of a variety of compiler optimizations, by simplifying the properties of variables. For example, consider this piece of code:
y := 1
y := 2
x := y
Humans can see that the first assignment is not necessary, and that the value of y being used in the third line comes from the second assignment of y. A program would have to perform reaching definition analysis to determine this. But if the program is in SSA form, both of these are immediate:
y1 := 1
y2 := 2
x1 := y2
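Continuing in that vein, a small Python sketch (again over a hypothetical triple IR invented for this example) shows that detecting the dead first assignment needs no reaching-definitions analysis; it is enough to check which versions are never used:

# Hedged sketch: in SSA form an assignment is dead exactly when its
# destination has no uses and is not a live output of the fragment.
code = [
    ("y1", "const", 1),    # y1 := 1
    ("y2", "const", 2),    # y2 := 2
    ("x1", "copy", "y2"),  # x1 := y2
]
live_out = {"x1"}  # x1 carries the fragment's result
used = {op for (_, _, *args) in code for op in args if isinstance(op, str)}
dead = [instr for instr in code if instr[0] not in used | live_out]
print(dead)  # [('y1', 'const', 1)] -- the first assignment is dead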
Compiler optimization algorithms that are either enabled or strongly enhanced by the use of SSA include:
  • Constant folding – conversion of computations from runtime to compile time, e.g. treating the instruction a=3*4+5; as if it were a=17; (see the propagation sketch after this list)
  • Value range propagation – precompute the range of values a calculation could produce, allowing branches to be predicted in advance
  • Sparse conditional constant propagation – range-check some values, allowing tests to predict the most likely branch
  • Dead-code elimination – remove code that will have no effect on the results
  • Global value numbering – replace duplicate calculations producing the same result
  • Partial-redundancy elimination – removing duplicate calculations previously performed in some branches of the program
  • Strength reduction – replacing expensive operations by less expensive but equivalent ones, e.g. replace integer multiply or divide by powers of 2 with the potentially less expensive shift left or shift right.
  • Register allocation – optimize how the limited number of machine registers may be used for calculations
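To make the first item concrete, here is a hedged Python sketch of constant propagation and folding over an SSA-style triple IR (the IR and operator table are hypothetical, invented for this example). Because every name has a single definition, a value can be resolved by walking definitions directly, with no iterative data-flow analysis:

import operator

# Hypothetical SSA triple IR encoding a = 3*4 + 5.
OPS = {"+": operator.add, "*": operator.mul}
code = [
    ("a1", "const", 3),
    ("b1", "const", 4),
    ("c1", "*", "a1", "b1"),   # c1 := a1 * b1
    ("d1", "+", "c1", 5),      # d1 := c1 + 5
]
definition = {instr[0]: instr for instr in code}

def resolve(operand):
    # Return the compile-time constant for an operand, if it has one.
    if isinstance(operand, int):
        return operand
    _, op, *args = definition[operand]
    if op == "const":
        return args[0]
    vals = [resolve(a) for a in args]
    if op in OPS and all(v is not None for v in vals):
        return OPS[op](*vals)
    return None

print(resolve("d1"))  # 17 -- the whole expression folds at compile time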

Converting to SSA

Converting ordinary code into SSA form is primarily a matter of replacing the target of each assignment with a new variable, and replacing each use of a variable with the "version" of the variable reaching that point. For example, consider the following control-flow graph:
[Figure: An example control-flow graph, before conversion to SSA]
Changing the name on the left-hand side of "x ← x - 3" and changing the following uses of x to that new name would leave the program unaltered. This can be exploited in SSA by creating two new variables: x1 and x2, each of which is assigned only once. Likewise, giving distinguishing subscripts to all the other variables yields:
[Figure: An example control-flow graph, partially converted to SSA]
It is clear which definition each use is referring to, except for one case: both uses of y in the bottom block could be referring to either y1 or y2, depending on which path the control flow took.
To resolve this, a special statement is inserted in the last block, called a Φ function. This statement will generate a new definition of y called y3 by "choosing" either y1 or y2, depending on the control flow in the past.
[Figure: An example control-flow graph, fully converted to SSA]
Now, the last block can simply use y3, and the correct value will be obtained either way. A Φ function for x is not needed: only one version of x, namely x2, reaches this place, so there is no problem.
Given an arbitrary control-flow graph, it can be difficult to tell where to insert Φ functions, and for which variables. This general question has an efficient solution that can be computed using a concept called dominance frontiers.
Φ functions are not implemented as machine operations on most machines. A compiler can implement a Φ function by inserting "move" operations at the end of every predecessor block. In the example above, the compiler might insert a move from y1 to y3 at the end of the middle-left block and a move from y2 to y3 at the end of the middle-right block. These move operations might not end up in the final code, depending on the compiler's register allocation procedure. However, this approach may not work when simultaneous operations are speculatively producing inputs to a Φ function, as can happen on wide-issue machines. Typically, a wide-issue machine has a selection instruction used in such situations by the compiler to implement the Φ function.
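A minimal sketch of that lowering strategy, assuming a hypothetical block representation (this is not any particular backend's data structure):

# Hedged sketch: lower y3 := Φ(y1 from "left", y2 from "right") by
# appending a copy "y3 := yi" to each predecessor block.
blocks = {
    "left":  {"code": [], "phis": []},
    "right": {"code": [], "phis": []},
    "join":  {"code": [], "phis": [("y3", {"left": "y1", "right": "y2"})]},
}

def lower_phis(blocks):
    for blk in blocks.values():
        for dest, sources in blk["phis"]:
            for pred, src in sources.items():
                # Insert the move at the end of the predecessor block.
                blocks[pred]["code"].append(("move", dest, src))
        blk["phis"] = []

lower_phis(blocks)
print(blocks["left"]["code"])   # [('move', 'y3', 'y1')]
print(blocks["right"]["code"])  # [('move', 'y3', 'y2')]

Note that when several Φ functions in one block read each other's destinations, the inserted moves must be treated as parallel copies (the classic "lost copy" and "swap" problems); this sketch ignores that subtlety.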

Computing minimal SSA using dominance frontiers

In a control-flow graph, a node A is said to strictly dominate a different node B if it is impossible to reach B without passing through A first. In other words, if node B is reached, then it can be assumed that A has run. A is said to dominate B if either A strictly dominates B or A = B.
A node which transfers control to a node A is called an immediate predecessor of A.
The dominance frontier of node A is the set of nodes B where A does not strictly dominate B, but does dominate some immediate predecessor of B. These are the points at which multiple control paths merge back together into a single path.
For example, in the following code:

[1] x = random()
    if x < 0.5
[2]     result = "heads"
    else
[3]     result = "tails"
    end
[4] print(result)

Node 1 strictly dominates nodes 2, 3, and 4; the immediate predecessors of node 4 are nodes 2 and 3.
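These dominance facts can be checked by iterating the textbook data-flow equations to a fixed point, as in the following Python sketch (hypothetical adjacency-list encoding of the four-node graph above; production compilers use faster methods such as the Cooper–Harvey–Kennedy algorithm quoted below):

# Hedged sketch: dominator sets by fixed-point iteration; node 1 is entry.
preds = {1: [], 2: [1], 3: [1], 4: [2, 3]}
nodes = set(preds)

dom = {n: set(nodes) for n in nodes}
dom[1] = {1}
changed = True
while changed:
    changed = False
    for n in nodes - {1}:
        new = {n} | set.intersection(*(dom[p] for p in preds[n]))
        if new != dom[n]:
            dom[n], changed = new, True

print(sorted(dom[4]))  # [1, 4]: node 1 (and 4 itself) dominates node 4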
Dominance frontiers define the points at which Φ functions are needed. In the above example, when control is passed to node 4, the definition of result used depends on whether control was passed from node 2 or 3. Φ functions are not needed for variables defined in a dominator, as there is only one possible definition that can apply.
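Given precomputed dominance frontiers, the standard placement pass is a simple worklist, sketched below in Python (hypothetical encodings, invented for this example). A Φ for a variable is inserted into every block of the iterated dominance frontier of its definition sites; a Φ is itself a new definition, so blocks that receive one must be processed in turn:

def place_phis(defsites, df):
    # defsites: blocks assigning the variable; df: block -> dominance frontier.
    has_phi, work = set(), list(defsites)
    while work:
        block = work.pop()
        for fb in df[block]:
            if fb not in has_phi:
                has_phi.add(fb)
                if fb not in defsites:   # the Φ is a new definition site
                    work.append(fb)
    return has_phi

# In the example above, result is assigned in nodes 2 and 3, and
# DF(2) = DF(3) = {4}, so the Φ for result lands in node 4.
df = {1: set(), 2: {4}, 3: {4}, 4: set()}
print(place_phis({2, 3}, df))  # {4}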
There is an efficient algorithm for finding dominance frontiers of each node. This algorithm was originally described in "Efficiently Computing Static Single Assignment Form and the Control Dependence Graph" by Ron Cytron, Jeanne Ferrante, et al. in 1991.
Keith D. Cooper, Timothy J. Harvey, and Ken Kennedy of Rice University describe an algorithm in their paper "A Simple, Fast Dominance Algorithm":
for each node b
    dominance_frontier(b) := {}
for each node b
    if the number of immediate predecessors of b ≥ 2
        for each p in immediate predecessors of b
            runner := p
            while runner ≠ idom(b)
                dominance_frontier(runner) := dominance_frontier(runner) ∪ { b }
                runner := idom(runner)
In the code above, idom(b) is the immediate dominator of b, the unique node that strictly dominates b but does not strictly dominate any other node that strictly dominates b.
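For concreteness, a line-for-line Python transliteration of that pseudocode might look as follows (hypothetical graph encoding; idom is assumed to be precomputed, for example by the dominator computation sketched earlier):

# Hedged transliteration of the Cooper–Harvey–Kennedy pseudocode above.
def dominance_frontiers(preds, idom):
    df = {b: set() for b in preds}
    for b in preds:
        if len(preds[b]) >= 2:
            for p in preds[b]:
                runner = p
                while runner != idom[b]:
                    df[runner].add(b)
                    runner = idom[runner]
    return df

preds = {1: [], 2: [1], 3: [1], 4: [2, 3]}  # the coin-flip example
idom = {2: 1, 3: 1, 4: 1}                   # entry node 1 has no idom
print(dominance_frontiers(preds, idom))     # {1: set(), 2: {4}, 3: {4}, 4: set()}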

Variations that reduce the number of Φ functions

"Minimal" SSA inserts the minimal number of Φ functions required to ensure that each name is assigned a value exactly once and that each reference of a name in the original program can still refer to a unique name.
However, some of these Φ functions could be dead. For this reason, minimal SSA does not necessarily produce the fewest Φ functions that are needed by a specific procedure. For some types of analysis, these Φ functions are superfluous and can cause the analysis to run less efficiently.