Stack machine


In computer science, computer engineering and programming language implementations, a stack machine is a computer processor or a process virtual machine in which the primary interaction is moving short-lived temporary values to and from a push-down stack. In the case of a hardware processor, a hardware stack is used. The use of a stack significantly reduces the required number of processor registers. Stack machines extend push-down automata with additional load/store operations or multiple stacks and hence are Turing-complete.

Design

Most or all stack machine instructions assume that operands will be from the stack, and results placed in the stack. The stack easily holds more than two inputs or more than one result, so a rich set of operations can be computed. In stack machine code, instructions will frequently have only an opcode commanding an operation, with no additional fields identifying a constant, register or memory cell, known as a zero address format. A computer that operates in such a way that the majority of its instructions do not include explicit addresses is said to utilize zero-address instructions. This greatly simplifies instruction decoding. Branches, load immediates, and load/store instructions require an argument field, but stack machines often arrange that the frequent cases of these still fit together with the opcode into a compact group of bits. The selection of operands from prior results is done implicitly by ordering the instructions. Some stack machine instruction sets are intended for interpretive execution of a virtual machine, rather than driving hardware directly.
Integer constant operands are pushed by or instructions. Memory is often accessed by separate or instructions containing a memory address or calculating the address from values in the stack. All practical stack machines have variants of the load–store opcodes for accessing local variables and formal parameters without explicit address calculations. This can be by offsets from the current top-of-stack address, or by offsets from a stable frame-base register.
The instruction set carries out most ALU actions with postfix operations that work only on the expression stack, not on data registers or main memory cells. This can be very convenient for executing high-level languages because most arithmetic expressions can be easily translated into postfix notation.
For example, consider the expression A*+, written in reverse Polish notation as A ''B C'' − * D ''E'' + +. Compiling and running this on a simple imaginary stack machine would take the form:
# stack contents :
push A # A
push B # B A
push C # C B A
subtract # B−C A
multiply # A*
push D # D A*
push E # E D A*
add # D+E A*
add # A*+
The arithmetic operations "subtract", "multiply", and "add" act on the two topmost operands of the stack. The computer takes both operands from the topmost values of the stack. The computer replaces those two values with the calculated difference, sum, or product. In other words the instruction's operands are "popped" off the stack, and its result are then "pushed" back onto the stack, ready for the next instruction.
Stack machines may have their expression stack and their call-return stack separated or as one integrated structure. If they are separated, the instructions of the stack machine can be pipelined with fewer interactions and less design complexity, so it will usually run faster.
Optimisation of compiled stack code is quite possible. Back-end optimisation of compiler output has been demonstrated to significantly improve code, and potentially performance, whilst global optimisation within the compiler itself achieves further gains.

Stack storage

Some stack machines have a stack of limited size, implemented as a register file. The ALU will access this with an index. A large register file uses a lot of transistors and hence this method is only suitable for small systems. A few machines have both an expression stack in memory and a separate register stack. In this case, software, or an interrupt may move data between them. Some machines have a stack of unlimited size, implemented as an array in RAM, which is cached by some number of "top of stack" address registers to reduce memory access. Except for explicit "load from memory" instructions, the order of operand usage is identical with the order of the operands in the data stack, so excellent prefetching can be accomplished easily.
Consider. It compiles to ; ;. With a stack stored completely in RAM, this does implicit writes and reads of the in-memory stack:
  • Load X, push to memory
  • Load 1, push to memory
  • Pop 2 values from memory, add, and push result to memory
for a total of 5 data cache references.
The next step up from this is a stack machine or interpreter with a single top-of-stack register. The above code then does:
  • Load X into empty TOS register or Push TOS register to memory, Load X into TOS register
  • Push TOS register to memory, Load 1 into TOS register
  • Pop left operand from memory, add to TOS register and leave it there
for a total of 3 data cache references, worst-case. Generally, interpreters don't track emptiness, because they don't have to—anything below the stack pointer is a non-empty value, and the TOS cache register is always kept hot. Typical Java interpreters do not buffer the top-of-stack this way, however, because the program and stack have a mix of short and wide data values.
If the hardwired stack machine has 2 or more top-stack registers, or a register file, then all memory access is avoided in this example and there is only 1 data cache cycle.

History and implementations

Description of such a method requiring only two values at a time to be held in registers, with a limited set of pre-defined operands that were able to be extended by definition of further operands, functions and subroutines, was first provided at conference by Robert S. Barton in 1961.

Commercial stack machines

Examples of stack instruction sets directly executed in hardware include
  • the Z4 computer by Konrad Zuse had a 2-level stack.
  • the Burroughs Large Systems architecture
  • the English Electric KDF9 machine. First delivered in 1964, the KDF9 had a 19-level deep pushdown stack of arithmetic registers, and a 17-level deep stack for subroutine return addresses
  • the Collins Radio Collins Adaptive Processing System minicomputer and Rockwell Collins Advanced Architecture Microprocessor.
  • the Xerox Dandelion, introduced 27 April 1981, and the Xerox Daybreak utilized a stack machine architecture to save memory.
  • the UCSD Pascal p-machine supported a complete student programming environment on early 8-bit microprocessors with poor instruction sets and little RAM, by compiling to a virtual stack machine.
  • MU5 and ICL 2900 Series. Hybrid stack and accumulator machines. The accumulator register buffered the memory stack's top data value. Variants of load and store opcodes controlled when that register was spilled to the memory stack or reloaded from there.
  • HP 3000
  • HP 9000 systems based on the HP FOCUS microprocessor.
  • Tandem Computers T/16. Like HP 3000, except that compilers, not microcode, controlled when the register stack spilled to the memory stack or was refilled from the memory stack.
  • the Atmel MARC4 microcontroller
  • Several "Forth chips" such as the RTX2000, the RTX2010, the F21 and the PSC1000
  • The Setun ternary computer performed balanced ternary using a stack.
  • Patriot Scientific's Ignite stack machine designed by Charles H. Moore holds a leading functional density benchmark.
  • Saab Ericsson Space Thor radiation hardened microprocessor
  • Inmos transputers.
  • ZPU A physically-small CPU designed to supervise FPGA systems.
  • Some technical handheld calculators use reverse Polish notation in their keyboard interface, instead of having parenthesis keys. This is a form of stack machine. The Plus key relies on its two operands already being at the correct topmost positions of the user-visible stack.

    Virtual stack machines

Examples of virtual stack machines interpreted in software:
Pure stack machines are quite inefficient for procedures which access multiple fields from the same object. The stack machine code must reload the object pointer for each pointer+offset calculation. A common fix for this is to add some register-machine features to the stack machine: a visible register file dedicated to holding addresses, and register-style instructions for doing loads and simple address calculations. It is uncommon to have the registers be fully general purpose, because then there is no strong reason to have an expression stack and postfix instructions.
Another common hybrid is to start with a register machine architecture, and add another memory address mode which emulates the push or pop operations of stack machines: "memaddress = reg; reg += instr.displ". This was first used in DEC's PDP-11 minicomputer. This feature was carried forward in VAX computers and in Motorola 6809 and 68000 series microprocessors. This allowed the use of simpler stack methods in early compilers. It also efficiently supported virtual machines using stack interpreters or threaded code. However, this feature did not help the register machine's own code to become as compact as pure stack machine code. Also, the execution speed was less than when compiling well to the register architecture. It is faster to change the top-of-stack pointer only occasionally rather than constantly stepping it up and down throughout each program statement, and it is even faster to avoid memory references entirely.
More recently, so-called second-generation stack machines have adopted a dedicated collection of registers to serve as address registers, off-loading the task of memory addressing from the data stack. For example, MuP21 relies on a register called "A", while the more recent GreenArrays processors relies on two registers: A and B.
The Intel x86 family of microprocessors have a register-style instruction set for most operations, but use stack instructions for its x87, Intel 8087 floating point arithmetic, dating back to the iAPX87 coprocessor for the 8086 and 8088. That is, there are no programmer-accessible floating point registers, but only an 80-bit wide, 8-level deep stack. The x87 relies heavily on the x86 CPU for assistance in performing its operations.