Microcode


In processor design, microcode serves as an intermediary layer situated between the central processing unit hardware and the programmer-visible instruction set architecture of a computer. It consists of a set of hardware-level instructions that implement the higher-level machine code instructions or control internal finite-state machine sequencing in many digital processing components. While microcode is utilized in Intel and AMD general-purpose CPUs in contemporary desktops and laptops, it functions only as a fallback path for scenarios that the faster hardwired control unit is unable to manage.
Housed in special high-speed memory, microcode translates machine instructions, state machine data, or other input into sequences of detailed circuit-level operations. It separates the machine instructions from the underlying electronics, thereby enabling greater flexibility in designing and altering instructions. Moreover, it facilitates the construction of complex multi-step instructions, while simultaneously reducing the complexity of computer circuits. The act of writing microcode is often referred to as microprogramming, and the microcode in a specific processor implementation is sometimes termed a microprogram.
Through extensive microprogramming, microarchitectures of smaller scale and simplicity can emulate more robust architectures with wider word lengths, additional execution units, and so forth. This approach provides a relatively straightforward method of ensuring software compatibility between different products within a processor family.

Overview

Instruction sets

At the hardware level, processors contain a number of separate areas of circuitry, or "units", that perform different tasks. Commonly found units include the arithmetic logic unit which performs instructions such as addition or comparing two numbers, circuits for reading and writing data to external memory, and small areas of onboard memory to store these values while they are being processed. In most designs, additional high-performance memory, the register file, is used to store temporary values, not just those needed by the current instruction.
To properly perform an instruction, the various circuits have to be activated in order. For instance, it is not possible to add two numbers if they have not yet been loaded from memory. In RISC designs, the proper ordering of these instructions is largely up to the programmer, or at least to the compiler of the programming language they are using. So to add two numbers in memory and store the result in memory, for instance, the compiler may output instructions to load one of the values into one register, the second into another, perform the addition function in the ALU, putting the result into a register, and then store that register into memory.
As the sequence of instructions needed to complete this higher-level concept, "add these two numbers in memory", may require multiple instructions, this can represent a performance bottleneck if those instructions are stored in main memory. Reading those instructions one by one takes time that could be used to read and write the actual data. For this reason, it is common for non-RISC designs to have many different instructions that differ largely on where they store data. For instance, the MOS 6502 has eight variations of the addition instruction,, which differ only in where they look to find the two operands.
Using the variation of the instruction, or "opcode", that most closely matches the ultimate operation can reduce the number of instructions to one, saving memory used by the program code and improving performance by leaving the data bus open for other operations. Internally, however, these instructions are not separate operations, but sequences of the operations the units actually perform. Converting a single instruction read from memory into the sequence of internal actions is the duty of the control unit, another unit within the processor.

Microcode

The basic idea behind microcode is to replace the custom hardware logic implementing the instruction sequencing with a series of simple instructions run in a "microcode engine" in the processor. Whereas a custom logic system might have a series of diodes and gates that output a series of voltages on various control lines, the microcode engine is connected to these lines instead, and these are turned on and off as the engine reads the microcode instructions in sequence. The microcode instructions are often bit encoded to those lines, for instance, if bit 8 is true, that might mean that the ALU should be paused awaiting data. In this respect microcode is somewhat similar to the paper rolls in a player piano, where the holes represent which key should be pressed.
The distinction between custom logic and microcode may seem small, one uses a pattern of diodes and gates to decode the instruction and produce a sequence of signals, whereas the other encodes the signals as microinstructions that are read in sequence to produce the same results. The critical difference is that in a custom logic design, changes to the individual steps require the hardware to be redesigned. Using microcode, all that changes is the code stored in the memory containing the microcode. This makes it much easier to fix problems in a microcode system. It also means that there is no effective limit to the complexity of the instructions, it is only limited by the amount of memory one is willing to use.
The lowest layer in a computer's software stack is traditionally raw machine code instructions for the processor. In microcoded processors, fetching and decoding those instructions, and executing them, may be done by microcode. To avoid confusion, each microprogram-related element is differentiated by the micro prefix: microinstruction, microassembler, microprogrammer, etc.
Complex digital processors may also employ more than one control unit in order to delegate sub-tasks that must be performed essentially asynchronously in parallel. For example, the VAX 9000 has a hardwired IBox unit to fetch and decode instructions, which it hands to a microcoded EBox unit to be executed, and the VAX 8800 has both a microcoded IBox and a microcoded EBox.
A high-level programmer, or even an assembly language programmer, does not normally see or change microcode. Unlike machine code, which often retains some backward compatibility among different processors in a family, microcode only runs on the exact electronic circuitry for which it is designed, as it constitutes an inherent part of the particular processor design itself.

Design

Engineers normally write the microcode during the design phase of a processor, storing it in a read-only memory or programmable logic array structure, or in a combination of both. However, machines also exist that have some or all microcode stored in static random-access memory or flash memory. This is traditionally denoted as writable control store in the context of computers, which can be either read-only or read–write memory. In the latter case, the CPU initialization process loads microcode into the control store from another storage medium, with the possibility of altering the microcode to correct bugs in the instruction set, or to implement new machine instructions.

Microprograms

Microprograms consist of series of microinstructions, which control the CPU at a very fundamental level of hardware circuitry. For example, a single typical horizontal microinstruction might specify the following operations simultaneously:
  • Connect register 1 to the A side of the ALU
  • Connect register 7 to the B side of the ALU
  • Set the ALU to perform two's-complement addition
  • Set the ALU's carry input to zero
  • Connect the ALU output to register 8
  • Update the condition codes from the ALU status flags
  • Microjump to a given μPC address for the next microinstruction
To simultaneously control all processor's features in one cycle, the microinstruction is often wider than 50 bits; e.g., 128 bits on a 360/85 with an emulator feature. Microprograms are carefully designed and optimized for the fastest possible execution, as a slow microprogram would result in a slow machine instruction and degraded performance for related application programs that use such instructions.

Justification

Microcode was originally developed as a simpler method of developing the control logic for a computer. Initially, CPU instruction sets were hardwired. Each step needed to fetch, decode, and execute the machine instructions was controlled directly by combinational logic and rather minimal sequential state machine circuitry. While such hard-wired processors were very efficient, the need for powerful instruction sets with multi-step addressing and complex operations made them difficult to design and debug; highly encoded and varied-length instructions can contribute to this as well, especially when very irregular encodings are used.
Microcode simplified the job by allowing much of the processor's behaviour and programming model to be defined via microprogram routines rather than by dedicated circuitry. Even late in the design process, microcode could easily be changed, whereas hard-wired CPU designs were very cumbersome to change. Thus, this greatly facilitated CPU design.
From the 1940s to the late 1970s, a large portion of programming was done in assembly language; higher-level instructions mean greater programmer productivity, so an important advantage of microcode was the relative ease by which powerful machine instructions can be defined. The ultimate extension of this are "Directly Executable High Level Language" designs, in which each statement of a high-level language such as PL/I is entirely and directly executed by microcode, without compilation. The IBM Future Systems project and Data General Fountainhead Processor are examples of this. During the 1970s, CPU speeds grew more quickly than memory speeds and numerous techniques such as memory block transfer, memory pre-fetch and multi-level caches were used to alleviate this. High-level machine instructions, made possible by microcode, helped further, as fewer more complex machine instructions require less memory bandwidth. For example, an operation on a character string can be done as a single machine instruction, thus avoiding multiple instruction fetches.
Architectures with instruction sets implemented by complex microprograms included the IBM System/360 and Digital Equipment Corporation VAX. The approach of increasingly complex microcode-implemented instruction sets was later called complex instruction set computer. An alternate approach, used in many microprocessors, is to use one or more programmable logic array or read-only memory mainly for instruction decoding, and let a simple state machine do most of the sequencing. The MOS Technology 6502 is an example of a microprocessor using a PLA for instruction decode and sequencing. The PLA is visible in photomicrographs of the chip, and its operation can be seen in the transistor-level simulation.
Microprogramming is still used in modern CPU designs. In some cases, after the microcode is debugged in simulation, logic functions are substituted for the control store. Logic functions are often faster and less expensive than the equivalent microprogram memory.