Reduced instruction set computer
In electronics and computer science, a reduced instruction set computer is a computer architecture designed to simplify the individual instructions given to the computer to accomplish tasks. Compared to the instructions given to a complex instruction set computer, a RISC computer might require more machine code in order to accomplish a task because the individual instructions perform simpler operations. The goal is to offset the need to process more instructions by increasing the speed of each instruction, in particular by implementing an instruction pipeline, which may be simpler to achieve given simpler instructions.
The key operational concept of the RISC computer is that each instruction performs only one function. The RISC computer usually has many high-speed, general-purpose registers with a load–store architecture in which the instructions that perform arithmetic and tests operate only on the registers, and the instructions that access data in the main memory of the computer only load data from memory into registers or store data from registers into memory. RISC designs also use a small number of simple addressing modes and have predictable instruction timings, which simplify the design of the system as a whole.
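To make the load–store discipline concrete, the following is a minimal sketch in C of a toy load–store machine (the register count, memory size, and function names are invented for illustration and do not correspond to any real ISA): arithmetic reads and writes registers only, while memory is touched solely by explicit load and store operations.

```c
#include <stdint.h>
#include <stdio.h>

/* A toy load-store machine: a few general-purpose registers and a small
   memory. Arithmetic works only on registers; memory is accessed only
   through explicit load and store operations. */
static uint32_t reg[8];
static uint32_t mem[256];

static void load(int rd, uint32_t addr)  { reg[rd] = mem[addr]; }          /* memory -> register */
static void store(int rs, uint32_t addr) { mem[addr] = reg[rs]; }          /* register -> memory */
static void add(int rd, int ra, int rb)  { reg[rd] = reg[ra] + reg[rb]; }  /* registers only */

int main(void) {
    mem[0] = 2; mem[1] = 40;
    /* What a single CISC "add memory to memory" instruction might do,
       expressed as the RISC-style sequence load / load / add / store. */
    load(1, 0);
    load(2, 1);
    add(3, 1, 2);
    store(3, 2);
    printf("%u\n", (unsigned)mem[2]);  /* prints 42 */
    return 0;
}
```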
The conceptual developments of the RISC computer architecture began with the IBM 801 project in the late 1970s, but these were not immediately put into use. Designers in California picked up the 801 concepts in two seminal projects, Stanford MIPS and Berkeley RISC. These were commercialized in the 1980s as the MIPS and SPARC systems. IBM eventually produced RISC designs based on further work on the 801 concept: the IBM POWER architecture, PowerPC, and Power ISA. As the projects matured, many similar designs produced in the mid-to-late 1980s and early 1990s, such as ARM, PA-RISC, and Alpha, yielded central processing units that increased the commercial utility of Unix workstations and of embedded processors in laser printers, routers, and similar products.
In the minicomputer market, companies that included Celerity Computing, Pyramid Technology, and Ridge Computers began offering systems designed according to RISC or RISC-like principles in the early 1980s. Few of these designs began by using RISC microprocessors.
The varieties of RISC processor design include the ARC processor, the DEC Alpha, the AMD Am29000, the ARM architecture, the Atmel AVR, Blackfin, Intel i860, Intel i960, LoongArch, Motorola 88000, the MIPS architecture, PA-RISC, Power ISA, RISC-V, SuperH, and SPARC. RISC processors are used in supercomputers, such as the Fugaku.
History and development
A number of systems, going back to the 1960s, have been credited as the first RISC architecture, partly based on their use of the load–store approach. The term RISC was coined by David Patterson of the Berkeley RISC project, although somewhat similar concepts had appeared before. The CDC 6600 designed by Seymour Cray in 1964 used a load–store architecture with only two addressing modes and 74 operation codes, with the basic clock cycle being 10 times faster than the memory access time. Partly due to the optimized load–store architecture of the CDC 6600, Jack Dongarra says that it can be considered a forerunner of modern RISC systems, although a number of other technical barriers needed to be overcome for the development of a modern RISC system.
IBM 801
The IBM 801, begun in 1975 by John Cocke and completed in 1980, is often credited as the first RISC system. The 801 developed out of an effort to build a 24-bit high-speed processor to use as the basis for a digital telephone switch. To reach their goal of switching 1 million calls per hour, the designers calculated that the CPU required performance on the order of 12 million instructions per second, compared to their fastest mainframe machine of the time, the 370/168, which performed at 3.5 MIPS. The design was based on a study of IBM's extensive collection of statistics gathered from its customers. This demonstrated that code in high-performance settings made extensive use of processor registers, and that programs often ran out of them, suggesting that additional registers would improve performance. Additionally, the team noticed that compilers generally ignored the vast majority of the available instructions, especially orthogonal addressing modes; instead, they selected the fastest version of any given instruction and then constructed small routines using it. This suggested that the majority of instructions could be removed without affecting the resulting code. These two conclusions worked in concert: removing instructions would allow the instruction opcodes to be shorter, freeing up bits in the instruction word which could then be used to select among a larger set of registers.
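As a rough back-of-the-envelope reading of those two figures (this derivation is an illustration, not a calculation quoted from the IBM study itself), switching one million calls per hour on a 12 MIPS machine leaves an instruction budget of a few tens of thousands of instructions per call:

\[
\frac{12 \times 10^{6}\ \text{instructions/s}}{10^{6}\ \text{calls} / 3600\ \text{s}} \approx 43{,}000\ \text{instructions per call.}
\]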
The telephone switch program was canceled in 1975, but by then the team had demonstrated that the same design would offer significant performance gains running just about any code. In simulations, they showed that a compiler tuned to use registers instead of operating directly on memory would run code about three times as fast as traditional designs. Somewhat surprisingly, the same code would run about 50% faster even on existing machines due to the improved register use. In practice, their experimental PL/8 compiler, a slightly cut-down version of PL/I, consistently produced code that ran much faster on their existing mainframes.
A 32-bit version of the 801 was eventually produced in a single-chip form as the IBM ROMP in 1981, which stood for 'Research OPD Micro Processor'. This CPU was designed for "mini" tasks, and found use in peripheral interfaces and channel controllers on later IBM computers. It was also used as the CPU in the IBM RT PC in 1986, which turned out to be a commercial failure. Although the 801 did not see widespread use in its original form, it inspired many research projects, including ones at IBM that would eventually lead to the IBM POWER architecture.
Berkeley RISC and Stanford MIPS
By the late 1970s, the 801 had become well known in the industry. This coincided with new fabrication techniques that allowed more complex chips to come to market; the Zilog Z80 of 1976 had 8,000 transistors, whereas the 1979 Motorola 68000 had 68,000. These newer designs generally used their newfound complexity to expand the instruction set to make it more orthogonal. Most, like the 68k, used microcode to do this, reading instructions and re-implementing them as a sequence of simpler internal instructions. In the 68k, a substantial fraction of the transistors were used for this microcoding.
In 1979, David Patterson was sent on a sabbatical from the University of California, Berkeley, to help DEC's west-coast team improve the VAX microcode. Patterson was struck by the complexity of the coding process and concluded it was untenable. He first wrote a paper on ways to improve microcoding, but later changed his mind and decided that microcode itself was the problem. With funding from the DARPA VLSI Program, Patterson started the Berkeley RISC effort. The program, practically unknown today, led to a huge number of advances in chip design, fabrication, and even computer graphics. Examining a variety of programs from their BSD Unix variant, the Berkeley team found, as IBM had, that most programs made no use of the large variety of instructions in the 68k.
Patterson's early work pointed out an important problem with the traditional "more is better" approach; even those instructions that were critical to overall performance were being delayed by their trip through the microcode. If the microcode was removed, the programs would run faster. And since the microcode ultimately took a complex instruction and broke it into steps, there was no reason the compiler could not do this instead. These studies suggested that, even with no other changes, one could make a chip with fewer transistors that would run faster. In the original RISC-I paper they noted:
Skipping this extra level of interpretation appears to enhance performance while reducing chip size.
It was also discovered that, on microcoded implementations of certain architectures, complex operations tended to be slower than a sequence of simpler operations doing the same thing. This was in part an effect of the fact that many designs were rushed, with little time to optimize or tune every instruction; only those used most often were optimized, and a sequence of those instructions could be faster than a less-tuned instruction performing the equivalent operation. One infamous example was the VAX's INDEX instruction.
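To illustrate the general point, the sketch below performs a bounds-checked array-subscript calculation, roughly the kind of work the microcoded INDEX instruction handled, written as the short sequence of simple operations a compiler could emit instead. The operand details are deliberately simplified and are not a faithful model of the real VAX instruction.

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative only: a bounds-checked subscript calculation done with a
   handful of simple operations (compare, branch, add, multiply) rather
   than one complex microcoded instruction. */
static long checked_index(long subscript, long low, long high,
                          long elem_size, long index_in) {
    if (subscript < low || subscript > high) {   /* compare + branch */
        fprintf(stderr, "subscript out of range\n");
        exit(EXIT_FAILURE);
    }
    return (index_in + subscript) * elem_size;   /* add + multiply */
}

int main(void) {
    printf("%ld\n", checked_index(3, 0, 9, 4, 0));  /* prints 12 */
    return 0;
}
```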
The Berkeley work also turned up a number of additional points. Among these was the fact that programs spent a significant amount of time performing subroutine calls and returns, and it seemed there was the potential to improve overall performance by speeding up these calls. This led the Berkeley design to select a method known as register windows, which can significantly improve subroutine performance, although at the cost of some complexity. They also noticed that the majority of mathematical instructions were simple assignments; only a minority of them actually performed an operation like addition or subtraction. But when those operations did occur, they tended to be slow. This led to far more emphasis on the underlying arithmetic data unit, as opposed to previous designs where the majority of the chip was dedicated to control and microcode.
The resulting Berkeley RISC was based on gaining performance through the use of pipelining and aggressive use of register windowing. In a traditional CPU, one has a small number of registers, and a program can use any register at any time. In a CPU with register windows, there are a huge number of registers, e.g., 128, but programs can only use a small number of them, e.g., eight, at any one time. A program that limits itself to eight registers per procedure can make very fast procedure calls: The call simply moves the window "down" by eight, to the set of eight registers used by that procedure, and the return moves the window back. The Berkeley RISC project delivered the RISC-I processor in 1982. Consisting of only 44,420 transistors, RISC-I had only 32 instructions, and yet completely outperformed any other single-chip design, with estimated performance being higher than the VAX. They followed this up with the 40,760-transistor, 39-instruction RISC-II in 1983, which ran over three times as fast as RISC-I.
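A minimal sketch of the register-window idea, using the register count and window size quoted above (the code and names are purely illustrative): a call simply slides the window, so the callee gets a fresh set of registers without anything being saved to memory.

```c
#include <stdio.h>

/* Toy register file with windows: 128 physical registers, but a procedure
   only ever addresses r0..r7 of its current window. Window overflow and
   spilling to memory are omitted for brevity. */
#define NUM_REGS    128
#define WINDOW_SIZE 8

static int regs[NUM_REGS];
static int window_base = 0;            /* start of the current window */

static void call(void) { window_base += WINDOW_SIZE; }  /* slide the window "down" */
static void ret(void)  { window_base -= WINDOW_SIZE; }  /* slide it back */

/* Access register n of the current window. */
static int *r(int n)   { return &regs[window_base + n]; }

int main(void) {
    *r(0) = 123;            /* caller's r0 */
    call();
    *r(0) = 456;            /* callee's r0: a different physical register */
    ret();
    printf("%d\n", *r(0));  /* prints 123: the caller's value was never disturbed */
    return 0;
}
```

Real windowed designs such as Berkeley RISC and SPARC also overlap adjacent windows so that a few registers are shared for passing arguments, and they spill older windows to memory when the register file fills up; both mechanisms are omitted from the sketch.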
As the RISC project began to become known in Silicon Valley, a similar project began at Stanford University in 1981. This MIPS project grew out of a graduate course by John L. Hennessy, produced a functioning system in 1983, and could run simple programs by 1984. The MIPS approach emphasized an aggressive clock cycle and the use of the pipeline, making sure it could be run as "full" as possible. The MIPS system was followed by the MIPS-X, and in 1984 Hennessy and his colleagues formed MIPS Computer Systems to produce the design commercially. The venture resulted in a new architecture, also called MIPS, and the R2000 microprocessor in 1985.
The overall philosophy of the RISC concept was widely understood by the second half of the 1980s, and was summed up by the designers of the MIPS-X in 1987.
Competition between RISC and conventional CISC approaches was also the subject of theoretical analysis in the early 1980s, leading, for example, to the iron law of processor performance.
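For reference, the iron law factors execution time into three terms (this is the standard textbook formulation, not a formula quoted from the analyses mentioned above):

\[
\frac{\text{time}}{\text{program}} = \frac{\text{instructions}}{\text{program}} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{time}}{\text{cycle}}
\]

In these terms, a RISC design accepts a larger instruction count per program in exchange for fewer cycles per instruction and a shorter cycle time.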
Since 2010, a new open standard instruction set architecture, RISC-V, has been under development at the University of California, Berkeley, for research purposes and as a free alternative to proprietary ISAs. As of 2014, version 2 of the user-space ISA is fixed. The ISA is designed to be extensible from a barebones core sufficient for a small embedded processor to supercomputer and cloud computing use with standard and chip designer-defined extensions and coprocessors. It has been tested in silicon design with the Rocket SoC, which is also available as an open-source processor generator in the Chisel language.