Intel 8086

The 8086 is a 16-bit microprocessor chip released by Intel on June 8, 1978 after development began in early 1976. It was followed by the Intel 8088 in 1979, which was a slightly modified chip with an external 8-bit data bus.
The 8086 gave rise to the x86 architecture, which eventually became Intel's most successful line of processors. On June 5, 2018, Intel released a limited-edition CPU celebrating the 40th anniversary of the Intel 8086, called the Intel Core i7-8086K.

History

Background

In 1972, Intel launched the 8008, Intel's first 8-bit microprocessor. It implemented an instruction set designed by Datapoint Corporation with programmable CRT terminals in mind, which also proved to be fairly general-purpose. The device needed several additional ICs to produce a functional computer, in part due to it being packaged in a small 18-pin "memory package", which ruled out the use of a separate address bus.
Two years later, Intel launched the 8080, employing the new 40-pin DIL packages originally developed for calculator ICs to enable a separate address bus. It had an extended instruction set that is source-compatible with the 8008 and also included some 16-bit instructions to make programming easier. The 8080 device was eventually replaced by the depletion-load-based 8085, which used a single +5 V power supply instead of the three different operating voltages of earlier chips. Other well known 8-bit microprocessors that emerged during these years are Motorola 6800, General Instrument PIC16X, MOS Technology 6502, Zilog Z80, and Motorola 6809.

The first x86 design

The 8086 project started in May 1976 and was originally intended as a temporary substitute for the ambitious and delayed iAPX 432 project. It was an attempt to draw attention from the less-delayed 16-bit and 32-bit processors of other manufacturers — Motorola, Zilog, and National Semiconductor.
While the 8086 was a 16-bit microprocessor, its architecture was similar to that of Intel's 8-bit microprocessors, which allowed assembly language programs written for those chips to be migrated readily. New instructions and features (such as signed integers, base+offset addressing, and self-repeating operations) were added. Instructions were added to assist source code compilation of nested functions in the ALGOL family of languages, including Pascal and PL/M. According to principal architect Stephen P. Morse, this was a result of a more software-centric approach. Other enhancements included microcoded multiply and divide instructions. Designers also anticipated coprocessors, such as the 8087 and 8089, so the bus structure was designed to be flexible.
The first revision of the instruction set and high level architecture was ready after about three months, and as almost no CAD tools were used, four engineers and 12 layout people were simultaneously working on the chip. The 8086 took a little more than two years from idea to working product, which was considered fast for a complex design in the 1970s.
The 8086 was sequenced using a mixture of random logic and microcode and was implemented using depletion-load nMOS circuitry with approximately 20,000 active transistors. It was soon moved to a new refined nMOS manufacturing process called HMOS that Intel originally developed for manufacturing of fast static RAM products. This was followed by HMOS-II, HMOS-III versions, and, eventually, a fully static CMOS version for battery powered devices, manufactured using Intel's CHMOS processes. The original chip measured 33 mm² and minimum feature size was 3.2 μm. The MUL and DIV instructions were slow because they were microcoded, so programmers often used bit-shift instructions instead when multiplying or dividing by powers of two.
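The shift-based substitution for MUL and DIV can be illustrated with a minimal Python sketch. The function names are hypothetical, and the 16-bit masking is an assumption made here to mimic a register; the sketch only shows the arithmetic identity, not actual 8086 code or timings.

```python
def times_ten(x):
    """Multiply by 10 using two shifts and one add: 10*x = 8*x + 2*x."""
    # Mask to 16 bits to mimic the width of an 8086 register (assumption).
    return ((x << 3) + (x << 1)) & 0xFFFF


def divide_by_four(x):
    """Unsigned division by a power of two is a plain right shift."""
    return (x & 0xFFFF) >> 2
```

Multiplying by a constant that is not a power of two needs a combination of shifts and adds, as in times_ten; a division by such a constant has no equally simple shift form.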
The 8086 was die-shrunk to 2 μm in 1981; this version also corrected a stack register bug in the original 3.5 μm chips. Later 1.5 μm and CMOS variants were outsourced to other manufacturers and not developed in-house.
The architecture was defined by Stephen P. Morse with some help from Bruce Ravenel in refining the final revisions. Logic designer Jim McKevitt and John Bayliss were the lead engineers of the hardware-level development team and Bill Pohlman the manager for the project. The legacy of the 8086 is enduring in the basic instruction set of today's personal computers and servers; the 8086 also lent its last two digits to later extended versions of the design, such as the Intel 286 and the Intel 386, all of which eventually became known as the x86 family. In addition, the PCI Vendor ID for system devices produced by Intel is 8086.

Details

Buses and operation

All internal registers, as well as internal and external data buses, are 16 bits wide, which firmly established the "16-bit microprocessor" identity of the 8086. A 20-bit external address bus provides a 1 MiB physical address space. This address space is addressed by means of internal memory "segmentation". The data bus is multiplexed with the address bus in order to fit all of the control lines into a standard 40-pin dual in-line package. The 8086 provides a 16-bit I/O address bus, supporting 64 KB of separate I/O space. The maximum linear address space is limited to 64 KB, simply because internal address/index registers are only 16 bits wide. Programming over 64 KB memory boundaries involves adjusting the segment registers; this difficulty existed until the 80386 architecture introduced wider registers.

Hardware modes of 8086

Some of the control pins, which carry essential signals for all external operations, have more than one function depending upon whether the device is operated in minimum or maximum mode. The former mode is intended for small single-processor systems, while the latter is for medium or large systems using more than one processor. Maximum mode is required when using an 8087 or 8089 coprocessor. The voltage on pin 33 determines the mode. Changing the state of pin 33 changes the function of certain other pins, most of which have to do with how the CPU handles the bus. The mode is usually hardwired into the circuit and therefore cannot be changed by software. The workings of these modes are described in terms of timing diagrams in Intel datasheets and manuals. In minimum mode, all control signals are generated by the 8086 itself.

Registers and instruction

The 8086 has eight more-or-less general 16-bit registers. Four of them, AX, BX, CX, and DX, can also be accessed as pairs of 8-bit registers (for example, AX as AH and AL), while the other four, SI, DI, BP, and SP, are 16-bit only.
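The dual 16-bit/8-bit view of AX, BX, CX, and DX can be modeled roughly as follows. The Reg16 class is a hypothetical illustration, not part of any emulator or library; it only shows that writing one 8-bit half leaves the other half of the 16-bit value untouched.

```python
class Reg16:
    """A 16-bit register whose high and low bytes are separately accessible."""

    def __init__(self, value=0):
        self.value = value & 0xFFFF

    @property
    def high(self):
        # Upper byte, e.g. AH for AX.
        return self.value >> 8

    @high.setter
    def high(self, byte):
        self.value = ((byte & 0xFF) << 8) | (self.value & 0x00FF)

    @property
    def low(self):
        # Lower byte, e.g. AL for AX.
        return self.value & 0xFF

    @low.setter
    def low(self, byte):
        self.value = (self.value & 0xFF00) | (byte & 0xFF)
```

For instance, after ax = Reg16(0x1234), setting ax.low = 0xFF changes ax.value to 0x12FF while ax.high stays 0x12.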
Due to a compact encoding inspired by 8-bit processors, most instructions are one-address or two-address operations, which means that the result is stored in one of the operands. At most one of the operands can be in memory, but this memory operand can also be the destination, while the other operand, the source, can be either register or immediate. A single memory location can also often be used as both source and destination which, among other factors, further contributes to a code density comparable to most eight-bit machines at the time.
The degree of generality of most registers is much greater than in the 8080 or 8085. However, 8086 registers were more specialized than in most contemporary minicomputers and are also used implicitly by some instructions. While perfectly sensible for the assembly programmer, this makes register allocation for compilers more complicated compared to more orthogonal 16-bit and 32-bit processors of the time such as the PDP-11, VAX, 68000, 32016, etc. On the other hand, being more regular than the rather minimalistic but ubiquitous 8-bit microprocessors such as the 6502, 6800, 6809, 8085, MCS-48, 8051, and other contemporary accumulator-based machines, it is significantly easier to construct an efficient code generator for the 8086 architecture.
Another factor for this is that the 8086 also introduced some new instructions to better support stack-based high-level programming languages such as Pascal and PL/M; some of the more useful instructions are push mem-op and ret size, supporting the "Pascal calling convention" directly.
A 64 KB stack growing towards lower addresses is supported in hardware; 16-bit words are pushed onto the stack, and the top of the stack is pointed to by SS:SP. There are 256 interrupts, which can be invoked by both hardware and software. The interrupts can cascade, using the stack to store the return addresses.
The 8086 has 64 K of 8-bit I/O port space.
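The hardware stack described above (16-bit words, growing towards lower addresses, top at SS:SP) can be sketched in Python. The Stack8086 class and its 1 MiB bytearray are assumptions made for illustration; the sketch models only push and pop, with little-endian word storage and 16-bit wraparound of SP.

```python
class Stack8086:
    """Toy model of the 8086 stack: words at SS:SP, growing downward."""

    def __init__(self, ss, sp):
        self.memory = bytearray(1 << 20)  # 1 MiB physical memory (assumption)
        self.ss = ss & 0xFFFF
        self.sp = sp & 0xFFFF

    def _addr(self, offset):
        # Physical address = segment * 16 + offset, wrapped to 20 bits.
        return ((self.ss << 4) + (offset & 0xFFFF)) & 0xFFFFF

    def push(self, word):
        # SP is decremented by 2 first, then the word is stored low byte first.
        self.sp = (self.sp - 2) & 0xFFFF
        self.memory[self._addr(self.sp)] = word & 0xFF
        self.memory[self._addr(self.sp + 1)] = (word >> 8) & 0xFF

    def pop(self):
        # The word is read first, then SP is incremented by 2.
        low = self.memory[self._addr(self.sp)]
        high = self.memory[self._addr(self.sp + 1)]
        self.sp = (self.sp + 2) & 0xFFFF
        return low | (high << 8)
```

Because SP is only 16 bits wide, the stack is confined to the 64 KB segment selected by SS, which is where the 64 KB limit comes from.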

Flags

The 8086 has a 16-bit flags register. Nine of these condition code flags are active, and indicate the current state of the processor: Carry flag, Parity flag, Auxiliary carry flag, Zero flag, Sign flag, Trap flag, Interrupt flag, Direction flag, and Overflow flag.
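Also referred to as the status word, the flags register is laid out as follows, from the most significant bit down: bits 15-12 reserved, bit 11 OF (overflow), bit 10 DF (direction), bit 9 IF (interrupt), bit 8 TF (trap), bit 7 SF (sign), bit 6 ZF (zero), bit 5 reserved, bit 4 AF (auxiliary carry), bit 3 reserved, bit 2 PF (parity), bit 1 reserved, bit 0 CF (carry).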
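As an illustration of how the nine active flags sit in the 16-bit word, the following sketch decodes a flags value using the standard x86 bit positions (CF in bit 0 up to OF in bit 11). The names FLAG_BITS and decode_flags are hypothetical helpers introduced here.

```python
# Bit position of each active flag in the 16-bit flags register.
FLAG_BITS = {
    "CF": 0,   # carry
    "PF": 2,   # parity
    "AF": 4,   # auxiliary carry
    "ZF": 6,   # zero
    "SF": 7,   # sign
    "TF": 8,   # trap
    "IF": 9,   # interrupt enable
    "DF": 10,  # direction
    "OF": 11,  # overflow
}


def decode_flags(word):
    """Return the names of the flags that are set, lowest bit first."""
    return [name for name, bit in FLAG_BITS.items() if (word >> bit) & 1]
```

A value of 0x0041, for example, has bits 0 and 6 set, so it decodes to the carry and zero flags.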

Segmentation

There are also four 16-bit segment registers that allow the 8086 CPU to access one megabyte of memory in an unusual way. Rather than concatenating the segment register with the address register, as in most processors whose address space exceeds their register size, the 8086 shifts the 16-bit segment four bits left before adding it to the 16-bit offset, therefore producing a 20-bit external address from the 32-bit segment:offset pair. As a result, any external address could be referred to by up to 2¹² = 4096 different segment:offset pairs.
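The address calculation and the aliasing it produces can be checked with a short Python sketch. The function names are hypothetical; the wrap to 20 bits reflects the 20-bit external address bus, and aliases counts only pairs that reach the address without wrapping around the top of memory.

```python
def physical_address(segment, offset):
    """Shift the 16-bit segment left by four bits and add the 16-bit offset."""
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF


def aliases(physical):
    """List the segment:offset pairs that resolve to one physical address."""
    pairs = []
    for segment in range(0x10000):
        offset = physical - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            pairs.append((segment, offset))
    return pairs
```

For an address such as 12345H there are 4096 pairs, from 1234:0005 down to 0235:FFF5. Addresses near the bottom of memory have fewer, which is why the figure is an upper bound.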
Although considered complicated and cumbersome by many programmers, this scheme also has advantages; a small program can be loaded starting at a fixed offset in its own segment, avoiding the need for relocation, with at most 15 bytes of alignment waste.
Compilers for the 8086 family commonly support two types of pointer, near and far. Near pointers are 16-bit offsets implicitly associated with the program's code or data segment and so can be used only within parts of a program small enough to fit in one segment. Far pointers are 32-bit segment:offset pairs resolving to 20-bit external addresses. Some compilers also support huge pointers, which are like far pointers except that pointer arithmetic on a huge pointer treats it as a linear 20-bit pointer, while pointer arithmetic on a far pointer wraps around within its 16-bit offset without touching the segment part of the address.
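The difference between far and huge pointer arithmetic can be sketched as below. The function names are hypothetical, and renormalizing a huge pointer so that its offset stays below 16 is an assumption about how a compiler might represent the result; the point is only that far arithmetic never carries into the segment, while huge arithmetic does.

```python
def far_add(segment, offset, delta):
    """Far pointer: only the 16-bit offset changes, wrapping within it."""
    return segment, (offset + delta) & 0xFFFF


def huge_add(segment, offset, delta):
    """Huge pointer: compute on the linear 20-bit address, then split again."""
    linear = ((segment << 4) + offset + delta) & 0xFFFFF
    # Normalized form (assumption): largest segment, offset in 0..15.
    return linear >> 4, linear & 0xF
```

Adding 1 to 1000:FFFF gives 1000:0000 with far arithmetic, which points 64 KB lower than intended, but 2000:0000 with huge arithmetic.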
To avoid the need to specify near and far on numerous pointers, data structures, and functions, compilers also support "memory models" which specify default pointer sizes. The tiny, small, compact, medium, large, and huge models cover practical combinations of near, far, and huge pointers for code and data. The tiny model means that code and data are shared in a single segment, just as in most 8-bit based processors, and can be used to build .com files for instance. Precompiled libraries often come in several versions compiled for different memory models.
According to Morse et al., the designers actually contemplated using an 8-bit shift, in order to create a 16 MB physical address space. However, as this would have forced segments to begin on 256-byte boundaries, and 1 MB was considered very large for a microprocessor around 1976, the idea was dismissed. Also, there were not enough pins available on a low cost 40-pin package for the additional four address bus pins.
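The trade-off between the two shift values can be worked out directly. In this sketch, segment_scheme is a hypothetical helper that returns the size of the address space and the spacing between possible segment starts for a given shift, assuming 16-bit segment and offset registers.

```python
def segment_scheme(shift):
    """Return (address bus bits, address space in bytes, segment granularity)."""
    address_bits = 16 + shift       # 16-bit segment shifted left by `shift`
    return address_bits, 1 << address_bits, 1 << shift
```

A 4-bit shift gives 20 address bits, 1 MB, and 16-byte segment boundaries; an 8-bit shift gives 24 address bits, 16 MB, and 256-byte boundaries, which also accounts for the four extra address pins.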
In principle, the address space of the x86 series could have been extended in later processors by increasing the shift value, as long as applications obtained their segments from the operating system and did not make assumptions about the equivalence of different segment:offset pairs. In practice the use of "huge" pointers and similar mechanisms was widespread and the flat 32-bit addressing made possible with the 32-bit offset registers in the 80386 eventually extended the limited addressing range in a more general way.
The instruction stream is fetched from memory as words and is addressed internally by the processor to the byte level as necessary. An instruction stream queuing mechanism allows up to 6 bytes of the instruction stream to be queued while waiting for decoding and execution. The queue acts as a first-in-first-out buffer, from which the Execution Unit extracts instruction bytes as required. Whenever there is space for at least two bytes in the queue, the BIU will attempt a word fetch memory cycle. If the queue is empty, the first byte into the queue immediately becomes available to the EU.

Porting older software

Small programs could ignore the segmentation and just use plain 16-bit addressing. This allows 8-bit software to be quite easily ported to the 8086. The authors of most DOS implementations took advantage of this by providing an Application Programming Interface very similar to CP/M as well as including the simple .com executable file format, identical to CP/M. This was important when the 8086 and MS-DOS were new, because it allowed many existing CP/M applications to be quickly made available, greatly easing acceptance of the new platform.

Interrupts

Interrupts on the 8086 can be either software- or hardware-initiated. Interrupts are long calls that also save the processor status. Interrupt routines typically end with an IRET instruction. All interrupts have an 8-bit interrupt number associated with them. This number is used to look up a segment:offset in a 256-element interrupt vector table stored at addresses 0-3FFH. When any type of interrupt is encountered, the processor status is pushed, CS and IP are pushed, and the interrupt number is multiplied by four to index a new execution address which is loaded from the vector table.
There are three types of software interrupt instructions: INT n, INTO, and a single-byte INT 3 used for debugging.
There are two kinds of hardware interrupts: maskable and non-maskable.
Non-maskable interrupts are higher priority than maskable interrupts. They cannot be disabled by interrupt enable. A low to high transition on the NMI pin essentially causes an INT 2 to execute.
Maskable interrupts are enabled and disabled by the STI and CLI instructions respectively. When the INTR is asserted by a hardware device, the 8086 asserts INTA twice, reading an 8-bit interrupt number from the bus. This number is multiplied by four to point to the associated interrupt service routine address in the vector table. Maskable interrupts are disabled when INTA is asserted, but are re-enabled upon executing the IRET instruction at the end of the interrupt service routine.
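The vector table lookup described in this section can be sketched as follows. The read_vector function and the bytearray standing in for low memory are illustrative assumptions; each four-byte entry holds the offset (new IP) in its first word and the segment (new CS) in its second, both stored low byte first.

```python
def read_vector(memory, number):
    """Fetch the segment:offset stored for one interrupt number."""
    base = (number & 0xFF) * 4  # four bytes per entry, table starts at 0
    offset = memory[base] | (memory[base + 1] << 8)
    segment = memory[base + 2] | (memory[base + 3] << 8)
    return segment, offset
```

Interrupt 21H, for instance, is read from addresses 84H-87H, and the last entry, for interrupt FFH, ends at 3FFH.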

Example code

The following 8086 assembly source code is for a subroutine named _strtolower that copies a null-terminated ASCIIZ character string from one location to another, converting all alphabetic characters to lower case. The string is copied one byte at a time.
The example code uses the BP register to establish a call frame, an area on the stack that contains all of the parameters and local variables for the execution of the subroutine. This kind of calling convention supports reentrant and recursive code and has been used by Algol-like languages since the late 1950s. A flat memory model is assumed, specifically, that the DS and ES segments address the same region of memory.
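As a behavioral model of the subroutine described above (not the assembly itself), a minimal Python sketch could look like this. The function signature and the bytearray used as a flat memory are assumptions for illustration; the call frame and register usage of the assembly version are not modeled.

```python
def strtolower(memory, dst, src):
    """Copy a NUL-terminated string from src to dst, lower-casing A-Z."""
    while True:
        byte = memory[src]
        if 0x41 <= byte <= 0x5A:   # ASCII 'A'..'Z'
            byte += 0x20           # convert to 'a'..'z'
        memory[dst] = byte         # copy one byte at a time
        if byte == 0:              # the terminator is copied, then stop
            break
        src += 1
        dst += 1
```

Using a single memory object for both source and destination mirrors the flat memory model assumption that DS and ES address the same region.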

Performance

Although partly shadowed by other design choices in this particular chip, the multiplexed address and data buses limit performance slightly; transfers of 16-bit or 8-bit quantities are done in a four-clock memory access cycle, which is faster on 16-bit, although slower on 8-bit quantities, compared to many contemporary 8-bit based CPUs. As instructions vary from one to six bytes, fetch and execution are made concurrent and decoupled into separate units: the bus interface unit feeds the instruction stream to the execution unit through a 6-byte prefetch queue, speeding up operations on registers and immediates, while memory operations became slower. However, the full 16-bit architecture with a full-width ALU meant that 16-bit arithmetic instructions could now be performed with a single ALU cycle, speeding up such instructions considerably. Combined with orthogonalizations of operations versus operand types and addressing modes, as well as other enhancements, this made the performance gain over the 8080 or 8085 fairly significant, despite cases where the older chips may be faster.

EA = time to compute effective address, ranging from 5 to 12 cycles.
Timings are best case, depending on prefetch status, instruction alignment, and other factors.

As can be seen from these tables, operations on registers and immediates were fast, while memory-operand instructions and jumps were quite slow; jumps took more cycles than on the simple 8080 and 8085, and the 8088 was additionally hampered by its narrower bus. The reasons why most memory related instructions were slow were threefold:

Loosely coupled fetch and execution units are efficient for instruction prefetch, but not for jumps and random data access.
No dedicated address calculation adder was afforded; the microcode routines had to use the main ALU for this.
The address and data buses were multiplexed, forcing a slightly longer bus cycle than in typical contemporary 8-bit processors.

However, memory access performance was drastically enhanced with Intel's next generation of 8086 family CPUs. The 80186 and 80286 both had dedicated address calculation hardware, saving many cycles, and the 80286 also had separate address and data buses.

Floating point

The 8086/8088 could be connected to a mathematical coprocessor to add hardware/microcode-based floating-point performance. The Intel 8087 was the standard math coprocessor for the 8086 and 8088, operating on 80-bit numbers. Manufacturers like Cyrix and Weitek eventually came up with high-performance floating-point coprocessors that competed with the 8087.

Chip versions

The clock frequency was originally limited to 5 MHz, but the last versions in HMOS were specified for 10 MHz. HMOS-III and CMOS versions were manufactured for a long time for embedded systems, although its successor, the 80186/80188, has been more popular for embedded use.
The 80C86, the CMOS version of the 8086, was used in many portable computers and embedded systems, including the GridPad, Toshiba T1200, HP 110, and finally the 1998–1999 Lunar Prospector.
For the packaging, the Intel 8086 was available both in ceramic and plastic DIP packages.

Derivatives and clones

Compatible (and, in many cases, enhanced) versions were manufactured by Fujitsu, Harris/Intersil, OKI, Siemens, Texas Instruments, NEC, Mitsubishi, and AMD. For example, the NEC V20 and NEC V30 were hardware-compatible with the 8088 and 8086 respectively. NEC had also made the straight Intel clones μPD8088D and μPD8086D, but the V20 and V30 incorporated the instruction set of the 80186 along with some of the 80186 speed enhancements, providing a drop-in capability to upgrade both instruction set and processing speed without manufacturers having to modify their designs. Such relatively simple and low-power 8086-compatible processors in CMOS are still used in embedded systems.
The electronics industry of the Soviet Union was able to replicate the 8086. The resulting chip, K1810VM86, was binary and pin-compatible with the 8086.
i8086 and i8088 were respectively the cores of the Soviet-made PC-compatible EC1831 and EC1832 desktops. However, the EC1831 computer had significant hardware differences from the IBM PC prototype. The EC1831 was the first PC-compatible computer with dynamic bus sizing. Later some of the EC1831 principles were adopted in PS/2 and some other machines.

Support chips

Intel 8237: direct memory access controller
Intel 8251: universal synchronous/asynchronous receiver/transmitter at 19.2 kbit/s
Intel 8253: programmable interval timer, 3x 16-bit max 10 MHz
Intel 8255: programmable peripheral interface, 3x 8-bit I/O pins used for printer connection etc.
Intel 8259: programmable interrupt controller
Intel 8279: keyboard/display controller, scans a keyboard matrix and a display matrix such as 7-segment displays
Intel 8282/8283: 8-bit latch
Intel 8284: clock generator
Intel 8286/8287: bidirectional 8-bit driver. In 1980 both Intel I8286/I8287 versions were available for US$16.25 in quantities of 100.
Intel 8288: bus controller
Intel 8289: bus arbiter
NEC μPD765 or Intel 8272A: floppy controller

Microcomputers using the 8086

The Intel Multibus-compatible single-board computer ISBC 86/12 was announced in 1978.
The Xerox NoteTaker was one of the earliest portable computer designs in 1978 and used three 8086 chips, but never entered commercial production.
Seattle Computer Products shipped S-100 bus based 8086 systems as early as November 1979.
The Norwegian Mycron 2000, introduced in the 1980s.
The first Compaq Deskpro used an 8086 running at 7.16 MHz, but was compatible with add-in cards designed for the 4.77 MHz IBM PC XT and could switch the CPU down to the lower speed to avoid software timing issues.
An 8 MHz 8086-2 was used in the AT&T 6300 PC, an IBM PC-compatible desktop microcomputer. The M24 / PC 6300 has IBM PC/XT compatible 8-bit expansion slots, but some of them have a proprietary extension providing the full 16-bit data bus of the 8086 CPU, and all system peripherals including the onboard video system also enjoy 16-bit data transfers. The later Olivetti M24SP featured an 8086-2 running at the full maximum 10 MHz.
The IBM PS/2 models 25 and 30 were built with an 8 MHz 8086.
The Amstrad PC1512, PC1640, PC2086, PC3086 and PC5086 all used 8086 CPUs at 8 MHz.
The NEC PC-9801.
The Tandy 1000 SL-series and RL machines used 9.47 MHz 8086 CPUs.
The IBM Displaywriter word processing machine and the Wang Professional Computer, manufactured by Wang Laboratories, also used the 8086.
NASA used original 8086 CPUs on equipment for ground-based maintenance of the Space Shuttle Discovery until the end of the space shuttle program in 2011. This decision was made to prevent software regression that might result from upgrading or from switching to imperfect clones.
KAMAN Process and Area Radiation Monitors
The Tektronix 4170 ran CP/M-86 and used an 8086

One of the most influential microcomputers of all, the IBM PC, used the Intel 8088, a version of the 8086 with an 8-bit data bus.