Intel 8231/8232

The Intel 8231 and 8232 were early floating-point math coprocessors, marketed by Intel for use with its i8080 line of CPUs. They were licensed versions of AMD's Am9511 and Am9512 FPUs, from 1977 and 1979 respectively, which AMD claimed were the world's first single-chip FPU solutions.

Adoption

While the i8231/i8232 were primarily intended to partner the i8080, their design offered multiple interface options, from simple wait-state insertion and status polling routines to interrupt- and DMA-controller-driven methods suitable for a peripheral processor or add-in board. With a small amount of glue logic, the chips were therefore usable in almost any microprocessor system that had a DMA subsystem or a spare interrupt input/interrupt vector available, and AMD's original documentation provided several different examples. This was a valuable feature for one of the first commercially available single-chip FPUs, greatly broadening its potential market, and it stood in stark contrast to Intel's later, in-house designed 8087 FPU, which was tightly bound to the x86 CPU line. For example, the i8231A was used in the Applied Analytics MicroSPEED II and II+ accelerator cards for the 6502-based Apple II line, and examples were also given for the Z80, MC6800, i8085, and even the 16-bit Z8000. Additionally, prior to the introduction of the 8087, Intel's own preliminary datasheets suggested the chips as suitable companions for the then-new 8086.

Capacity

The Intel 8231 is the Arithmetic Processing Unit (APU). It offered 32-bit "double" precision floating-point, and 16-bit or 32-bit fixed-point, calculation of 14 different arithmetic and trigonometric functions to a proprietary standard. The APU evaluated its transcendental functions using Chebyshev polynomials. It was priced at USD $235.00 for the 4 MHz version and USD $149.00 for the 2 MHz version, in quantities of 100 or more. The later Intel 8232 is the Floating Point Processor Unit (FPU). It performed 32-bit or 64-bit floating-point calculations compliant with the IEEE-754 standard, but only for the four primary arithmetic functions. It was likewise priced at USD $235.00 for the 4 MHz version and USD $149.00 for the 2 MHz version, in quantities of 100 or more.
All three chips used an 8-bit data bus, in line with the i8080 and most other contemporary microprocessors. The 8231 could run at up to 3 MHz, and the 8231A and 8232 at up to 4 MHz, either in sync with the CPU or asynchronously, depending on the degree of bus separation in the host system. Asynchronous operation was a useful addition to the feature set: it allowed a roughly 1 MHz Apple II system to be expanded with a 4 MHz 8231A and gain much faster numeric processing than a coprocessor locked to the CPU clock could have delivered, and it let a 5 MHz i8085-based system host an 8231A or 8232 without itself being slowed to 4 MHz or less to maintain compatibility. Together with the interrupt-driven peripheral design, it also allowed a degree of parallel processing between the CPU and FPU: the CPU could resume its own normal processing after passing commands and data to the essentially "offboard" coprocessor, switching back to the floating-point subtask only when the coprocessor signalled that processing was complete. This parallelization was vital to improving overall system throughput, since some of the more complex functions could still take the FPU several milliseconds to complete, an eternity in computing terms.
Instruction execution times were quite variable and, as befits an early-generation design, typically much longer than those seen in later, more evolved FPUs. For example, ignoring data and stack handling instructions, on the 8232 they ranged from 56 clock periods for a single-precision subtraction to a full 4560 periods for a double-precision divide, for an effective processing speed of 71429 down to 877 FLOPS at a 4 MHz clock. The 8231's instructions ranged from 17 periods for a 16-bit fixed-point addition, through 98 to 378 periods for common 32-bit float operations, to as many as 12032 periods for a maximally complex "power" calculation, giving roughly 235.3k, 40.8k down to 10.6k, and 332 operations per second respectively at 4 MHz, depending on the instruction and data mix. While these numbers may seem low from a modern perspective, they compare reasonably well with the successor i8087, and were radically faster than performing the same calculations in software on a regular CPU: even a relatively sophisticated 16-bit 8086 running at a full 8 MHz could only achieve somewhere between a few dozen and around 1000 FLOPS without a coprocessor. Its slower-clocked, 8-bit predecessors and rivals would have fared even worse.
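
The throughput figures above follow directly from dividing the clock rate by the number of clock periods per instruction. The following is a minimal sketch of that arithmetic in Python, assuming a 4 MHz clock (the rate that reproduces the quoted figures); the function and table names are illustrative and do not come from any Intel or AMD documentation.

```python
# Illustrative arithmetic only: convert "clock periods per instruction"
# into execution time and operations per second.
# Assumption: a 4 MHz clock, as on the 8231A and 8232.
CLOCK_HZ = 4_000_000


def ops_per_second(clock_periods, clock_hz=CLOCK_HZ):
    """Instructions completed per second if issued back to back."""
    return clock_hz / clock_periods


def exec_time_us(clock_periods, clock_hz=CLOCK_HZ):
    """Execution time of one instruction, in microseconds."""
    return clock_periods / clock_hz * 1e6


# Clock-period counts quoted in the text above.
CYCLE_COUNTS = {
    "8232 single-precision subtract": 56,
    "8232 double-precision divide": 4560,
    "8231 16-bit fixed-point add": 17,
    "8231 32-bit float op (fast case)": 98,
    "8231 32-bit float op (slow case)": 378,
    "8231 power function (worst case)": 12032,
}

if __name__ == "__main__":
    for name, periods in CYCLE_COUNTS.items():
        print(
            f"{name}: {periods} periods, "
            f"{exec_time_us(periods):.1f} us, "
            f"{ops_per_second(periods):,.0f} ops/s"
        )
```

Under the same assumption, the worst-case 12032-period "power" calculation takes about 3 ms, which is the order of magnitude behind the "several milliseconds" figure and the reason overlapping CPU and coprocessor work mattered.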