ILLIAC IV


The ILLIAC IV was the first massively parallel computer. The system was originally designed with 256 64-bit floating-point units (FPUs) and four central processing units (CPUs), able to process 1 billion operations per second. Due to budget constraints, only a single "quadrant" with 64 FPUs and a single CPU was built. Since the FPUs all processed the same instruction – ADD, SUB, etc. – in Flynn's taxonomy the design is classified as single instruction, multiple data (SIMD): an array processor.
The concept of building a computer using an array of processors came to Daniel Slotnick while he was working as a programmer on the IAS machine in 1952. A formal design did not start until 1960, when Slotnick was working at Westinghouse Electric Corporation and arranged development funding under a United States Air Force contract. When that funding ended in 1964, Slotnick moved to the University of Illinois Urbana-Champaign and joined the Illinois Automatic Computer (ILLIAC) team. With funding from the Advanced Research Projects Agency (ARPA), they began the design of a new concept with 256 64-bit processors, in place of the original concept's 1,024 1-bit processors.
While the machine was being assembled by Burroughs, the university began building a new facility to house it. Political tension over the funding from the United States Department of Defense led to ARPA and the university fearing for the machine's safety. When the first 64-processor quadrant of the machine was completed in 1972, it was sent to the NASA Ames Research Center in Mountain View, California. After three years of extensive modification to fix various flaws, ILLIAC IV was connected to the ARPANET for distributed use in November 1975, becoming the first network-available supercomputer, beating the Cray-1 by nearly 12 months.
Running at half its design speed, the one-quadrant ILLIAC IV delivered a peak of 50 MFLOPS, making it the fastest computer in the world at the time. It is also credited with being the first large computer to use solid-state memory, as well as the most complex computer built to that date, with over 1 million logic gates. Generally considered a failure due to massive budget and schedule overruns, the design was nevertheless instrumental in the development of new techniques and systems for programming parallel machines. In the 1980s, several machines based on ILLIAC IV concepts were successfully delivered.

History

Origins

In June 1952, Daniel Slotnick began working on the IAS machine at the Institute for Advanced Study at Princeton University. The IAS machine featured a bit-parallel math unit that operated on 40-bit words. Originally the machine was equipped with Williams tube memory; a magnetic drum memory from Engineering Research Associates was later added. The drum had 80 tracks, each storing 1,024 bits, so two 40-bit words could be read at a time.
While contemplating the drum's mechanism, Slotnick began to wonder whether this was the correct way to build a computer. If the bits of a word were written serially to a single track, instead of in parallel across 40 tracks, the data could be fed bit-by-bit from the drum directly into a bit-serial computer. The drum would still have multiple tracks and heads, but instead of gathering up a word and sending it to a single ALU, the data on each track would be read one bit at a time and sent into parallel ALUs. This would be a word-parallel, bit-serial computer.
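The idea can be illustrated with a short sketch. The following Python fragment is a modern, hypothetical illustration rather than period code (the function names and the 40-lane arrangement are assumptions for clarity): many one-bit adders run in lockstep, each consuming one bit per step from its own "track", least-significant bit first.

```python
# Hypothetical illustration of a word-parallel, bit-serial machine: many
# 1-bit ALUs, each reading one bit per step from its own drum track.

WORD_BITS = 40  # word length of the IAS machine

def to_track(value, bits=WORD_BITS):
    """Serialize a word onto a 'track' as a list of bits, LSB first."""
    return [(value >> i) & 1 for i in range(bits)]

def bit_serial_add(track_a, track_b):
    """Add two bit-serial operands with a 1-bit full adder, one bit per step."""
    carry, out = 0, []
    for a, b in zip(track_a, track_b):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return out

def from_track(track):
    return sum(bit << i for i, bit in enumerate(track))

# Eighty tracks could feed forty independent adders in parallel (word-parallel,
# two operand tracks per adder), each adder seeing one bit at a time (bit-serial).
pairs = [(1234 + i, 5678 * i) for i in range(40)]
sums = [from_track(bit_serial_add(to_track(a), to_track(b))) for a, b in pairs]
assert sums == [a + b for a, b in pairs]
```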
Slotnick raised the idea at the IAS, but John von Neumann dismissed it as requiring "too many tubes". Slotnick left the IAS in February 1954 to return to school to pursue his PhD degree and the matter was forgotten.

SOLOMON

After completing his PhD and some post-doctoral work, Slotnick ended up at IBM. By this time, for scientific computing at least, tubes and drums had been replaced with transistors and magnetic-core memory. The idea of parallel processors working on different streams of data from a drum no longer had the same obvious appeal. Nevertheless, further consideration showed that parallel machines could still offer significant performance in some applications; Slotnick and a colleague, John Cocke, wrote a paper on the concept in 1958.
After a short time at IBM and then a stint at Aeronca Aircraft, Slotnick ended up at Westinghouse's Air Arm division, which worked on radar and similar systems. Under a contract from the Air Force's Rome Air Development Center, Slotnick was able to build a team to design a system with 1,024 bit-serial ALUs, known as "Processing Elements" or PEs. The design was given the name SOLOMON, after King Solomon, who was very wise and had 1,000 wives.
The PEs would be fed instructions from a single master CPU, the "control unit" or CU. SOLOMON's CU would read instructions from memory, decode them, and then hand them off to the PEs for processing. Each PE had its own memory for holding operands and results, the PE Memory module, or PEM. The CU could access the entire memory via a dedicated memory bus, whereas the PEs could only access their own PEM. To allow results from one PE to be used as inputs in another, a separate network connected each PE to its eight closest neighbours.
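A minimal sketch may help make the CU/PE split concrete. The Python below is a hypothetical simplification, not the actual SOLOMON instruction set; the opcode names and memory layout are invented for illustration. The CU decodes one instruction and broadcasts it to every PE, and each PE applies it to operands in its own PEM.

```python
# Simplified SIMD model in the spirit of SOLOMON (hypothetical): one control
# unit broadcasts each decoded instruction to every processing element, and
# each PE applies it to its own local memory (PEM).

N_PES = 16  # a small array for illustration

# One PEM per PE; each holds named operands and results.
pem = [{"x": i, "y": 2 * i, "z": 0} for i in range(N_PES)]

def broadcast(opcode, dst, src1, src2):
    """The CU hands the same instruction to all PEs in lockstep."""
    for mem in pem:  # conceptually simultaneous; serial only in simulation
        if opcode == "ADD":
            mem[dst] = mem[src1] + mem[src2]
        elif opcode == "SUB":
            mem[dst] = mem[src1] - mem[src2]

broadcast("ADD", "z", "x", "y")  # every PE computes z = x + y at once
assert [m["z"] for m in pem] == [3 * i for i in range(N_PES)]
```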
Several test-bed systems were constructed, including a 3-by-3 system and a 10-by-10 model with simplified PEs. During this period, consideration was given to more complex PE designs, and the concept evolved into a 24-bit parallel system organized in a 256-by-32 arrangement. A single PE using this design was built in 1963. As the design work continued, the project's primary sponsor within the United States Department of Defense was killed in an accident, and no further funding was forthcoming.
Looking to continue development, Slotnick approached Lawrence Livermore National Laboratory, which at the time was at the forefront of supercomputer purchases. Livermore was very interested in the design, but convinced him to upgrade the design's fixed-point math units to true floating-point arithmetic, which resulted in the SOLOMON.2 design.
Livermore would not fund development; instead, the lab offered a contract under which it would lease the machine once it was completed. Westinghouse management considered this too risky and shut down the team. Slotnick left Westinghouse and attempted to find venture capital to continue the project, but failed. Livermore would later select the CDC STAR-100 for this role, as CDC was willing to take on the development costs.

ILLIAC IV

When SOLOMON ended, Slotnick joined the ILLIAC design team at the University of Illinois at Urbana-Champaign. Illinois had been designing and building large computers for the U.S. Department of Defense and ARPA since 1949. In 1964 the university signed a contract with ARPA to fund the effort, which became known as ILLIAC IV since it was the fourth computer designed and built at the university. Development started in 1965, and a first-pass design was completed in 1966.
In contrast to the bit-serial concept of SOLOMON, in ILLIAC IV the PEs were upgraded to full 64-bit processors, using 12,000 gates and 2,048 words of thin-film memory. The PEs had five 64-bit registers, each with a special purpose. One of these, RGR, was used for communicating data to neighbouring PEs, moving one "hop" per clock cycle. Another register, RGD, indicated whether or not that PE was currently active. "Inactive" PEs could not access memory, but they would still pass results to neighbouring PEs via the RGR. The PEs were designed to work as a single 64-bit FPU, two 32-bit half-precision FPUs, or eight 8-bit fixed-point processors.
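A hedged sketch of how the RGD mode bit and RGR routing register might interact is given below, in hypothetical Python, simplified to a one-dimensional ring (ILLIAC IV's actual network connected PEs at distances of ±1 and ±8). Routing moves data one hop per step even through inactive PEs, while the mode bit masks memory access.

```python
# Hypothetical sketch of masked SIMD execution with routing, simplified
# to a 1-D ring of PEs rather than ILLIAC IV's actual +/-1, +/-8 network.

N = 8
rgr = [10 * i for i in range(N)]       # RGR: routing register, one per PE
rgd = [i % 2 == 0 for i in range(N)]   # RGD: mode bit, True = PE active
pem = [[0] * 4 for _ in range(N)]      # small local memory (PEM) per PE

def route_one_hop():
    """Shift every RGR to its neighbour; inactive PEs still pass data along."""
    global rgr
    rgr = [rgr[(i - 1) % N] for i in range(N)]

def store_rgr(addr):
    """Store RGR into local memory -- but only on PEs whose mode bit is set."""
    for i in range(N):
        if rgd[i]:                     # inactive PEs cannot access their PEM
            pem[i][addr] = rgr[i]

route_one_hop()
store_rgr(0)
print([row[0] for row in pem])  # odd-numbered PEs kept 0: they were masked
```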
Instead of 1,024 PEs and a single CU, the new design had a total of 256 PEs arranged into four 64-PE "quadrants", each with its own CU. The CUs were also 64-bit designs, with sixty-four 64-bit registers and another four 64-bit accumulators. The system could run as four separate 64-PE machines, two 128-PE machines, or a single 256-PE machine, allowing it to work on different problems when the data sets were too small to demand the entire 256-PE array.
Based on a 25 MHz clock, with all 256 PEs running a single program, the machine was designed to deliver 1 billion floating-point operations per second, or in today's terminology, 1 GFLOPS. This made it far faster than any machine in the world; the contemporary CDC 7600 had a 27.5-nanosecond clock cycle, equivalent to about 36 MIPS at one instruction per cycle, although for a variety of reasons it generally delivered performance closer to 10 MIPS.
To support the machine, an extension to the Digital Computer Laboratory buildings was constructed, and the Center for Advanced Computation was built to house the project. When the computer was instead shipped to NASA Ames, the building was repurposed for the astronomy department and the National Center for Supercomputing Applications.
Sample work at the university was primarily aimed at ways to efficiently fill the PEs with data, thereby conducting the first "stress test" in computer development. To make this as easy as possible, several new computer languages were created: IVTRAN and TRANQUIL were parallelized versions of FORTRAN, and Glypnir was a similar conversion of ALGOL. Generally, these languages provided support for loading arrays of data "across" the PEs to be executed in parallel, and some even supported the unrolling of loops into array operations.
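As an illustration of the transformation these languages aimed at, the hypothetical Python/NumPy sketch below (none of this is actual IVTRAN, TRANQUIL, or Glypnir syntax) contrasts a serial element-by-element loop with a single whole-array operation of the kind a 64-PE quadrant could execute in lockstep.

```python
# Hypothetical modern illustration of what the ILLIAC IV languages aimed at:
# turning an element-by-element loop into one array operation that the
# hardware applies across all PEs simultaneously.
import numpy as np

a = np.arange(64)
b = np.arange(64, 128)

# Serial formulation: one element per iteration, as a scalar machine runs it.
c_serial = np.empty(64, dtype=a.dtype)
for i in range(64):
    c_serial[i] = a[i] + b[i]

# "Unrolled" array formulation: conceptually, each of the 64 PEs holds a[i]
# and b[i] in its own PEM and all execute the same ADD in one broadcast step.
c_array = a + b

assert (c_serial == c_array).all()
```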

Construction, problems

In early 1966, the university sent out a request for proposals seeking industrial partners interested in building the design. Seventeen requests were sent out; seven companies responded by July, and of these, three were selected. Several of the respondents, including Control Data, tried instead to interest the team in a vector processor design, but as such machines were already being designed the team was not interested in building another. In August 1966, eight-month contracts were offered to RCA, Burroughs, and UNIVAC to bid on the construction of the machine.
Burroughs eventually won the contract, having teamed up with Texas Instruments (TI). Both companies offered new technical advances that made the bid the most attractive. Burroughs was offering to build a new and much faster version of thin-film memory, which would improve performance. TI was offering to build 64-pin emitter-coupled logic (ECL) integrated circuits (ICs) with 20 logic gates each. At the time, most ICs used 16-pin packages and had between four and seven gates. Using TI's ICs would make the system much smaller.
Burroughs also supplied the specialized disk drives, which featured a separate fixed head for every track, offered transfer speeds of up to 500 Mbit/s, and stored about 80 MB per 36-inch disk. They would also provide a Burroughs B6500 mainframe to act as a front-end controller, loading data from secondary storage and performing other housekeeping tasks. Connected to the B6500 was a third-party laser optical recording medium, a write-once system that stored up to 1 Tbit on thin metal film coated on a strip of polyester sheet carried by a rotating drum. Construction of the new design began at Burroughs' Great Valley Lab. At the time, it was estimated the machine would be delivered in early 1970.
After a year of working on the ICs, TI announced it had been unable to build the 64-pin designs: the more complex internal wiring was causing crosstalk in the circuitry, and TI asked for another year to fix the problems. Instead, the ILLIAC team chose to redesign the machine around the available 16-pin ICs. This required the system to run at a slower 16 MHz clock instead of the original 25 MHz. The change from 64-pin to 16-pin cost the project about two years and millions of dollars. TI got the 64-pin design working after just over another year, and began offering the parts on the market before ILLIAC was complete.
As a result of this change, the individual printed circuit boards grew substantially in size. This doomed Burroughs' efforts to produce thin-film memory for the machine, because there was no longer enough room for the memory to fit within the design's cabinets. Attempts to enlarge the cabinets to make room for the memory caused serious problems with signal propagation. Slotnick surveyed the potential replacements and picked a semiconductor memory from Fairchild Semiconductor, a decision so strongly opposed by Burroughs that a full review by ARPA followed.
In 1969, these problems, combined with the resulting cost overruns from the delays, led to the decision to build only a single 64-PE quadrant, limiting the machine's design speed to about 200 MFLOPS. Together, these changes cost the project three years and $6 million. By 1969 the project was spending $1 million a month, and it had to be spun out of the original ILLIAC team, members of which were becoming increasingly vocal in their opposition to the project.