IBM Blue Gene
Blue Gene was an IBM project aimed at designing supercomputers that could reach operating speeds in the petaFLOPS range, with relatively low power consumption.
The project created three generations of supercomputers, Blue Gene/L, Blue Gene/P, and Blue Gene/Q. During their deployment, Blue Gene systems often led the TOP500 and Green500 rankings of the most powerful and most power-efficient supercomputers, respectively. Blue Gene systems have also consistently scored top positions in the Graph500 list. The project was awarded the 2009 National Medal of Technology and Innovation.
After Blue Gene/Q, IBM focused its supercomputer efforts on the OpenPower platform, using accelerators such as FPGAs and GPUs to address the diminishing returns of Moore's law.
History
A video presentation of the history and technology of the Blue Gene project was given at the Supercomputing 2020 conference.

In December 1999, IBM announced a US$100 million research initiative for a five-year effort to build a massively parallel computer, to be applied to the study of biomolecular phenomena such as protein folding. The research and development was pursued by a large multi-disciplinary team at the IBM T. J. Watson Research Center, initially led by William R. Pulleyblank.
The project had two main goals: to advance understanding of the mechanisms behind protein folding via large-scale simulation, and to explore novel ideas in massively parallel machine architecture and software. Major areas of investigation included: how to use this novel platform to effectively meet its scientific goals, how to make such massively parallel machines more usable, and how to achieve performance targets at a reasonable cost, through novel machine architectures.
The initial design for Blue Gene was based on an early version of the Cyclops64 architecture, designed by Monty Denneau. In parallel, Alan Gara had started working on an extension of the QCDOC architecture into a more general-purpose supercomputer. The US Department of Energy started funding the development of this system and it became known as Blue Gene/L. Development of the original Blue Gene architecture continued under the name Blue Gene/C and, later, Cyclops64.
Architecture and chip logic design for the Blue Gene systems was done at the IBM T. J. Watson Research Center, chip design was completed and chips were manufactured by IBM Microelectronics, and the systems were built at IBM Rochester, MN. Alan Gara was the Chief Architect and Paul Coteus was the Chief Engineer.
In November 2004, a 16-rack system, with each rack holding 1,024 compute nodes, achieved first place in the TOP500 list with a LINPACK benchmark performance of 70.72 TFLOPS. It thereby overtook NEC's Earth Simulator, which had held the title of the fastest computer in the world since 2002. From 2004 through 2007, the Blue Gene/L installation at LLNL gradually expanded to 104 racks, achieving 478 TFLOPS LINPACK and 596 TFLOPS peak. The LLNL Blue Gene/L installation held the first position in the TOP500 list for 3.5 years, until June 2008, when it was overtaken by IBM's Cell-based Roadrunner system at Los Alamos National Laboratory, which was the first system to surpass the 1 petaFLOPS mark.
While the LLNL installation was the largest Blue Gene/L installation, many smaller installations followed. The November 2006 TOP500 list showed 27 computers with the eServer Blue Gene Solution architecture. For example, three racks of Blue Gene/L were housed at the San Diego Supercomputer Center.
While the TOP500 measures performance on a single benchmark application, LINPACK, Blue Gene/L also set records for performance on a wider set of applications. Blue Gene/L was the first supercomputer ever to run over 100 TFLOPS sustained on a real-world application, namely a three-dimensional molecular dynamics code, simulating solidification of molten metal under high pressure and temperature conditions. This achievement won the 2005 Gordon Bell Prize.
In June 2006, NNSA and IBM announced that Blue Gene/L had achieved 207.3 TFLOPS on a quantum chemical application. At Supercomputing 2006, Blue Gene/L won the prize in all classes of the HPC Challenge awards. In 2007, a team from the IBM Almaden Research Center and the University of Nevada ran an artificial neural network almost half as complex as the brain of a mouse for the equivalent of a second.
The name
The name Blue Gene comes from what it was originally designed to do: help biologists understand the processes of protein folding and gene development. "Blue" is a traditional moniker that IBM uses for many of its products and for the company itself. The original Blue Gene design was renamed "Blue Gene/C" and eventually Cyclops64. The "L" in Blue Gene/L comes from "Light", as that design's original name was "Blue Light". The "P" version was designed to be a petascale system. "Q" is just the letter after "P".
Major features
The Blue Gene/L supercomputer was unique in the following aspects:
- Trading the speed of processors for lower power consumption. Blue Gene/L used low-frequency, low-power embedded PowerPC cores with floating-point accelerators. While the performance of each chip was relatively low, the system could achieve better power efficiency for applications that could use large numbers of nodes.
- Dual processors per node with two working modes: co-processor mode where one processor handles computation and the other handles communication; and virtual-node mode, where both processors are available to run user code, but the processors share both the computation and the communication load.
- System-on-a-chip design. Components were embedded on a single chip for each node, with the exception of 512 MB external DRAM.
- A large number of nodes, scaled up in units of 1,024-node racks.
- Three-dimensional torus interconnect with auxiliary networks for global communications, I/O, and management.
- Lightweight OS per node for minimum system overhead.
Architecture
Compute nodes were packaged two per compute card, with 16 compute cards plus up to 2 I/O nodes per node board. A cabinet/rack contained 32 node boards. By the integration of all essential sub-systems on a single chip, and the use of low-power logic, each compute or I/O node dissipated about 17 watts. The low power per node allowed aggressive packaging of up to 1,024 compute nodes, plus additional I/O nodes, in a standard 19-inch rack, within reasonable limits on electrical power supply and air cooling. The system performance metrics, in terms of FLOPS per watt, FLOPS per m² of floor space and FLOPS per unit cost, allowed scaling up to very high performance. With so many nodes, component failures were inevitable. The system was able to electrically isolate faulty components, down to a granularity of half a rack, to allow the machine to continue to run.
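As a rough illustration of the packaging arithmetic above, the following sketch simply multiplies out the figures quoted in this section (2 compute nodes per card, 16 cards per node board, 32 node boards per rack, roughly 17 watts per node); it is not IBM code, and the real power budget of a rack also included I/O nodes, fans, and power conversion losses that are not modelled here.

```c
#include <stdio.h>

int main(void) {
    /* Figures quoted above for Blue Gene/L packaging (illustrative only). */
    const int nodes_per_card    = 2;
    const int cards_per_board   = 16;
    const int boards_per_rack   = 32;
    const double watts_per_node = 17.0;   /* per compute or I/O node */

    int compute_nodes_per_rack = nodes_per_card * cards_per_board * boards_per_rack;
    double compute_power_kw    = compute_nodes_per_rack * watts_per_node / 1000.0;

    printf("compute nodes per rack: %d\n", compute_nodes_per_rack);             /* 1024 */
    printf("approx. compute-node power per rack: %.1f kW\n", compute_power_kw); /* ~17.4 */
    return 0;
}
```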
Each Blue Gene/L node was attached to three parallel communications networks: a 3D toroidal network for peer-to-peer communication between compute nodes, a collective network for collective communication, and a global interrupt network for fast barriers. The I/O nodes, which ran the Linux operating system, provided communication to storage and external hosts via an Ethernet network. The I/O nodes handled filesystem operations on behalf of the compute nodes. A separate and private Ethernet management network provided access to any node for configuration, booting, and diagnostics.
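The torus wiring gives each compute node exactly six nearest neighbours, with coordinates that wrap around at the edges of the machine. The sketch below is not Blue Gene system software; it only illustrates, for a hypothetical torus of dimensions DIM_X × DIM_Y × DIM_Z, how neighbour coordinates can be computed with modular arithmetic.

```c
#include <stdio.h>

/* Dimensions of a hypothetical 3D torus (illustrative only). */
#define DIM_X 8
#define DIM_Y 8
#define DIM_Z 16

/* Wrap a coordinate so it stays on the torus. */
static int wrap(int c, int dim) {
    return (c % dim + dim) % dim;
}

int main(void) {
    int x = 0, y = 3, z = 15;              /* an example node position */
    int dx[6] = { +1, -1,  0,  0,  0,  0 };
    int dy[6] = {  0,  0, +1, -1,  0,  0 };
    int dz[6] = {  0,  0,  0,  0, +1, -1 };

    /* Each node on a 3D torus has six nearest neighbours. */
    for (int i = 0; i < 6; i++) {
        printf("neighbour %d: (%d, %d, %d)\n", i,
               wrap(x + dx[i], DIM_X),
               wrap(y + dy[i], DIM_Y),
               wrap(z + dz[i], DIM_Z));
    }
    return 0;
}
```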
To allow multiple programs to run concurrently, a Blue Gene/L system could be partitioned into electronically isolated sets of nodes. The number of nodes in a partition had to be a positive integer power of 2, with at least 2⁵ = 32 nodes. To run a program on Blue Gene/L, a partition of the machine first had to be reserved. The program was then loaded and run on all the nodes within the partition, and no other program could access nodes within the partition while it was in use. Upon completion, the partition's nodes were released for future programs to use.
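The partition-size rule above (a power of two, no smaller than 2⁵ = 32 nodes) can be expressed as a one-line check; the helper below is hypothetical and does not correspond to any Blue Gene control-system API.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: does n satisfy the Blue Gene/L partition rule
 * (a power of two, at least 32 nodes)? */
static bool is_valid_partition_size(unsigned int n) {
    return n >= 32 && (n & (n - 1)) == 0;
}

int main(void) {
    unsigned int sizes[] = { 16, 32, 48, 512, 1024 };
    for (int i = 0; i < 5; i++)
        printf("%u nodes: %s\n", sizes[i],
               is_valid_partition_size(sizes[i]) ? "valid" : "invalid");
    return 0;
}
```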
Blue Gene/L compute nodes used a minimal operating system supporting a single user program. Only a subset of POSIX calls was supported, and only one process could run at a time on a node in co-processor mode, or one process per CPU in virtual-node mode. Programmers needed to implement green threads to simulate local concurrency. Application development was usually performed in C, C++, or Fortran using MPI for communication. However, some scripting languages, such as Ruby and Python, were ported to the compute nodes.
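The fragment below is a generic MPI program in C, not code from any Blue Gene software distribution; it is only meant to show the single-program style in which every compute node runs the same executable, and in which a collective operation such as MPI_Allreduce (which maps naturally onto the collective network described above) combines per-node results.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this node's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */

    /* Each rank contributes one value; the reduction is a collective
     * operation across all nodes in the partition. */
    double local = (double)rank;
    double sum = 0.0;
    MPI_Allreduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks: %d, sum of ranks: %.0f\n", size, sum);

    MPI_Finalize();
    return 0;
}
```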
IBM published BlueMatter, the application developed to exercise Blue Gene/L, as open source. It documents how applications used the torus and collective interfaces, and may serve as a base for others exercising the current generation of supercomputers.
Blue Gene/P
In June 2007, IBM unveiled Blue Gene/P, the second generation of the Blue Gene series of supercomputers, designed through a collaboration that included IBM, LLNL, and Argonne National Laboratory.
Design
The design of Blue Gene/P is a technology evolution from Blue Gene/L. Each Blue Gene/P compute chip contains four PowerPC 450 processor cores, running at 850 MHz. The cores are cache coherent and the chip can operate as a 4-way symmetric multiprocessor. The memory subsystem on the chip consists of small private L2 caches, a central shared 8 MB L3 cache, and dual DDR2 memory controllers. The chip also integrates the logic for node-to-node communication, using the same network topologies as Blue Gene/L, but at more than twice the bandwidth. A compute card contains a Blue Gene/P chip with 2 or 4 GB DRAM, comprising a "compute node". A single compute node has a peak performance of 13.6 GFLOPS. Thirty-two compute cards are plugged into an air-cooled node board. A rack contains 32 node boards.

By using many small, low-power, densely packaged chips, Blue Gene/P exceeded the power efficiency of other supercomputers of its generation, and at 371 MFLOPS/W Blue Gene/P installations ranked at or near the top of the Green500 lists in 2007–2008.
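The 13.6 GFLOPS peak figure for a compute node follows from the core count and clock rate quoted above, assuming each PowerPC 450 core retires four double-precision floating-point operations per cycle (two fused multiply-adds on its dual floating-point pipeline); the sketch below merely multiplies these numbers out and is not IBM code.

```c
#include <stdio.h>

int main(void) {
    /* Figures quoted above for one Blue Gene/P compute chip. */
    const double cores    = 4;
    const double clock_hz = 850e6;   /* 850 MHz */
    /* Assumption: four double-precision floating-point operations per
     * core per cycle (two fused multiply-adds). */
    const double flops_per_cycle = 4;

    double peak_gflops = cores * clock_hz * flops_per_cycle / 1e9;
    printf("peak per compute node: %.1f GFLOPS\n", peak_gflops);   /* 13.6 */
    return 0;
}
```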