Single program, multiple data
In computing, single program, multiple data (SPMD) is a term that has been used to refer to computational models for exploiting parallelism, in which multiple processors cooperate in the execution of a program in order to obtain results faster.
The term SPMD was introduced in 1983 and was used to denote two different computational models:
- by Michel Auguin and François Larbey, as a "fork-and-join" and data-parallel approach where the parallel tasks are split up and run simultaneously in lockstep on multiple SIMD processors with different inputs, and
- by Frederica Darema, where "all processes begin executing the same program... but through synchronization directives... self-schedule themselves to execute different instructions and act on different data". This model enables MIMD parallelization of a given program; it is more general than the data-parallel approach and more efficient than fork-and-join for parallel execution on general-purpose multiprocessors.
SPMD vs SIMD
In SPMD parallel execution, multiple autonomous processors simultaneously execute the same program at independent points, rather than in the lockstep that SIMD or SIMT imposes on different data. With SPMD, tasks can be executed on general-purpose CPUs. In SIMD, the same operation is applied to multiple data items to manipulate data streams. GPUs, another class of processors, encompass multiple streams of SIMD processing. SPMD and SIMD are not mutually exclusive; SPMD parallel execution can include SIMD, vector, or GPU sub-processing. SPMD has been used for parallel programming of both message-passing and shared-memory machine architectures.
Distributed memory
On distributed memory computer architectures, SPMD implementations usually employ message passing programming. A distributed memory computer consists of a collection of interconnected, independent computers, called nodes. For parallel execution, each node starts its own copy of the program and communicates with other nodes by sending and receiving messages, calling send/receive routines for that purpose. Other parallelization directives, such as barrier synchronization, may also be implemented by messages. The messages can be sent by a number of communication mechanisms, such as TCP/IP over Ethernet, or specialized high-speed interconnects such as InfiniBand or Omni-Path. For distributed memory environments, serial sections of the program can be implemented by identical computation of the serial section on all nodes rather than computing the result on one node and sending it to the others, if that improves performance by reducing communication overhead.
Nowadays, the programmer is isolated from the details of message passing by standard interfaces, such as PVM and MPI.
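The message passing style described above can be sketched in miniature. The following is a hypothetical illustration in Python, not MPI itself: operating-system processes stand in for nodes, a queue stands in for the send/receive routines, and the names program, rank, and nprocs are invented for the sketch. Every process runs the same program and branches on its rank, which is the essence of SPMD.

```python
from multiprocessing import Process, Queue

def program(rank, nprocs, data, queue):
    # Every rank runs this same program; each works on its own slice.
    partial = sum(data[rank::nprocs])
    if rank == 0:
        # Rank 0 plays the "receiver": it collects the partial results.
        total = partial
        for _ in range(nprocs - 1):
            total += queue.get()
        print("total:", total)     # prints "total: 4950"
    else:
        # The other ranks "send" their partial result to rank 0.
        queue.put(partial)

if __name__ == "__main__":
    data = list(range(100))        # same input visible to every rank
    nprocs = 4
    queue = Queue()
    workers = [Process(target=program, args=(r, nprocs, data, queue))
               for r in range(nprocs)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

In a real MPI program the equivalent branching on a rank returned by the library, followed by explicit sends and receives (or a collective reduction), plays the same role.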
Distributed memory is the programming style used on parallel supercomputers, from homegrown Beowulf clusters to the largest clusters on the TeraGrid, as well as present-day GPU-based supercomputers.
Shared memory
On a shared memory machine, the sharing can be implemented in the context of either physically shared memory or logically shared memory; in addition to the shared memory, the CPUs in the computer system can also have local memory. For either of these contexts, synchronization can be enabled with hardware-enabled primitives, with the sharable data deposited in a shared memory area. When the hardware does not support shared memory, packing the data as a "message" is often the most efficient way to program shared-memory computers with a large number of processors, where the physical memory is local to each processor and accessing the memory of another processor takes longer. SPMD on a shared memory machine can be implemented by standard processes or threads.
Shared memory multiprocessing presents the programmer with a common memory space and the possibility to parallelize execution. With the SPMD model, the cooperating processors take different paths through the program, using parallel directives, and perform operations on data in the shared memory; the processors can also access and perform operations on data in their local memory. In contrast, with fork-and-join approaches, the program starts executing on one processor and the execution splits into a parallel region, which is started when parallel directives are encountered; in a parallel region, the processors execute a parallel task on different data. A typical example is the parallel DO loop, where different processors work on separate parts of the arrays involved in the loop. At the end of the loop, execution is synchronized and the processors continue to the next section of the program. SPMD has been implemented in the current standard interface for shared memory multiprocessing, OpenMP, which uses multithreading, usually implemented by lightweight processes called threads.
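The parallel DO pattern above can be sketched as follows. This is a hypothetical illustration in Python rather than OpenMP: threads stand in for the processors, the lists a, b, and out play the role of shared arrays, and the function name parallel_do is invented for the sketch. Execution forks at the parallel region, each thread updates its own section of the shared array, and execution joins (synchronizes) before the serial code continues.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_do(a, b, out, nthreads=4):
    n = len(a)
    def body(t):
        # Each thread handles one contiguous section of the index range.
        lo = t * n // nthreads
        hi = (t + 1) * n // nthreads
        for i in range(lo, hi):
            out[i] = a[i] + b[i]   # a, b, out are shared by all threads
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        # Fork: the parallel region starts; join: leaving the "with"
        # block waits for every thread before serial execution resumes.
        list(pool.map(body, range(nthreads)))

a = [1.0] * 8
b = [2.0] * 8
out = [0.0] * 8
parallel_do(a, b, out)
print(out)   # each element is now 3.0
```

Because the sections are disjoint, no two threads write the same element, which is what makes the loop safely parallelizable without further synchronization.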
Combination of levels of parallelism
Current computers allow exploiting many parallel modes at the same time for maximum combined effect. A distributed memory program using MPI may run on a collection of nodes. Each node may be a shared memory computer that executes in parallel on multiple CPUs using OpenMP. Within each CPU, SIMD vector instructions and superscalar instruction execution, such as pipelining and the use of multiple parallel functional units, are used for maximum single-CPU speed.
History
The acronym SPMD for "single program, multiple data" has been used to describe two different computational models for exploiting parallel computing, because both models are natural extensions of Flynn's taxonomy. The two respective groups of researchers were unaware of each other's use of the term SPMD to describe different models of parallel programming.
The term SPMD was proposed first in 1983 by Michel Auguin and François Larbey in the context of the OPSILA parallel computer and its fork-and-join, data-parallel computational model. This computer consisted of a master and SIMD processors. In Auguin's SPMD model, the same task is executed on different processors, each operating on its part of the data vector. Specifically, their 1985 paper and others stated:
We consider the SPMD operating mode. This mode allows simultaneous execution of the same task but prevents data exchange between processors. Data exchanges are only performed under SIMD mode by means of vector assignments. We assume synchronizations are summed-up to switchings between SIMD and SPMD operating modes using global fork-join primitives.
Starting around the same timeframe, the term SPMD was proposed by Frederica Darema to define a different SPMD computational model: a programming model which in the intervening years has been applied to a wide range of general-purpose high-performance computers and has led to the current parallel computing standards. Darema's SPMD programming model assumes a multiplicity of processors which operate cooperatively, all executing the same program but able to take different paths through it based on parallelization directives embedded in the program:
All processes participating in the parallel computation are created at the beginning of the execution and remain in existence until the end... execute different instructions and act on different data... the job to be done by each process is allocated dynamically... self-schedule themselves to execute different instructions and act on different data
The notion of a process generalized that of a processor, in the sense that multiple processes can execute on one processor. The SPMD model was proposed by Darema as an approach different from, and more efficient than, the fork-and-join pursued by all others in the community at that time; it is also more general than the "data-parallel" computational model and can encompass fork-and-join. The original context of this SPMD model was the RP3 computer, which supported general-purpose computing with both distributed and shared memory. The SPMD model was implemented by Darema and IBM colleagues in EPEX, one of the first prototype programming environments. The effectiveness of SPMD was demonstrated for a wide class of applications; it was implemented in IBM Parallel FORTRAN in 1988, the first vendor product in parallel programming, and in MPI, OpenMP, and other environments which have adopted and cite the SPMD computational model.
By the late 1980s, there were many distributed computers with proprietary message passing libraries. The first SPMD standard was PVM. The current de facto standard is MPI.
The Cray parallel directives were a direct predecessor of OpenMP.