List of x86 SIMD instructions


The x86 instruction set has been extended several times with SIMD instruction set extensions. These extensions, starting with the MMX extension introduced with the Pentium MMX in 1997, typically define sets of wide registers, along with instructions that subdivide these registers into fixed-size lanes and perform a computation on each lane in parallel.
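To make the idea of lanes concrete, the following is a minimal Python sketch. It emulates a packed 8-bit add in the style of MMX PADDB, treating a 64-bit value held in a Python int as eight independent 8-bit lanes. The function names are hypothetical and illustrative only; this is not real SIMD code, just a model of the lane semantics.

```python
# Minimal sketch: emulate a packed 8-bit add (in the style of MMX PADDB)
# on a 64-bit value held in a Python int. Names are illustrative only.

LANE_BITS = 8      # width of each lane
VECTOR_BITS = 64   # width of the whole MMX-style register
LANE_MASK = (1 << LANE_BITS) - 1


def split_lanes(value: int) -> list[int]:
    """Split a 64-bit value into 8-bit lanes, lowest lane first."""
    return [(value >> (i * LANE_BITS)) & LANE_MASK
            for i in range(VECTOR_BITS // LANE_BITS)]


def join_lanes(lanes: list[int]) -> int:
    """Pack lanes (lowest lane first) back into a single integer."""
    result = 0
    for i, lane in enumerate(lanes):
        result |= (lane & LANE_MASK) << (i * LANE_BITS)
    return result


def packed_add_u8(a: int, b: int) -> int:
    """Add each pair of lanes independently; each sum wraps modulo 256."""
    sums = [(x + y) & LANE_MASK for x, y in zip(split_lanes(a), split_lanes(b))]
    return join_lanes(sums)


a = 0x01020304050607FF
b = 0x0101010101010101
print(hex(packed_add_u8(a, b)))     # 0x203040506070800: lowest lane wrapped, no carry out
print(hex((a + b) & (2**64 - 1)))   # 0x203040506070900: a scalar add carries into lane 1
```

The only difference between the two results is the carry. The lane-wise add confines each sum to its own 8 bits, while an ordinary 64-bit add lets the carry ripple into the neighbouring byte.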

Summary of SIMD extensions

The main SIMD instruction set extensions that have been introduced for x86 are:
| SIMD instruction set extension | Year | Description | Added in |
|---|---|---|---|
| MMX | 1997 | A set of 57 integer SIMD instructions acting on 64-bit vectors, mostly providing 8/16/32-bit lane-width operations. Repurposed the old x87 FPU register file as a bank of eight 64-bit vector registers, referred to as MM0..MM7 when used for MMX instructions. | AMD K6, Intel Pentium II, Rise mP6, IDT WinChip C6, Transmeta Crusoe, DM&P Vortex86MX |
| SSE | 1999 | "Katmai New Instructions": introduced a set of 70 new instructions. Most, but not all, of these instructions provide scalar and vector operations on 32-bit floating-point values in 128-bit SIMD vector registers. SSE introduced a new set of eight 128-bit vector registers, XMM0..XMM7, and a status/control register, MXCSR. This set of eight vector registers was later extended to 16 registers with the introduction of x86-64. | Intel Pentium III, AMD Athlon XP, VIA C3 "Nehemiah", Transmeta Efficeon |
| SSE2 | 2000 | Extended SSE with 144 new instructions: mainly instructions operating on scalars and vectors of 64-bit floating-point values, as well as 128-bit-vector forms of most of the MMX integer instructions. | Intel Pentium 4, Intel Pentium M, AMD Athlon 64, Transmeta Efficeon, VIA C7 |
| SSE3 | 2004 | "Prescott New Instructions": added a set of 13 new instructions, mostly horizontal add/subtract operations. | Intel Pentium 4 "Prescott", Transmeta Efficeon 8800, AMD Athlon 64 "Venice", VIA C7, Intel Core "Yonah" |
| SSSE3 | 2006 | Added a set of 32 new instructions extending MMX and SSE, including a byte-shuffle instruction. | Intel Core 2 "Conroe"/"Merom", VIA Nano 2000, Intel Atom "Bonnell", AMD "Bobcat", AMD FX "Bulldozer" |
| SSE4a | 2007 | AMD-only extension that added a set of 4 instructions, including bitfield insert/extract and scalar non-temporal store instructions. | AMD K10 |
| SSE4.1 | 2007 | Added a set of 47 instructions, including variants of integer min/max, widening integer conversions, vector lane insert/extract, and dot-product instructions. | Intel Core 2 "Penryn", VIA Nano 3000, AMD FX "Bulldozer", AMD "Jaguar", Intel Atom "Silvermont", Zhaoxin ZX-A |
| SSE4.2 | 2008 | Added a set of 7 instructions, mostly pertaining to string processing. | Intel Core i7 "Nehalem", AMD FX "Bulldozer", AMD "Jaguar", Intel Atom "Silvermont", VIA Nano QuadCore C4000, Zhaoxin ZX-C "ZhangJiang" |
| AVX | 2011 | Extended the XMM0..XMM15 vector registers to 256 bits; the full 256-bit registers are referred to as YMM0..YMM15. Added three-operand variants of most of the SSE1-4 vector instructions, as well as 256-bit variants of most of the SSE1-4 vector instructions acting on 32/64-bit floating-point values. These new instruction variants are all encoded with the new VEX prefix. | Intel Core i7 "Sandy Bridge", AMD FX "Bulldozer", AMD "Jaguar", VIA Nano QuadCore C4000, Zhaoxin ZX-C "ZhangJiang", Intel Atom "Gracemont" |
| FMA3 | 2013 | Added three-operand fused multiply-add operations on floating-point values, in scalar and vector variants. | Intel Core i7 "Haswell", AMD FX "Piledriver", Intel Atom "Gracemont", Zhaoxin KH-40000 "YongFeng" |
| AVX2 | 2013 | Added 256-bit variants of most of the MMX/SSE1-4 vector integer instructions. Also added vector gather instructions. | Intel Core i7 "Haswell", AMD FX "Excavator", VIA Nano QuadCore C4000, Intel Atom "Gracemont", Zhaoxin KH-40000 "YongFeng" |
| AVX-512 | 2016 | Extended the YMM0..YMM15 vector registers to a set of 32 registers, each 512 bits wide, referred to as ZMM0..ZMM31 when used as 512-bit registers. Also added eight opmask registers, K0..K7. Added 512-bit versions of most of the MMX/SSE/AVX vector instructions, as well as a substantial number of additional instructions; these are mostly encoded with the new EVEX prefix. Added per-vector-lane masking, via the opmask registers, for most of its vector instructions. Also added embedded rounding controls for floating-point instructions and a scalar-to-vector broadcast function for most instructions that can accept memory operands. | |
| AMX | 2023 | Added a set of eight new tile registers, TMM0..TMM7, each 8192 bits in size. Also added a 64-byte tile configuration register, TILECFG, and instructions that perform matrix multiplication on the tile registers with various data formats. | |
| AVX10.1 | 2024 | Reformulation of AVX-512 that includes most of the optional AVX-512 subsets as baseline functionality, and switches feature enumeration from the flag-based scheme of AVX-512 to a version-based scheme. No new instructions are added. | Intel Xeon 6 "Granite Rapids" |
| AVX10.2 | | Adds instructions for conversion to/from FP8 data types, BF16 arithmetic, saturating floating-point-to-integer conversion, IEEE 754-compliant min/max, and a few other operations. | |
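The 2006 entry's byte-shuffle instruction is SSSE3's PSHUFB. The following is a minimal Python sketch of its 128-bit legacy form. For each destination byte i, if bit 7 of control byte i is set, the result byte is zero; otherwise the low four bits of the control byte select which source byte is copied. The helper name `pshufb` is hypothetical. The later 256-bit AVX2 form, which shuffles within each 128-bit half, is not modelled here.

```python
def pshufb(src: bytes, control: bytes) -> bytes:
    """Emulate the 128-bit form of SSSE3 PSHUFB (illustrative sketch)."""
    if len(src) != 16 or len(control) != 16:
        raise ValueError("expected two 16-byte operands")
    out = bytearray(16)
    for i, c in enumerate(control):
        if c & 0x80:
            out[i] = 0              # bit 7 set: zero this destination byte
        else:
            out[i] = src[c & 0x0F]  # low 4 bits index into the source bytes
    return bytes(out)


data = bytes(range(16))                    # bytes 0x00..0x0F
reverse = bytes(range(15, -1, -1))         # control that reverses byte order
print(pshufb(data, reverse).hex())         # 0f0e0d0c0b0a09080706050403020100
print(pshufb(data, bytes([0x80] * 16)).hex())  # all zero: every lane has bit 7 set
```

Because each destination byte can pick any source byte, a single PSHUFB covers byte reversal, table lookups with up to 16 entries, and arbitrary permutations within a 128-bit register.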

MMX instructions and extended variants thereof

These instructions are, unless otherwise noted, available in the following forms:
  • MMX: 64-bit vectors, operating on mm0..mm7 registers
  • SSE2: 128-bit vectors, operating on xmm0..xmm15 registers
  • AVX: 128-bit vectors, operating on xmm0..xmm15 registers, with a new three-operand encoding enabled by the new VEX prefix.
  • AVX2: 256-bit vectors, operating on ymm0..ymm15 registers
  • AVX-512: 512-bit vectors, operating on zmm0..zmm31 registers. AVX-512 also introduces opmasks, which allow most instructions to be masked on a per-lane basis by an opmask register. AVX-512 additionally adds broadcast functionality to many of its instructions: when used with a memory source operand, a single value is replicated to all lanes of the vector calculation. The tables below indicate, for each instruction, whether opmasks and broadcasts are supported and, if so, which lane widths they use.
For many of the instruction mnemonics, (V) is used to indicate that the mnemonic exists in forms both with and without a leading V. The form with the leading V is used for the VEX/EVEX-prefixed variants introduced by AVX/AVX2/AVX-512, while the form without it is used for the legacy MMX/SSE encodings without a VEX/EVEX prefix.
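The AVX-512 opmask and broadcast behaviour described above can be sketched in Python as follows. The sketch models roughly what an instruction such as `vaddps zmm1{k1}, zmm2, dword ptr [mem]{1to16}` does, with `{z}` selecting zero-masking instead of merge-masking. It rests on several assumptions. The helper names are hypothetical. Python floats stand in for FP32 lanes, with values chosen to be exact in both formats. Exceptions, rounding and memory fault suppression are not modelled.

```python
# Sketch: AVX-512 per-lane masking plus a memory broadcast (illustrative only).
# Python floats stand in for FP32 lanes; the chosen values are exact in both.

LANES = 16  # a 512-bit vector holds 16 FP32 lanes


def broadcast(value: float, lanes: int = LANES) -> list[float]:
    """Replicate one memory value into every lane."""
    return [value] * lanes


def masked_add(dest: list[float], a: list[float], b: list[float],
               mask: int, zeroing: bool) -> list[float]:
    """Lane i is computed only if bit i of mask is set; other lanes are
    left as dest[i] (merge-masking) or set to 0.0 (zero-masking)."""
    result = []
    for i, (x, y) in enumerate(zip(a, b)):
        if (mask >> i) & 1:
            result.append(x + y)
        elif zeroing:
            result.append(0.0)
        else:
            result.append(dest[i])
    return result


dest = [-1.0] * LANES                   # previous contents of the destination
a = [float(i) for i in range(LANES)]    # 0.0, 1.0, ..., 15.0
b = broadcast(10.0)                     # the broadcast memory operand
k1 = 0x00FF                             # only the low 8 lanes are active

merged = masked_add(dest, a, b, k1, zeroing=False)
zeroed = masked_add(dest, a, b, k1, zeroing=True)
print(merged[6:10])   # [16.0, 17.0, -1.0, -1.0]
print(zeroed[6:10])   # [16.0, 17.0, 0.0, 0.0]
```

Merge-masking preserves the inactive lanes of the destination, which suits loops that update only part of a vector. Zero-masking clears the inactive lanes, which removes the dependency on the destination's previous value.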

Original Pentium MMX instructions, and SSE2/AVX/AVX-512 extended variants thereof

MMX instructions added with MMX+/SSE/SSE2/SSSE3, and SSE2/AVX/AVX-512 extended variants thereof

SSE instructions and extended variants thereof

Regularly-encoded floating-point SSE/SSE2 instructions, and AVX/AVX-512 extended variants thereof

For the instructions in the table below, the following considerations apply unless otherwise noted:
  • Packed instructions are available at all vector lengths
  • FP32 variants of instructions are introduced as part of SSE. FP64 variants of instructions are introduced as part of SSE2.
  • The AVX-512 variants of the FP32 and FP64 instructions are introduced as part of the AVX512F subset.
  • For AVX-512 variants of the instructions, opmasks and broadcasts are available with a width of 32 bits for FP32 operations and 64 bits for FP64 operations.
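The 32-bit and 64-bit opmask/broadcast widths follow from how many lanes fit in each vector length. The Python sketch below works through that arithmetic. It prints, for each vector length, the number of lanes, which low opmask bits select those lanes, and the matching EVEX broadcast notation (`{1to16}` and so on). The function name `lane_count` is hypothetical.

```python
VECTOR_LENGTHS = (128, 256, 512)           # AVX-512 vector lengths, in bits
ELEMENT_WIDTHS = {"FP32": 32, "FP64": 64}  # lane widths, in bits


def lane_count(vector_bits: int, element_bits: int) -> int:
    """Lanes per vector; also the number of low opmask bits that select lanes."""
    return vector_bits // element_bits


for vl in VECTOR_LENGTHS:
    for name, width in ELEMENT_WIDTHS.items():
        n = lane_count(vl, width)
        print(f"{vl}-bit {name}: {n} lanes, mask bits 0..{n - 1}, broadcast {{1to{n}}}")
```

A 512-bit FP32 operation therefore uses the low 16 opmask bits and broadcasts with `{1to16}`, while a 512-bit FP64 operation uses the low 8 bits and `{1to8}`.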
From SSE2 onwards, some data movement and bitwise instructions exist in three forms: an integer form, an FP32 form and an FP64 form. These forms are functionally identical. However, some processors with SSE2 implement the integer, FP32 and FP64 execution units as three separate execution clusters, where forwarding a result from one cluster to another may incur a performance penalty. Such penalties can be minimized by choosing the instruction form that matches the data type used by the surrounding instructions.
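The claim that the three forms are functionally identical can be checked with a short Python sketch. It applies a bitwise AND to the same 128 bits in three ways: as sixteen 8-bit lanes (like PAND), four 32-bit lanes (like ANDPS) and two 64-bit lanes (like ANDPD). The example clears each FP32 sign bit, a common way to compute absolute values. The helper names are hypothetical. The sketch models only the results, not the cross-cluster forwarding penalty.

```python
import struct


def and_as_bytes(x: bytes, y: bytes) -> bytes:
    """Integer form (like PAND): AND sixteen 8-bit lanes."""
    return bytes(a & b for a, b in zip(x, y))


def and_as_fp32(x: bytes, y: bytes) -> bytes:
    """FP32 form (like ANDPS): AND four 32-bit lanes."""
    xs, ys = struct.unpack("<4I", x), struct.unpack("<4I", y)
    return struct.pack("<4I", *(a & b for a, b in zip(xs, ys)))


def and_as_fp64(x: bytes, y: bytes) -> bytes:
    """FP64 form (like ANDPD): AND two 64-bit lanes."""
    xs, ys = struct.unpack("<2Q", x), struct.unpack("<2Q", y)
    return struct.pack("<2Q", *(a & b for a, b in zip(xs, ys)))


# Four FP32 values and a mask that clears each FP32 sign bit.
x = struct.pack("<4f", -1.5, 2.0, -3.25, 4.0)
sign_clear = struct.pack("<4I", *([0x7FFFFFFF] * 4))

r8, r32, r64 = (f(x, sign_clear) for f in (and_as_bytes, and_as_fp32, and_as_fp64))
print(r8 == r32 == r64)             # True: the bit patterns are identical
print(struct.unpack("<4f", r32))    # (1.5, 2.0, 3.25, 4.0)
```

Since AND treats every bit independently, the lane width cannot affect the result. The only reason to prefer one form over another is the performance consideration described above.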

Integer SSE2/4 instructions with 66h prefix, and AVX/AVX-512 extended variants thereof

These instructions have no MMX forms, and every encoding of them uses a prefix (66h in the legacy SSE encoding).
Most of these instructions have extended variants available in VEX-encoded and EVEX-encoded forms:
  • The VEX-encoded forms are available under AVX/AVX2. Under AVX, they are available only with a vector length of 128 bits; under AVX2, they are also available with a vector length of 256 bits.
  • The EVEX-encoded forms are available under AVX-512; the specific AVX-512 subset required by each instruction is listed alongside it.
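The mandatory 66h prefix can be illustrated with a small Python sketch. It assembles the legacy SSE4.1 encoding of PCMPEQQ for a register-to-register operand pair. The opcode bytes `66 0F 38 29 /r` are taken from the Intel SDM instruction reference and should be checked there. The helper names are hypothetical, and the sketch handles only xmm0..xmm7 because xmm8..xmm15 would need a REX prefix.

```python
PCMPEQQ_OPCODE = bytes([0x66, 0x0F, 0x38, 0x29])  # 66 0F 38 29 /r (SSE4.1, legacy form)


def modrm_reg_reg(reg: int, rm: int) -> int:
    """ModRM byte for a register-register pair: mod = 0b11, then reg, then rm."""
    if not (0 <= reg < 8 and 0 <= rm < 8):
        raise ValueError("xmm8..xmm15 need a REX prefix, which this sketch omits")
    return 0b11_000_000 | (reg << 3) | rm


def encode_pcmpeqq(dst: int, src: int) -> bytes:
    """Hypothetical helper: encode PCMPEQQ xmm<dst>, xmm<src>."""
    return PCMPEQQ_OPCODE + bytes([modrm_reg_reg(dst, src)])


print(encode_pcmpeqq(1, 2).hex())   # 660f3829ca -> pcmpeqq xmm1, xmm2
```

In the VEX and EVEX forms, the 66h prefix is not emitted as a separate byte. It is instead folded into a field of the VEX/EVEX prefix.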

Other SSE/2/3/4 SIMD instructions, and AVX/AVX-512 extended variants thereof

SSE SIMD instructions that do not fit into any of the preceding groups. Many of these instructions have AVX/AVX-512 extended forms; unless otherwise indicated, these support 128/256-bit operation under AVX and 128/256/512-bit operation under AVX-512.

AVX/AVX2 instructions, and AVX-512 extended variants thereof

This covers instructions/opcodes that are new to AVX and AVX2.
AVX and AVX2 also include extended VEX-encoded forms of a large number of MMX/SSE instructions; see the tables above.
Some of the AVX/AVX2 instructions also exist in extended EVEX-encoded forms under AVX-512.