Qualcomm Hexagon


Hexagon is the brand name for a family of digital signal processor and later neural processing unit products by Qualcomm. Hexagon is also known as QDSP6, standing for “sixth generation digital signal processor.” According to Qualcomm, the Hexagon architecture is designed to deliver performance with low power over a variety of applications.
Each version of Hexagon is defined by an instruction set and a micro-architecture that implements it; the two are closely related.
Hexagon is used in Qualcomm Snapdragon chips, which are found in smartphones, cars, wearables and other mobile devices, and in components of cellular network infrastructure.

Instruction set architecture

Computing devices have instruction sets, which are their lowest, most primitive languages. Common instructions are those which cause two numbers to be added, multiplied or combined in other ways, as well as instructions that direct the processor where to look in memory for its next instruction. There are many other types of instructions.
Assemblers and compilers translate computer programs into streams of instructions (bit streams) that the device can understand and carry out. As an instruction stream executes, instruction privilege levels help protect the integrity of the system: privileged instructions have access to more of the device's resources, including memory. Hexagon supports privilege levels.
Hexagon instructions originally operated only on integers; floating-point support was added in V5.
The processing unit that executes instructions can dispatch up to four instructions, in order, to four execution units every clock cycle.
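Because the instructions dispatched together in one cycle form a single packet, their combined effect is easiest to reason about if every instruction reads the register values from before the packet. The Python sketch below models only that idea under simplifying assumptions: the register file is a dictionary, each instruction is a hypothetical (destination, operation) pair, and packets that write the same register twice are rejected. It does not model real Hexagon encodings, slot restrictions, or any exceptions the real ISA makes to this rule.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical instruction model: (destination register, operation on a
# read-only snapshot of the register file). Not a real Hexagon encoding.
Instruction = Tuple[str, Callable[[Dict[str, int]], int]]

MAX_SLOTS = 4  # up to four instructions issue together in one packet


def execute_packet(regs: Dict[str, int], packet: List[Instruction]) -> Dict[str, int]:
    """Run one packet: all reads see pre-packet values, then all writes commit."""
    if not 1 <= len(packet) <= MAX_SLOTS:
        raise ValueError("a packet holds one to four instructions")
    dests = [dest for dest, _ in packet]
    if len(set(dests)) != len(dests):
        raise ValueError("two instructions in the packet write the same register")
    snapshot = dict(regs)                       # operands are read before any write
    results = {dest: op(snapshot) for dest, op in packet}
    new_regs = dict(regs)
    new_regs.update(results)                    # every result commits at once
    return new_regs


if __name__ == "__main__":
    regs = {"R0": 1, "R1": 2, "R2": 10}
    packet = [
        ("R0", lambda r: r["R1"]),              # R0 = R1
        ("R1", lambda r: r["R0"]),              # R1 = R0 (old R0, not the new one)
        ("R2", lambda r: r["R0"] + r["R1"]),    # R2 = R0 + R1 (old values)
    ]
    print(execute_packet(regs, packet))         # {'R0': 2, 'R1': 1, 'R2': 3}
```

Under this model the example packet swaps R0 and R1 without a temporary register, which a strictly sequential machine could not do in a single step.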

Micro-architecture

Micro-architecture is the physical structure of a chip or chip component that makes it possible for a device to carry out the instructions. A given instruction set can be implemented by a variety of micro-architectures. The buses (data transfer channels) of Hexagon devices are 32 bits wide; that is, 32 bits of data can be moved from one part of the chip to another in a single step. The Hexagon micro-architecture is multithreaded, meaning it can process more than one stream of instructions simultaneously, which increases throughput. Hexagon uses very long instruction words: packets of up to four instructions that execute in parallel, so that several instructions run at the same time without one having to complete before the next one starts. The micro-architecture also supports single instruction, multiple data (SIMD) operations, in which one instruction applies the same operation to several pieces of data at once.
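The SIMD idea can be illustrated by packing several small values into one wide register and adding them lane by lane. This Python sketch assumes a 64-bit register holding four 16-bit lanes and plain wraparound addition; real Hexagon vector instructions also come in signed, saturating and other variants that are not modeled here.

```python
LANE_BITS = 16                 # assumed lane width: 16-bit halfwords
LANES = 4                      # four lanes fill an assumed 64-bit register
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack lane values (lane 0 in the low bits) into one integer."""
    if len(values) != LANES:
        raise ValueError("expected exactly four lane values")
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split an integer back into its lane values."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def vadd_halfwords(a, b):
    """Lane-wise add modeling one SIMD instruction; carries never cross lanes."""
    result = 0
    for i in range(LANES):
        shift = i * LANE_BITS
        lane_sum = (((a >> shift) & LANE_MASK) + ((b >> shift) & LANE_MASK)) & LANE_MASK
        result |= lane_sum << shift
    return result


if __name__ == "__main__":
    a = pack([0xFFFF, 2, 3, 4])
    b = pack([1, 20, 30, 40])
    print(unpack(vadd_halfwords(a, b)))  # [0, 22, 33, 44]: lane 0 wraps, lane 1 gets no carry
```

Adding the two packed words as ordinary integers would instead carry out of lane 0 and turn lane 1 into 23, which is why SIMD hardware isolates the lanes.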
According to a 2012 estimate, Qualcomm shipped 1.2 billion DSP cores inside its systems on a chip (SoCs) in 2011 and planned to ship 1.5 billion in 2012, making QDSP6 the most-shipped DSP architecture.
The Hexagon architecture has features such as hardware-assisted multithreading, privilege levels, very long instruction words (VLIW), single instruction, multiple data (SIMD) operations, and instructions geared toward efficient signal processing. Before V5, hardware multithreading was implemented as barrel temporal multithreading: threads are switched in round-robin fashion every cycle, so a 600 MHz physical core is presented as three logical 200 MHz cores. Hexagon V5 switched to dynamic multithreading, in which threads switch on L2 cache misses, while waiting for interrupts, or on special instructions.
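The barrel scheme can be illustrated with a short simulation. This Python sketch assumes strict round-robin issue with no stalls and ignores the dynamic, event-driven switching introduced in V5; with three threads on a 600 MHz core, each thread issues on every third cycle and therefore runs at an effective 200 MHz.

```python
from collections import Counter


def barrel_schedule(num_threads, num_cycles):
    """Thread ID that issues on each cycle under strict round-robin switching."""
    return [cycle % num_threads for cycle in range(num_cycles)]


def effective_thread_clocks(core_clock_mhz, num_threads, window_cycles):
    """Share of the core clock each thread receives over a window of cycles."""
    issued = Counter(barrel_schedule(num_threads, window_cycles))
    return {
        thread: core_clock_mhz * issued[thread] / window_cycles
        for thread in range(num_threads)
    }


if __name__ == "__main__":
    print(barrel_schedule(3, 7))                 # [0, 1, 2, 0, 1, 2, 0]
    print(effective_thread_clocks(600, 3, 600))  # {0: 200.0, 1: 200.0, 2: 200.0}
```

The same arithmetic reproduces the per-thread clocks listed for V2 (600 MHz / 6 threads = 100 MHz) and V4 (500 MHz / 3 threads ≈ 167 MHz) in the versions table.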
At Hot Chips 2015, Qualcomm announced details of its Hexagon 680 DSP, including the Hexagon Vector Extensions (HVX). HVX is designed to let heavy compute workloads for advanced imaging and computer vision run on the DSP instead of the CPU. In March 2015, Qualcomm announced its Snapdragon Neural Processing Engine SDK, which allows AI acceleration using the CPU, GPU and Hexagon DSP.
Qualcomm's Snapdragon 855 contains its fourth-generation on-device AI engine, which includes the Hexagon 690 DSP and the Hexagon Tensor Accelerator for AI acceleration. The Snapdragon 865 contains the fifth-generation on-device AI engine, based on the Hexagon 698 DSP and capable of 15 trillion operations per second (TOPS). The Snapdragon 888 contains the sixth-generation on-device AI engine, based on the Hexagon 780 DSP and capable of 26 TOPS. The Snapdragon 8 Gen 1 contains the seventh-generation on-device AI engine, based on the Hexagon DSP and capable of 52 TOPS, and up to 104 TOPS in some cases.

Software support

Operating systems

The port of Linux for Hexagon runs under a hypervisor layer and was merged into the kernel with the 3.2 release. The original hypervisor is closed-source; in April 2013, Qualcomm released a minimal open-source hypervisor implementation for QDSP6 V2 and V3, the "Hexagon MiniVM", under a BSD-style license.

Compilers

Support for Hexagon was added to LLVM in the 3.1 release by Tony Linthicum. Support for the Hexagon/HVX V66 ISA was added in LLVM 8.0.0. There is also a branch of GCC and binutils that is not maintained by the FSF.

Adoption of the SIP block

Qualcomm Hexagon DSPs have been available in Qualcomm Snapdragon SoCs since 2006. The Snapdragon S4 contains three QDSP cores: two in the modem subsystem and one Hexagon core in the multimedia subsystem. The modem cores are programmed only by Qualcomm; only the multimedia core can be programmed by users.
They are also used in some femtocell processors of Qualcomm, including FSM98xx, FSM99xx and FSM90xx.

Third-party integration

In March 2016, it was announced that semiconductor company Conexant's AudioSmart audio processing software was being integrated into Qualcomm's Hexagon.
In May 2018, wolfSSL added support for running its cryptographic operations on the Qualcomm Hexagon DSP. A specialized library for managing the operation load was added later.

Versions

Six versions of the QDSP6 architecture have been released: V1, V2, V3, V4, V5 and V6. V4 delivers 20 DMIPS per milliwatt when operating at 500 MHz.
Hexagon clock speeds range from 400 to 2000 MHz for QDSP6 and from 256 to 350 MHz for the previous generation of the architecture, QDSP5.
Versions of QDSP6
Version | Process node, nm | Year | Number of simultaneous threads | Per-thread clock, MHz | Total core clock, MHz | Product
QDSP6 V1 | 65 | 2006 | | | |
QDSP6 V2 | 65 | 2007 | 6 | 100 | 600 |
QDSP6 V3 | 45 | 2009 | 6 | 67 | 400 |
QDSP6 V3 | 45 | 2009 | 4 | 100 | 400 |
QDSP6 V4 | 28 | 2010 | 3 | 167 | 500 | Snapdragon 600
QDSP6 V5 | 28 | 2013 | 3 | 200 or greater with DMT | 600 | Snapdragon 410/412/800/801
536 | 12/28 | 2014 | | | | Snapdragon 205/208/210/212/425/427/429/430/435/439
V50 | 28 | 2014 | | | 700/800 | Snapdragon 415/610/615/616/805
546 | 14/28 | 2015 | | | | Snapdragon 450/617/625/626/632
V56 | 20/28 | 2015 | | | 800 | Snapdragon 650/652/653/808/810
642 | 14 | 2017 | | | | Snapdragon 630
680 | 14 | 2016 | 4 | 500 | 787, 2000 | Snapdragon 636/660/820/821
682 | 10 | 2017 | | | | Snapdragon 835
683 | 11 | 2020 | | | | Snapdragon 460/662
685 | 10/11 | 2018 | | | | Snapdragon 670/675/678/710/712/845/850
686 | 6/8/11 | 2019 | | | | Snapdragon 480/480+/665/680/685/695
688 | 8 | 2019 | | | | Snapdragon 730/730G/732G
690 | 7 | 2019 | | | | Snapdragon 855/855+/860/8c/8cx/8cx Gen 2, Microsoft SQ1/SQ2
692 | 8 | 2020 | | | | Snapdragon 690/720G/7c/7c Gen 2
694 | 8 | 2020 | | | | Snapdragon 750G
696 | 7 | 2020 | | | | Snapdragon 765/765G/768G
698 | 7 | 2020 | | | | Snapdragon 865/865+/870/8cx Gen 3, Microsoft SQ3
770 | 5/6 | 2021 | | | | Snapdragon 778G/778G+/780G/782G
780 | 5 | 2021 | | | | Snapdragon 888/888+
790 | 4 | 2021 | | | | Snapdragon 8 Gen 1/8+ Gen 1
NPU | 4 | 2022 | | | | Snapdragon 8 Gen 2
NPU | 4 | 2023 | | | | Snapdragon 8 Gen 3/8s Gen 3
NPU | 3 | 2024 | | | | Snapdragon 8 Elite
NPU | 4 | 2024 | | | | Snapdragon X

Availability in Snapdragon products

Both Hexagon and pre-Hexagon cores are used in modern Qualcomm SoCs, with QDSP5 mostly in low-end products. Modem QDSPs are not shown in the tables.
QDSP5 usage:
Snapdragon generation | Chipset ID | DSP generation | DSP frequency, MHz | Process node, nm
S1 | MSM7627, MSM7227, MSM7625, MSM7225 | QDSP5 | 320 | 65
S1 | MSM7627A, MSM7227A, MSM7625A, MSM7225A | QDSP5 | 350 | 45
S2 | MSM8655, MSM8255, APQ8055, MSM7630, MSM7230 | QDSP5 | 256 | 45
S4 Play | MSM8625, MSM8225 | QDSP5 | 350 | 45
S200 | 8110, 8210, 8610, 8112, 8212, 8612, 8225Q, 8625Q | QDSP5 | 384 | 45 LP

QDSP6 usage:
Snapdragon generation | Chipset ID | QDSP6 version | DSP frequency, MHz | Process node, nm
S1 | QSD8650, QSD8250 | QDSP6 | 600 | 65
S3 | MSM8660, MSM8260, APQ8060 | QDSP6 | 400 | 45
S4 Prime | MPQ8064 | QDSP6 | 500 | 28
S4 Pro | MSM8960 Pro, APQ8064 | QDSP6 | 500 | 28
S4 Plus | MSM8960, MSM8660A, MSM8260A, APQ8060A, MSM8930, MSM8630, MSM8230, APQ8030, MSM8627, MSM8227 | QDSP6 | 500 | 28
S400 | 8926, 8930, 8230, 8630, 8930AB, 8230AB, 8630AB, 8030AB, 8226, 8626 | QDSP6 V4 | 500 | 28 LP
S600 | 8064T, 8064M | QDSP6 V4 | 500 | 28 LP
S800 | 8974, 8274, 8674, 8074 | QDSP6 V5A | 600 | 28 HPm
S820 | 8996 | QDSP6 V6 | 2000 | 14 FinFET LPP

Hardware codec support

The different video codecs supported by the Snapdragon SoCs.
D - decode; E - encode
FHD = FullHD = 1080p = 1920x1080px
HD = 720p, which can be 1366x768px or 1280x720px

Snapdragon 200 series

The different video codecs supported by the Snapdragon 200 series.
Codec | Snapdragon 200 | Snapdragon 200 | Qualcomm 205 | Snapdragon 208/210 | Snapdragon 212
Availability | 2013 | 2013 | 2017 | 2014 | 2015
Hexagon | QDSP5 | QDSP6 | 536 | 536 | 536
H263
VC-1
H.264
H.264 10-bit
VP8
H.265
H.265 10-bit
H.265 12-bit
VVC
VP9
VP9 10-bit
AV1

Snapdragon 400 series

The different video codecs supported by the Snapdragon 400 series.
Codec | Snapdragon 400 | Snapdragon 410/415 | Snapdragon 425/427 | Snapdragon 429/439 | Snapdragon 450 | Snapdragon 460 | Snapdragon 480/480+
Availability | Q4 2013 | 2014/2015 | Q1 2016/Q3 2017 | Q2 2018 | Q2 2017 | Q1 2020 | Q1 2021
Hexagon | QDSP6 | QDSP6 V5 | 536 | 536 | 546 | 683 | 686
H263
VC-1
H.264
H.264 10-bit
VP8
H.265
H.265 10-bit
H.265 12-bit
VVC
VP9
VP9 10-bit
AV1
Video frame rate support, decoding | HD 60 fps
Video frame rate support, decoding | FHD 60 fps | FHD 60 fps | FHD 60 fps
Video frame rate support, encoding | HD 60 fps
Video frame rate support, encoding | FHD 60 fps | FHD 60 fps | FHD 60 fps

Snapdragon 600 series

The different video codecs supported by the Snapdragon 600 series.

Snapdragon 700 series

The different video codecs supported by the Snapdragon 700 series.

Snapdragon 800 series

The different video codecs supported by the Snapdragon 800 series.

Code sample

This is a single instruction packet from the inner loop of an FFT:
:endloop0

Qualcomm claims this packet is equivalent to 29 classic RISC operations; it includes a vector add, a complex multiply and hardware loop support. All instructions in the packet execute in the same cycle.
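To give a sense of how much work one packet can replace, the Python sketch below spells out a fixed-point complex multiply step by step. It assumes each complex value is a pair of signed 16-bit Q15 numbers and that each product is shifted left by one, rounded and saturated back to 16 bits; these format and rounding choices are illustrative assumptions, not Qualcomm's documented definition of the instruction.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturate16(x):
    """Clamp to the signed 16-bit range instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, x))


def to_q15(acc_q30):
    """Q30 product -> shift left by 1 (Q31) -> round -> keep the upper 16 bits (Q15)."""
    q31 = acc_q30 << 1
    return saturate16((q31 + (1 << 15)) >> 16)


def cmul_q15(a, b):
    """Multiply complex numbers given as (real, imag) pairs of Q15 integers."""
    ar, ai = a
    br, bi = b
    real = ar * br - ai * bi   # real part: ar*br - ai*bi
    imag = ar * bi + ai * br   # imaginary part: ar*bi + ai*br
    return to_q15(real), to_q15(imag)


if __name__ == "__main__":
    half = 16384                                    # 0.5 in Q15
    print(cmul_q15((half, 0), (half, 0)))           # (8192, 0): 0.5 * 0.5 = 0.25
    print(cmul_q15((0, half), (0, half)))           # (-8192, 0): 0.5j * 0.5j = -0.25
    print(cmul_q15((-32768, 0), (-32768, 0)))       # (32767, 0): (-1) * (-1) saturates
```

Each call performs four multiplies, two adds, two shifts, two roundings and two clamps, which a conventional RISC core would issue as separate instructions; this is roughly why a single packet can stand in for many classic operations.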