ARM Cortex-A77
The ARM Cortex-A77 is a central processing unit, designed by ARM Holdings' Austin design centre, that implements the ARMv8.2-A 64-bit instruction set. It was released in 2019. ARM claimed increases of 23% and 35% in integer and floating-point performance, respectively, and 15% higher memory bandwidth over its predecessor, the Cortex-A76.
Design
The Cortex-A77 serves as the successor of the Cortex-A76. It is a 4-wide decode out-of-order superscalar design with a new 1.5K macro-op (MOP) cache. It can fetch 4 instructions and 6 MOPs per cycle, and rename and dispatch 6 MOPs and 13 μops per cycle. The out-of-order window size has been increased to 160 entries. The backend has 12 execution ports, a 50% increase over Cortex-A76. It has a pipeline depth of 13 stages and execution latencies of 10 stages.
There are six pipelines in the integer cluster, an increase of two over Cortex-A76. One of the changes from Cortex-A76 is the unification of the issue queues. Previously, each pipeline had its own issue queue; on Cortex-A77 there is a single unified issue queue, which improves efficiency. Cortex-A77 added a fourth general math ALU, which typically completes simple math operations in 1 cycle and some more complex operations in 2 cycles. In total, there are three simple ALUs that perform arithmetic and logical data processing operations and a fourth port that supports complex arithmetic. Cortex-A77 also added a second branch ALU, doubling the throughput for branches.
There are two ASIMD/FP execution pipelines, unchanged from Cortex-A76. What did change is the issue queues: as with the integer cluster, the ASIMD cluster now features a unified issue queue for both pipelines, improving efficiency. As on Cortex-A76, both ASIMD pipelines on Cortex-A77 are 128 bits wide and capable of 2 double-precision, 4 single-precision, 8 half-precision, or 16 8-bit integer operations. These pipelines can also execute the cryptographic instructions if the extension is supported. Cortex-A77 added a second AES unit to improve the throughput of cryptography operations.
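The operation counts quoted for a 128-bit pipeline follow directly from dividing the vector width by the element width. The short Python sketch below only illustrates that arithmetic; the names `VECTOR_BITS`, `ELEMENT_WIDTHS`, `lanes` and `LANES` are illustrative and do not come from ARM documentation.

```python
# Illustrative arithmetic only: how many elements of a given width
# fit into one 128-bit ASIMD vector.
VECTOR_BITS = 128  # pipeline width stated in the text

# Element widths in bits for the data types named in the text.
ELEMENT_WIDTHS = {
    "double-precision": 64,
    "single-precision": 32,
    "half-precision": 16,
    "8-bit integer": 8,
}


def lanes(element_bits: int, vector_bits: int = VECTOR_BITS) -> int:
    """Return the number of elements of `element_bits` bits per vector."""
    if element_bits <= 0 or vector_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the vector width")
    return vector_bits // element_bits


# Lane count per data type, e.g. 128 // 64 = 2 for double precision.
LANES = {name: lanes(bits) for name, bits in ELEMENT_WIDTHS.items()}

if __name__ == "__main__":
    for name, count in LANES.items():
        print(f"{name}: {count} elements per {VECTOR_BITS}-bit vector")
```

The resulting counts (2, 4, 8 and 16) match the figures given for each pipeline.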
The reorder buffer (ROB) is larger, with up to 160 entries (up from 128), and the new L0 MOP cache holds up to 1,536 entries.
The core supports unprivileged 32-bit applications, but privileged applications must utilize the 64-bit ARMv8-A ISA. It also supports Load acquire instructions, Dot Product instructions, and PSTATE Speculative Store Bypass Safe bit instructions.
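As an illustration of what a dot product instruction computes, the following Python sketch models the signed 8-bit form, assuming the text refers to the Armv8.2 SDOT/UDOT instructions: each of four 32-bit accumulator lanes receives the sum of four 8-bit products. The function name `sdot_model` is hypothetical, and the code is a behavioural sketch rather than ARM's specification pseudocode.

```python
# Behavioural sketch (not ARM's pseudocode) of a signed 8-bit dot product
# over one 128-bit vector: 16 int8 inputs per operand, 4 int32 accumulators.

def _wrap_int32(value: int) -> int:
    """Wrap an arbitrary Python int to signed 32-bit two's complement."""
    return (value + 2**31) % 2**32 - 2**31


def sdot_model(acc: list[int], a: list[int], b: list[int]) -> list[int]:
    """Return acc[i] + sum(a[4i+j] * b[4i+j] for j in 0..3) for each lane i."""
    if len(acc) != 4 or len(a) != 16 or len(b) != 16:
        raise ValueError("expected 4 accumulators and 16 elements per operand")
    if any(not -128 <= x <= 127 for x in a + b):
        raise ValueError("operand elements must fit in a signed 8-bit integer")
    result = []
    for lane in range(4):
        total = acc[lane]
        for j in range(4):
            total += a[4 * lane + j] * b[4 * lane + j]
        result.append(_wrap_int32(total))
    return result


if __name__ == "__main__":
    # Multiplying 0..15 by all ones sums each group of four inputs.
    print(sdot_model([0, 0, 0, 0], list(range(16)), [1] * 16))
    # → [6, 22, 38, 54]
```

A single instruction of this kind replaces sixteen multiplications and their additions, which is why it is often used for low-precision machine-learning workloads.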
The Cortex-A77 supports ARM's DynamIQ technology, and is expected to be used as high-performance cores in combination with Cortex-A55 power-efficient cores.
Architecture changes in comparison with the ARM Cortex-A76
- Front-end
- * Branch-prediction
- ** Better accuracy
- ** Up to 64B runahead window
- ** Increase L1 BTB capacity, up to 64-entry
- ** Increase BTB capacity, up to 8K-entry
- * Improved prefetcher
- * Add new L0 Macro-op cache
- * Wider instruction fetch, up to 6 instructions/cycle
- Execution engine
- * Larger Re-Order Buffer, Up to 160-entry
- * Wider dispatch, up to 10-way
- * Wider issue, up to 12-way
- ** Execution units
- *** New integer ALU unit and port
- *** New branch unit and port
- *** New dedicated store data ports
- *** New AES unit added
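The generational changes can be expressed as relative increases. The Python sketch below uses only figures stated in this article; the Cortex-A76 values for execution ports, integer pipelines, branch units and AES units are not given directly but are inferred from phrases such as "a 50% increase" or "a second AES unit", and should be read as assumptions. The names `FIGURES` and `percent_increase` are illustrative.

```python
# Relative increases from Cortex-A76 to Cortex-A77, using figures quoted
# in this article. A76 values marked "inferred" are assumptions derived
# from the article's wording, not from ARM documentation.
FIGURES = {
    # resource: (Cortex-A76, Cortex-A77)
    "reorder buffer entries": (128, 160),  # "up to 160-entry, up from 128"
    "execution ports": (8, 12),            # inferred: 12 is "a 50% increase"
    "integer pipelines": (4, 6),           # inferred: six, "an increase of two"
    "branch units": (1, 2),                # inferred: a "second branch ALU"
    "AES units": (1, 2),                   # inferred: a "second AES unit"
}


def percent_increase(old: int, new: int) -> float:
    """Return the increase from `old` to `new` as a percentage of `old`."""
    if old <= 0:
        raise ValueError("old value must be positive")
    return (new - old) / old * 100.0


if __name__ == "__main__":
    for resource, (a76, a77) in FIGURES.items():
        change = percent_increase(a76, a77)
        print(f"{resource}: {a76} -> {a77} (+{change:.0f}%)")
```

Under these assumptions the reorder buffer grows by 25%, while the number of branch and AES units doubles.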
Licensing
Usage
The Samsung Exynos 980 was introduced in September 2019 as the first SoC to use the Cortex-A77 microarchitecture. It was followed by a lower-end variant, the Exynos 880, in May 2020. The MediaTek Dimensity 1000, 1000L and 1000+ SoCs also use the Cortex-A77 microarchitecture. Derivatives named Kryo 585, Kryo 570 and Kryo 560 are used in the Snapdragon 865, 750G and 690, respectively. HiSilicon uses the Cortex-A77 at two different frequencies in its Kirin 9000 series. Both its predecessor and its successor had automotive variants with Split-Lock capability, the Cortex-A76AE and Cortex-A78AE, but the Cortex-A77 did not, and so it did not find its way into safety-critical applications.