AArch64
AArch64, also known as ARM64, is a 64-bit version of the ARM architecture family, a widely used set of computer processor designs. It was introduced in 2011 with the ARMv8 architecture and later became part of the ARMv9 series. AArch64 allows processors to handle more memory and perform faster calculations than earlier 32-bit versions. It is designed to work alongside the older 32-bit mode, known as AArch32, allowing compatibility with a wide range of software. Devices that use AArch64 include smartphones, tablets, personal computers, and servers. The AArch64 architecture has continued to evolve through updates that improve performance, security, and support for advanced computing tasks.
AArch64 Execution state
In ARMv8-A, ARMv8-R, and ARMv9-A, an "Execution state" defines key characteristics of the processor’s environment. This includes the number of bits used in the primary processor registers, the supported instruction sets, and other aspects of the processor's execution environment. These versions of the ARM architecture support two Execution states: the 64-bit AArch64 state and the 32-bit AArch32 state.
Naming conventions
- 64-bit:
- * Execution state: AArch64
- * Instruction sets: A64
- 32-bit:
- * Execution state: AArch32
- * Instruction sets: A32 + T32
- * Example: ARMv8-R, Cortex-A32
AArch64 features
- New instruction set, A64:
- * Has 31 general-purpose 64-bit registers
- * Has a dedicated zero or stack pointer register (which one register number 31 refers to depends on the instruction)
- * The program counter is no longer directly accessible as a register
- * Instructions are still 32 bits long and mostly the same as A32
- ** Has paired loads/stores
- ** No predication for most instructions
- * Most instructions can take 32-bit or 64-bit arguments
- * Addresses assumed to be 64-bit
- Advanced SIMD enhanced:
- * Has 32 × 128-bit registers, also accessible via VFPv4
- * Supports double-precision floating-point format
- * Fully IEEE 754 compliant
- * AES encrypt/decrypt and SHA-1/SHA-2 hashing instructions also use these registers
- A new exception system:
- * Fewer banked registers and modes
- Memory translation from 48-bit virtual addresses based on the existing Large Physical Address Extension, which was designed to be easily extended to 64-bit
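Two of the points in the list above can be illustrated with a small model. The following is a minimal sketch, not an emulator: the class `A64Registers` and the function `split_va_4k` are hypothetical names chosen for illustration. The register model covers only the zero-register reading of register number 31 (not the stack pointer case), and the address split assumes a 4 KB translation granule with four table levels, which is only one of the configurations the architecture allows.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


class A64Registers:
    """Toy model of the 31 general-purpose registers (illustrative only)."""

    def __init__(self):
        self.x = [0] * 31  # X0..X30

    def read(self, n, width=64):
        # Register number 31 reads as zero in this model (zero-register
        # interpretation; other instructions treat 31 as the stack pointer).
        value = 0 if n == 31 else self.x[n]
        return value & (MASK64 if width == 64 else MASK32)

    def write(self, n, value, width=64):
        if n == 31:
            return  # writes to the zero register are discarded
        # A 32-bit (W-form) write clears the upper 32 bits of the register.
        self.x[n] = value & (MASK64 if width == 64 else MASK32)


def split_va_4k(va):
    """Split a 48-bit virtual address into four 9-bit table indexes and a
    12-bit page offset (assumes a 4 KB granule and four lookup levels)."""
    assert 0 <= va < (1 << 48)
    offset = va & 0xFFF
    indexes = [(va >> shift) & 0x1FF for shift in (39, 30, 21, 12)]
    return indexes, offset
```

In this model, a 32-bit write to a register that previously held a 64-bit value leaves only the low 32 bits of the new value, and 4 × 9 index bits plus 12 offset bits account for the 48-bit virtual address.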
AArch64 was introduced in ARMv8-A and is included in subsequent versions of ARMv8-A, and in all versions of ARMv9-A. It was also introduced in ARMv8-R as an option, after its introduction in ARMv8-A; it is not included in ARMv8-M.
A64 instruction formats
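As a rough illustration of how the 4-bit main opcode field at bits 25–28 of a 32-bit A64 instruction word selects an encoding group, the sketch below extracts that field and maps it to a group name. The function name `a64_group` and the group labels are chosen for this example; the mapping follows the top-level encoding table as commonly documented and lumps reserved, unallocated, and extension encodings into one catch-all.

```python
def a64_group(insn: int) -> str:
    """Return the top-level encoding group of a 32-bit A64 instruction word."""
    op0 = (insn >> 25) & 0xF  # main opcode field, bits 25-28
    if op0 & 0b1110 == 0b1000:  # 100x
        return "data processing (immediate)"
    if op0 & 0b1110 == 0b1010:  # 101x
        return "branch, exception generation, system"
    if op0 & 0b0101 == 0b0100:  # x1x0
        return "load/store"
    if op0 & 0b0111 == 0b0101:  # x101
        return "data processing (register)"
    if op0 & 0b0111 == 0b0111:  # x111
        return "data processing (floating point and SIMD)"
    return "reserved, unallocated, or extension"


# A few well-known encodings and the group each falls into.
EXAMPLES = {
    0x91000420: "data processing (immediate)",               # ADD X0, X1, #1
    0xD503201F: "branch, exception generation, system",      # NOP
    0xF9400020: "load/store",                                # LDR X0, [X1]
    0x8B020020: "data processing (register)",                # ADD X0, X1, X2
    0x1E622820: "data processing (floating point and SIMD)", # FADD D0, D1, D2
}
```
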
The main opcode for selecting which group an A64 instruction belongs to is at bits 25–28.
ARM-A (application architecture)
Announced in October 2011, ARMv8-A represents a fundamental change to the ARM architecture. It adds an optional 64-bit Execution state, named "AArch64", and the associated new "A64" instruction set, in addition to a 32-bit Execution state, "AArch32", supporting the 32-bit "A32" and "T32" instruction sets. The latter instruction sets provide user-space compatibility with the existing 32-bit ARMv7-A architecture. ARMv8-A allows 32-bit applications to be executed in a 64-bit OS, and a 32-bit OS to be under the control of a 64-bit hypervisor. ARM announced their Cortex-A53 and Cortex-A57 cores on 30 October 2012. Apple was the first to release an ARMv8-A compatible core in a consumer product. AppliedMicro, using an FPGA, was the first to demo ARMv8-A. The first ARMv8-A SoC from Samsung is the Exynos 5433 used in the Galaxy Note 4, which features two clusters of four Cortex-A57 and Cortex-A53 cores in a big.LITTLE configuration; but it only runs in AArch32 mode.
ARMv8-A includes VFPv3/v4 and advanced SIMD as standard features in both AArch32 and AArch64. It also adds cryptography instructions supporting AES, SHA-1/SHA-256 and finite field arithmetic.
An ARMv8-A processor can support one or both of AArch32 and AArch64; it may support AArch32 and AArch64 at lower Exception levels and only AArch64 at higher Exception levels. For example, the ARM Cortex-A32 supports only AArch32, the ARM Cortex-A34 supports only AArch64, and the ARM Cortex-A72 supports both AArch64 and AArch32. An ARMv9-A processor must support AArch64 at all Exception levels, and may support AArch32 at EL0.
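The constraints described above can be expressed as a small check. This is a minimal sketch under stated assumptions: the function names are hypothetical, a configuration is given as a list of Execution states for EL0 through EL3, and the first rule encodes the architectural constraint that an AArch64 level cannot run beneath an AArch32 level (a 64-bit OS can host 32-bit applications, but not the reverse).

```python
def is_consistent(states):
    """states[i] is the Execution state used at ELi, lowest level first.

    Returns False if an AArch64 level sits below an AArch32 level."""
    seen_aarch32 = False
    for state in reversed(states):  # walk from the highest level down
        if state == "AArch32":
            seen_aarch32 = True
        elif seen_aarch32:
            return False  # AArch64 found below an AArch32 level
    return True


def allowed_on_armv9(states):
    """ARMv9-A as described above: AArch32 may appear at EL0 only."""
    return is_consistent(states) and all(s == "AArch64" for s in states[1:])
```

For example, a 32-bit OS under a 64-bit hypervisor (`["AArch32", "AArch32", "AArch64", "AArch64"]`) passes the first check but not the ARMv9-A check.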
ARMv8.1-A
In December 2014, ARMv8.1-A, an update with "incremental benefits over v8.0", was announced. The enhancements fell into two categories: changes to the instruction set, and changes to the exception model and memory translation.
Instruction set enhancements included the following:
- A set of AArch64 atomic read-write instructions.
- Additions to the Advanced SIMD instruction set for both AArch32 and AArch64 to enable opportunities for some library optimizations:
- * Signed Saturating Rounding Doubling Multiply Accumulate, Returning High Half.
- * Signed Saturating Rounding Doubling Multiply Subtract, Returning High Half.
- * The instructions are added in vector and scalar forms.
- A set of AArch64 load and store instructions that can provide a memory access order that is limited to configurable address regions.
- The optional CRC instructions in v8.0 become a requirement in ARMv8.1.
Enhancements to the exception model and memory translation included the following:
- A new Privileged Access Never state bit provides control that prevents privileged access to user data unless explicitly enabled.
- An increased VMID range for virtualization; supports a larger number of virtual machines.
- Optional support for hardware update of the page table access flag, and the standardization of an optional, hardware updated, dirty bit mechanism.
- The Virtualization Host Extensions. These enhancements improve the performance of Type 2 hypervisors by reducing the software overhead associated with transitioning between the Host and Guest operating systems. The extensions allow the Host OS to execute at EL2, as opposed to EL1, without substantial modification.
- A mechanism to free up some translation table bits for operating system use, where the hardware support is not needed by the OS.
- Top byte ignore for memory tagging.
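The two Advanced SIMD additions named in the list above are easier to follow as arithmetic. The sketch below models a single 16-bit lane: the product is doubled, added to (or subtracted from) the accumulator shifted into the high half, rounded, and the high half is returned with signed saturation. The function names mirror the instruction mnemonics SQRDMLAH and SQRDMLSH, but this is an illustrative model of one element, not the vector instructions themselves.

```python
def _saturate(value, bits):
    """Clamp value to the signed range of the given width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def sqrdmlah(acc, a, b, bits=16):
    """Signed saturating rounding doubling multiply accumulate, high half."""
    rounding = 1 << (bits - 1)
    total = (acc << bits) + 2 * a * b + rounding
    return _saturate(total >> bits, bits)


def sqrdmlsh(acc, a, b, bits=16):
    """Signed saturating rounding doubling multiply subtract, high half."""
    rounding = 1 << (bits - 1)
    total = (acc << bits) - 2 * a * b + rounding
    return _saturate(total >> bits, bits)
```

Treating the 16-bit inputs as Q15 fixed-point fractions, `sqrdmlah(0, 0x4000, 0x4000)` multiplies 0.5 by 0.5 and yields 0x2000 (0.25), while multiplying -1.0 by -1.0 saturates at the largest representable value instead of overflowing.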
ARMv8.2-A
- Optional half-precision floating-point data processing.
- Memory model enhancements.
- Introduction of Reliability, Availability and Serviceability Extension.
- Introduction of statistical profiling.
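To give a sense of what the half-precision format in the list above looks like, the following sketch uses Python's `struct` module, whose `e` format code packs IEEE 754 binary16 values. It only illustrates the 16-bit storage format and its limited precision; it says nothing about how the ARMv8.2-A instructions themselves are encoded. The helper names are hypothetical.

```python
import struct


def half_bits(x: float) -> int:
    """Return the 16-bit IEEE 754 half-precision encoding of x."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def round_to_half(x: float) -> float:
    """Round x to the nearest half-precision value and return it as a float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]
```

With 1 sign bit, 5 exponent bits, and 10 fraction bits, 1.0 encodes as 0x3C00 and the largest finite value, 65504, encodes as 0x7BFF; a value such as 0.1 is only approximated.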
Scalable Vector Extension (SVE)
A 512-bit SVE variant has already been implemented on the Fugaku supercomputer using the Fujitsu A64FX ARM processor; this computer was the fastest supercomputer in the world for two years, from June 2020 to May 2022. A more flexible version, 2x256 SVE, was implemented by the AWS Graviton3 ARM processor.
SVE is supported by GCC, with GCC 8 supporting automatic vectorization and GCC 10 supporting C intrinsics. LLVM and Clang support C and IR intrinsics. ARM's own fork of LLVM supports auto-vectorization.
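The implementations mentioned above use different vector widths, and SVE code is typically written so that it does not depend on the width. The sketch below models that vector-length-agnostic style in plain Python: the loop asks how many lanes fit in a vector, builds a per-lane predicate for each step, and lets the predicate mask off the lanes past the end of the data. The function name `vla_add` is hypothetical, and the width check assumes the range of 128 to 2048 bits in multiples of 128 that SVE permits.

```python
def vla_add(a, b, vector_bits, element_bits=32):
    """Element-wise add written once for any legal vector length."""
    assert len(a) == len(b)
    assert vector_bits % 128 == 0 and 128 <= vector_bits <= 2048
    lanes = vector_bits // element_bits
    out = [0] * len(a)
    i = 0
    while i < len(a):
        # Predicate: lane k is active only while i + k is inside the array,
        # so the final partial vector needs no separate scalar tail loop.
        predicate = [i + k < len(a) for k in range(lanes)]
        for k, active in enumerate(predicate):
            if active:
                out[i + k] = a[i + k] + b[i + k]
        i += lanes
    return out
```

The same function gives identical results with `vector_bits=256` and `vector_bits=512`; only the number of loop iterations changes.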
ARMv8.3-A
In October 2016, ARMv8.3-A was announced. Its enhancements fell into six categories:
- Pointer authentication; a mandatory extension to the architecture.
- Nested virtualization.
- Advanced SIMD complex number support; e.g. rotations by multiples of 90 degrees.
- New FJCVTZS (Floating-point JavaScript Convert to Signed fixed-point, rounding toward Zero) instruction.
- A change to the memory consistency model to support the weaker RCpc model of C++11/C11.
- ID mechanism support for larger system-visible caches.
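The FJCVTZS instruction in the list above exists because JavaScript's double-to-32-bit-integer conversion wraps on overflow rather than saturating. The sketch below models that conversion rule in Python; the function name `js_to_int32` is hypothetical, and the code illustrates the arithmetic result only, not the instruction's flag behaviour.

```python
import math


def js_to_int32(x: float) -> int:
    """Convert a double to a signed 32-bit integer with JavaScript semantics."""
    if math.isnan(x) or math.isinf(x):
        return 0  # NaN and infinities convert to zero
    n = math.trunc(x)  # round toward zero
    n &= 0xFFFFFFFF  # keep the low 32 bits (wrap modulo 2**32)
    # Reinterpret the low 32 bits as a two's-complement signed value.
    return n - 0x100000000 if n >= 0x80000000 else n
```

An out-of-range input such as 2**32 + 5 therefore converts to 5 instead of being clamped to the largest 32-bit integer.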
ARMv8.4-A
In November 2017, ARMv8.4-A was announced. Its enhancements fell into these categories:
- "SHA3 / SHA512 / SM3 / SM4 crypto extensions", i.e. optional instructions.
- Improved virtualization support.
- Memory Partitioning and Monitoring capabilities.
- A new Secure EL2 state and Activity Monitors.
- Signed and unsigned integer dot product instructions.
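The integer dot product instructions in the list above can be sketched as follows. In the signed form modelled here, each 32-bit accumulator lane receives the sum of four products of 8-bit elements. The function names are hypothetical, inputs are plain Python lists standing in for vector registers, and the result is wrapped to 32 bits to mimic a fixed-width lane.

```python
def _to_s32(value):
    """Wrap an integer to the signed 32-bit range."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def sdot(acc, n, m):
    """Signed dot product: acc[i] += sum of n[4i+j] * m[4i+j] for j in 0..3."""
    assert len(n) == len(m) == 4 * len(acc)
    assert all(-128 <= v <= 127 for v in n + m)  # 8-bit signed elements
    return [
        _to_s32(acc[i] + sum(n[4 * i + j] * m[4 * i + j] for j in range(4)))
        for i in range(len(acc))
    ]
```

A 128-bit register holds sixteen 8-bit elements, so one such operation updates four 32-bit accumulators at once.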
ARMv8.5-A and ARMv9.0-A
- Memory Tagging Extension.
- Branch Target Indicators to reduce "the ability of an attacker to execute arbitrary code". Like pointer authentication, the relevant instructions are no-ops on earlier versions of ARMv8-A.
- Random Number Generator instructions – "providing Deterministic and True Random Numbers conforming to various National and International Standards".
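As a rough illustration of the Memory Tagging Extension listed above, the sketch below models its basic idea: a 4-bit tag carried in bits 56–59 of a pointer is compared with a tag stored for the 16-byte memory granule being accessed. The class `TaggedMemory` and the helper functions are hypothetical; the model treats untagged granules as having tag 0 and ignores how hardware stores tags or reports mismatches.

```python
GRANULE = 16  # bytes of memory covered by one tag
ADDRESS_MASK = (1 << 56) - 1


def ptr_tag(ptr):
    """Extract the 4-bit tag from bits 56-59 of a pointer."""
    return (ptr >> 56) & 0xF


def with_tag(addr, tag):
    """Return addr with its bits 56-59 replaced by tag."""
    return (addr & ~(0xF << 56)) | ((tag & 0xF) << 56)


class TaggedMemory:
    """Toy tag store: one tag per 16-byte granule, default tag 0."""

    def __init__(self):
        self.tags = {}

    def set_tag(self, ptr):
        # Tag the granule that ptr points into with the pointer's own tag.
        self.tags[(ptr & ADDRESS_MASK) // GRANULE] = ptr_tag(ptr)

    def check(self, ptr):
        # An access is allowed only if pointer tag and granule tag match.
        granule = (ptr & ADDRESS_MASK) // GRANULE
        return self.tags.get(granule, 0) == ptr_tag(ptr)
```

A stale pointer whose tag no longer matches the granule, or a pointer that has run past the end of its allocation into a differently tagged granule, fails the check.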
In March 2021, ARMv9-A was announced. ARMv9-A's baseline is all features from ARMv8.5. ARMv9-A also adds:
- Scalable Vector Extension 2. SVE2 builds on SVE's scalable vectorization for increased fine-grain Data Level Parallelism, allowing more work to be done per instruction. According to marketing material, SVE2 aims to bring these benefits to a wider range of software, including DSP and multimedia SIMD code that currently uses NEON. LLVM/Clang 9.0 and GCC 10.0 were updated to support SVE2.
- Transactional Memory Extension. Following the x86 extensions, TME brings support for Hardware Transactional Memory and Transactional Lock Elision. TME aims to bring scalable concurrency to increase coarse-grained Thread Level Parallelism, allowing more work to be done per thread. LLVM/Clang 9.0 and GCC 10.0 were updated to support TME.
- Confidential Compute Architecture.
ARMv8.6-A and ARMv9.1-A
- Bfloat16 format support.
- SIMD matrix manipulation instructions:
- * BFDOT*
- * BFMMLA
- * BFMLAL*
- * BFCVT*
- Enhancements for virtualization, system management and security.
- The following extensions:
- * Enhanced Counter Virtualization.
- * Fine-Grained Traps.
- * Activity Monitors virtualization.
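The Bfloat16 format listed above keeps the sign bit and the 8 exponent bits of a 32-bit float but only 7 fraction bits, so a bfloat16 value is the upper half of the corresponding float32 bit pattern. The sketch below shows that relationship using Python's `struct` module. The function names are hypothetical, and the conversion truncates the low 16 bits for simplicity, which is an assumption of this example; hardware conversion instructions apply rounding instead.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Return the bfloat16 bit pattern of x, truncating the low 16 bits."""
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits32 >> 16


def bf16_bits_to_float(bits16: int) -> float:
    """Expand a bfloat16 bit pattern back to a float by zero-filling."""
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]
```

Because the exponent field is unchanged, bfloat16 covers the same range as float32 while giving up precision: 3.14159 survives the round trip only as 3.140625.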