ARM Cortex-M
The ARM Cortex-M is a group of 32-bit RISC ARM processor cores licensed by ARM Limited. These cores are optimized for low-cost and energy-efficient integrated circuits, which have been embedded in tens of billions of consumer devices. Though they are most often the main component of microcontroller chips, they are sometimes embedded inside other types of chips too. The Cortex-M family consists of Cortex-M0, Cortex-M0+, Cortex-M1, Cortex-M3, Cortex-M4, Cortex-M7, Cortex-M23, Cortex-M33, Cortex-M35P, Cortex-M52, Cortex-M55, and Cortex-M85. A floating-point unit option is available for Cortex-M4 / M7 / M33 / M35P / M52 / M55 / M85 cores, and when included in the silicon these cores are sometimes known as "Cortex-MxF", where 'x' is the core variant.
Overview
The ARM Cortex-M family are ARM microprocessor cores that are designed for use in microcontrollers, ASICs, ASSPs, FPGAs, and SoCs. Cortex-M cores are commonly used as dedicated microcontroller chips, but are also "hidden" inside SoC chips as power management controllers, I/O controllers, system controllers, touch screen controllers, smart battery controllers, and sensor controllers. The main difference from Cortex-A cores is that Cortex-M cores have no memory management unit for virtual memory, which is considered essential for "full-fledged" operating systems. Cortex-M programs instead run bare metal or on one of the many real-time operating systems that support Cortex-M.
Though 8-bit microcontrollers were very popular in the past, Cortex-M has slowly been chipping away at the 8-bit market as the prices of low-end Cortex-M chips have moved downward. Cortex-M cores have become a popular replacement for 8-bit chips in applications that benefit from 32-bit math operations, and for older legacy ARM cores such as ARM7 and ARM9.
In particular, the embedded wear-leveling controller inside most SD cards or flash drives is an 8051 microcontroller or ARM CPU.
License
Arm Limited neither manufactures nor sells CPU devices based on its own designs, but rather licenses the processor architecture to interested parties. Arm offers a variety of licensing terms, varying in cost and deliverables. To all licensees, Arm provides an integratable hardware description of the ARM core, as well as a complete software development toolset and the right to sell manufactured silicon containing the ARM CPU.
Silicon customization
Integrated Device Manufacturers receive the ARM Processor IP as synthesizable RTL. In this form, they have the ability to perform architectural level optimizations and extensions. This allows the manufacturer to achieve custom design goals, such as higher clock speed, very low power consumption, instruction set extensions, optimizations for size, debug support, etc. To determine which components have been included in a particular ARM CPU chip, consult the manufacturer datasheet and related documentation.
Some of the silicon options for the Cortex-M cores are:
- SysTick timer: A 24-bit system timer that extends the functionality of both the processor and the Nested Vectored Interrupt Controller. When present, it also provides an additional configurable priority SysTick interrupt. Though the SysTick timer is optional for the M0/M0+/M1/M23, it is extremely rare to find a Cortex-M microcontroller without it. If a Cortex-M33/M35P/M52/M55/M85 microcontroller has the Security Extension option, then it optionally can have two SysTicks.
- Bit-Band: Maps a complete word of memory onto a single bit in the bit-band region. For example, writing to an alias word will set or clear the corresponding bit in the bit-band region. This allows every individual bit in the bit-band region to be directly accessible from a word-aligned address. In particular, individual bits can be set, cleared, or toggled from C/C++ without performing a read-modify-write sequence of instructions. Though the bit-band is optional, it is less common to find a Cortex-M3 and Cortex-M4 microcontroller without it. Some Cortex-M0 and Cortex-M0+ microcontrollers have bit-band.
- Memory Protection Unit (MPU): Provides support for protecting regions of memory through enforcing privilege and access rules. It supports up to sixteen different regions, each of which can be split further into equal-size sub-regions.
- Tightly-Coupled Memory (TCM): Low-latency SRAM that can be used to hold the call stack, RTOS control structures, interrupt data structures, interrupt handler code, and speed-critical code. Other than CPU cache, TCM is the fastest memory in an ARM Cortex-M microcontroller. Since TCM is not cached and is accessible at the same speed as the processor and cache, it could be conceptually described as "addressable cache". There is an ITCM and a DTCM to allow a Harvard architecture processor to read from both simultaneously. The DTCM can't contain any instructions, but the ITCM can contain data. Since TCM is tightly connected to the processor core, DMA engines might not be able to access TCM on some implementations.
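The bit-band mapping described in the list above can be illustrated with a short address calculation. The following is a minimal sketch, assuming the Cortex-M3/M4 SRAM bit-band region at 0x20000000 with its alias region at 0x22000000 (the peripheral region at 0x40000000 and its alias at 0x42000000 follow the same rule). The helper name `bitband_alias` is hypothetical and not part of any vendor library.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed base addresses for the SRAM bit-band region and its alias
   region on a Cortex-M3/M4 that implements the bit-band option. */
#define SRAM_BB_BASE    0x20000000u
#define SRAM_BB_SIZE    0x00100000u   /* 1 MB bit-band region */
#define SRAM_ALIAS_BASE 0x22000000u

/* Hypothetical helper: return the address of the alias word that maps
   onto bit `bit` of the byte at `byte_addr`. Each bit in the bit-band
   region occupies one 32-bit word in the alias region, so every byte
   expands to 32 bytes of alias space. */
static uint32_t bitband_alias(uint32_t byte_addr, unsigned bit)
{
    assert(byte_addr >= SRAM_BB_BASE);
    assert(byte_addr - SRAM_BB_BASE < SRAM_BB_SIZE);
    assert(bit < 8u);

    uint32_t byte_offset = byte_addr - SRAM_BB_BASE;
    return SRAM_ALIAS_BASE + byte_offset * 32u + bit * 4u;
}
```

For example, bit 2 of the byte at 0x20000300 maps to the alias word at 0x22006008. On a target that has bit-band, writing 1 or 0 to that word through a `volatile uint32_t *` sets or clears the bit; on a chip without the option the alias address is not valid, so the datasheet should be checked first.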
| ARM Core | Cortex M0 | Cortex M0+ | Cortex M1 | Cortex M3 | Cortex M4 | Cortex M7 | Cortex M23 | Cortex M33 | Cortex M35P | Cortex M52 | Cortex M55 | Cortex M85 |
| SysTick 24-bit Timer | Optional | Optional | Optional | Optional | ||||||||
| Single-cycle I/O port | Optional | Optional | ||||||||||
| Bit-Band memory | * | Optional | Optional | Optional | ||||||||
| Memory Protection Unit | Optional | Optional | Optional | Optional | Optional | Optional | Optional * | Optional | Optional | Optional | ||
| Security Attribution Unit and Stack Limits | Optional | Optional | Optional * | Optional | Optional | Optional | ||||||
| Instruction Cache | Optional | Optional | Optional | Optional | Optional | |||||||
| Data Cache | Optional | Optional | Optional | Optional | ||||||||
| Instruction TCM Memory | Optional | Optional | Optional | Optional | Optional | |||||||
| Data TCM Memory | Optional | Optional | Optional | Optional | Optional | |||||||
| ECC for TCM and Cache | Optional | Optional | Optional | Optional | ||||||||
| Vector Table Offset Register | Optional | Optional | Optional | Optional | Optional | Optional |
- Note: Most Cortex-M3 and M4 chips have bit-band and MPU. The bit-band option can be added to the M0/M0+ using the Cortex-M System Design Kit.
- Note: Software should validate the existence of each feature before attempting to use it.
- Note: Limited public information is available for the Cortex-M35P until its Technical Reference Manual is released.
- Data endianness: Little-endian or big-endian. Unlike legacy ARM cores, the Cortex-M is permanently fixed in silicon as one of these choices.
- Interrupts: 1 to 32, 1 to 240, 1 to 480.
- Wake-up interrupt controller: Optional.
- Vector Table Offset Register: Optional.
- Instruction fetch width: 16-bit only, or mostly 32-bit.
- User/privilege support: Optional.
- Reset all registers: Optional.
- Single-cycle I/O port: Optional.
- Debug Access Port: None, SWD, or JTAG and SWD.
- Halting debug support: Optional.
- Number of watchpoint comparators: 0 to 2, 0 to 4.
- Number of breakpoint comparators: 0 to 4, 0 to 8.
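Because so many features are silicon options, the note above advises software to confirm that a feature exists before using it. As one illustration, the MPU can be detected from the MPU_TYPE register, whose DREGION field (bits 15:8) reports the number of implemented regions and reads as zero when no MPU is present. The sketch below only decodes a raw register value so that it can be compiled anywhere; the helper names are hypothetical.

```c
#include <stdint.h>

/* MPU_TYPE is located at 0xE000ED90 in the System Control Space.
   DREGION (bits 15:8) holds the number of supported MPU regions;
   zero means that no MPU is implemented. */
#define MPU_TYPE_ADDR          0xE000ED90u
#define MPU_TYPE_DREGION_SHIFT 8u
#define MPU_TYPE_DREGION_MASK  0xFFu

/* Hypothetical helper: number of MPU regions encoded in a raw
   MPU_TYPE register value. */
static unsigned mpu_region_count(uint32_t mpu_type)
{
    return (unsigned)((mpu_type >> MPU_TYPE_DREGION_SHIFT) & MPU_TYPE_DREGION_MASK);
}

/* Hypothetical helper: nonzero when the value indicates an MPU. */
static int mpu_present(uint32_t mpu_type)
{
    return mpu_region_count(mpu_type) != 0u;
}
```

On a real target the argument would be read from the register, for example `mpu_region_count(*(volatile uint32_t *)MPU_TYPE_ADDR)`. Other optional features need their own checks, which are described in the relevant architecture reference manual and the chip datasheet.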
Instruction sets
All Cortex-M cores implement a common subset of instructions that consists of most Thumb-1 instructions and some Thumb-2 instructions, including a 32-bit result multiply. The Cortex-M0 / Cortex-M0+ / Cortex-M1 / Cortex-M23 were designed to create the smallest silicon die, and thus have the fewest instructions of the Cortex-M family.
The Cortex-M0 / M0+ / M1 include the Thumb-1 instructions, except for the new instructions that were added in the ARMv7-M architecture, and a minor subset of Thumb-2 instructions. The Cortex-M3 / M4 / M7 / M33 / M35P have all base Thumb-1 and Thumb-2 instructions. The Cortex-M3 adds three Thumb-1 instructions, all Thumb-2 instructions, hardware integer divide, and saturation arithmetic instructions. The Cortex-M4 adds DSP instructions and an optional single-precision floating-point unit. The Cortex-M7 adds an optional double-precision FPU. The Cortex-M23 / M33 / M35P / M52 / M55 / M85 add TrustZone instructions.
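To make the saturation arithmetic mentioned above concrete, the following portable C function models what a signed saturate to an n-bit range computes: values outside the range are clamped to the nearest limit instead of wrapping around. This is a sketch for illustration only. The name `ssat_model` is hypothetical, and the model ignores the optional shift operand and the Q (saturation) flag that the SSAT instruction also provides.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of signed saturation to an n-bit range, where
   1 <= n <= 32. The result is clamped to [-2^(n-1), 2^(n-1) - 1].
   On cores with saturation instructions this is a single SSAT;
   on cores without them it takes compare-and-branch sequences
   similar to the code below. */
static int32_t ssat_model(int32_t value, unsigned n)
{
    assert(n >= 1u && n <= 32u);

    /* Use 64-bit limits so that n == 32 does not overflow. */
    int64_t max = ((int64_t)1 << (n - 1u)) - 1;
    int64_t min = -((int64_t)1 << (n - 1u));

    if (value > max) {
        return (int32_t)max;
    }
    if (value < min) {
        return (int32_t)min;
    }
    return value;
}
```

For instance, saturating 40000 to 16 bits gives 32767 and saturating -40000 gives -32768, which is the behavior audio and control code typically wants when a sum exceeds the sample range.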
| Arm Core | Cortex M0 | Cortex M0+ | Cortex M1 | Cortex M3 | Cortex M4 | Cortex M7 | Cortex M23 | Cortex M33 | Cortex M35P | Cortex M52 | Cortex M55 | Cortex M85 |
| ARM architecture | ARMv6-M | ARMv6-M | ARMv6-M | ARMv7-M | ARMv7E-M | ARMv7E-M | ARMv8-M Baseline | ARMv8-M Mainline | ARMv8-M Mainline | Armv8.1-M Mainline | Armv8.1-M Mainline | Armv8.1-M Mainline |
| Computer architecture | Von Neumann | Von Neumann | Von Neumann | Harvard | Harvard | Harvard | Von Neumann | Harvard | Harvard | Harvard | Harvard | Harvard |
| Instruction pipeline | 3 stages | 2 stages | 3 stages | 3 stages | 3 stages | 6 stages | 2 stages | 3 stages | 3 stages | 4 stages | 4-5 stages | 7 stages |
| Interrupt latency | 16 cycles | 15 cycles | 23 for NMI, 26 for IRQ | 12 cycles | 12 cycles | 12 cycles, 14 worst case | 15 cycles, 24 secure to NS IRQ | 12 cycles, 21 secure to NS IRQ | TBD | TBD | TBD | TBD |
| Thumb-1 instructions | Most | Most | Most | Most | ||||||||
| Thumb-2 instructions | Some | Some | Some | Some | ||||||||
| Multiply instructions 32×32 = 32-bit result | ||||||||||||
| Multiply instructions 32×32 = 64-bit result | ||||||||||||
| Divide instructions 32/32 = 32-bit quotient | ||||||||||||
| Saturated math instructions | Some | |||||||||||
| DSP instructions | Optional | Optional | ||||||||||
| Half-Precision floating-point instructions | Optional | Optional | Optional | |||||||||
| Single-Precision floating-point instructions | Optional | Optional | Optional | Optional | Optional | Optional | Optional | |||||
| Double-Precision floating-point instructions | Optional | Optional | Optional | Optional | ||||||||
| Helium vector instructions | Optional | Optional | Optional | |||||||||
| TrustZone security instructions | Optional | Optional | Optional | Optional | Optional | |||||||
| Co-processor instructions | Optional | Optional | Optional | Optional | Optional | |||||||
| ARM Custom Instructions | Optional | Optional | Optional | Optional | ||||||||
| Pointer Authentication and Branch Target Identification instructions | Optional | Optional |
- Note: Interrupt latency cycle count assumes: 1) stack located in zero-wait state RAM, 2) another interrupt function not currently executing, 3) Security Extension option doesn't exist, because it adds additional cycles. The Cortex-M cores with a Harvard computer architecture have a shorter interrupt latency than Cortex-M cores with a Von Neumann computer architecture.
- Note: The Cortex-M series includes three new 16-bit Thumb-1 instructions for sleep mode: SEV, WFE, WFI.
- Note: The Cortex-M0 / M0+ / M1 do not include these 16-bit Thumb-1 instructions: CBZ, CBNZ, IT.
- Note: The Cortex-M0 / M0+ / M1 only include these 32-bit Thumb-2 instructions: BL, DMB, DSB, ISB, MRS, MSR.
- Note: The Cortex-M0 / M0+ / M1 / M23 only have 32-bit multiply instructions with a lower-32-bit result, whereas the Cortex-M3 / M4 / M7 / M33 / M35P include additional 32-bit multiply instructions with 64-bit results. The Cortex-M4 / M7 also include DSP multiply instructions.
- Note: The number of cycles to complete multiply and divide instructions varies across ARM Cortex-M core designs. Some cores have a silicon option for the choice of fast speed or small size, so these cores can use less silicon at the cost of a higher cycle count. An interrupt occurring during the execution of a divide instruction or slow-iterative multiply instruction will cause the processor to abandon the instruction, then restart it after the interrupt returns.
- * Multiply instructions "32-bit result" Cortex-M0/M0+/M23 is 1 or 32 cycle silicon option, Cortex-M1 is 3 or 33 cycle silicon option, Cortex-M3/M4/M7/M33/M35P is 1 cycle.
- * Multiply instructions "64-bit result" Cortex-M3 is 3–5 cycles, Cortex-M4/M7/M33/M35P is 1 cycle.
- * Divide instructions Cortex-M3/M4 is 2–12 cycles, Cortex-M7 is 3–20 cycles, Cortex-M23 is 17 or 34 cycle option, Cortex-M33 is 2–11 cycles, Cortex-M35P is TBD.
- Note: Some Cortex-M cores have silicon options for various types of floating-point units. The Cortex-M52 / M55 / M85 have an option for half-precision, the Cortex-M4 / M7 / M33 / M35P / M52 / M55 / M85 have an option for single-precision, and the Cortex-M7 / M52 / M55 / M85 have an option for double-precision. When an FPU is included, the core is sometimes referred to as "Cortex-MxF", where 'x' is the core variant, such as Cortex-M4F.
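The notes above state that the smallest cores only provide a multiply with a lower-32-bit result. A full 64-bit product can still be formed in software by splitting each operand into 16-bit halves, so that every partial product fits in 32 bits. The following is a sketch of that schoolbook technique, not a description of any particular compiler runtime; the name `umull_model` is hypothetical.

```c
#include <stdint.h>

/* 32 x 32 -> 64-bit unsigned multiply composed of four multiplies
   whose operands are at most 16 bits wide. Each partial product is
   at most 0xFFFE0001, so a multiply with only a 32-bit result is
   sufficient for every step. */
static uint64_t umull_model(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu;
    uint32_t a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu;
    uint32_t b_hi = b >> 16;

    uint32_t ll = a_lo * b_lo;   /* contributes at bit 0  */
    uint32_t lh = a_lo * b_hi;   /* contributes at bit 16 */
    uint32_t hl = a_hi * b_lo;   /* contributes at bit 16 */
    uint32_t hh = a_hi * b_hi;   /* contributes at bit 32 */

    /* Accumulate the partial products at their bit positions. */
    return ((uint64_t)hh << 32)
         + ((uint64_t)lh << 16)
         + ((uint64_t)hl << 16)
         + (uint64_t)ll;
}
```

On cores that have a 64-bit result multiply, the same product is a single instruction, which is one reason multiply-heavy code benefits from the larger instruction set.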
| Group | Instr bits | Instructions | Cortex M0, M0+, M1 | Cortex M3 | Cortex M4 | Cortex M7 | Cortex M23 | Cortex M33 | Cortex M35P | Cortex M52 | Cortex M55 | Cortex M85 |
| 16 | ADC, ADD, ADR, AND, ASR, B, BIC, BKPT, BLX, BX, CMN, CMP, CPS, EOR, LDM, LDR, LDRB, LDRH, LDRSB, LDRSH, LSL, LSR, MOV, MUL, MVN, NOP, ORR, POP, PUSH, REV, REV16, REVSH, ROR, RSB, SBC, SEV, STM, STR, STRB, STRH, SUB, SVC, SXTB, SXTH, TST, UXTB, UXTH, WFE, WFI, YIELD | |||||||||||
| 16 | CBNZ, CBZ | |||||||||||
| 16 | IT | |||||||||||
| 32 | BL, DMB, DSB, ISB, MRS, MSR | |||||||||||
| 32 | SDIV, UDIV, MOVT, MOVW, B.W, LDREX, LDREXB, LDREXH, STREX, STREXB, STREXH | |||||||||||
| 32 | ADC, ADD, ADR, AND, ASR, B, BFC, BFI, BIC, CDP, CLREX, CLZ, CMN, CMP, DBG, EOR, LDC, LDM, LDR, LDRB, LDRBT, LDRD, LDRH, LDRHT, LDRSB, LDRSBT, LDRSH, LDRSHT, LDRT, LSL, LSR, MCR, MCRR, MLA, MLS, MRC, MRRC, MUL, MVN, NOP, ORN, ORR, PLD, PLDW, PLI, POP, PUSH, RBIT, REV, REV16, REVSH, ROR, RRX, RSB, SBC, SBFX, SEV, SMLAL, SMULL, SSAT, STC, STM, STR, STRB, STRBT, STRD, STRH, STRHT, STRT, SUB, SXTB, SXTH, TBB, TBH, TEQ, TST, UBFX, UMLAL, UMULL, USAT, UXTB, UXTH, WFE, WFI, YIELD | |||||||||||
| DSP | 32 | PKH, QADD, QADD16, QADD8, QASX, QDADD, QDSUB, QSAX, QSUB, QSUB16, QSUB8, SADD16, SADD8, SASX, SEL, SHADD16, SHADD8, SHASX, SHSAX, SHSUB16, SHSUB8, SMLABB, SMLABT, SMLATB, SMLATT, SMLAD, SMLALBB, SMLALBT, SMLALTB, SMLALTT, SMLALD, SMLAWB, SMLAWT, SMLSD, SMLSLD, SMMLA, SMMLS, SMMUL, SMUAD, SMULBB, SMULBT, SMULTT, SMULTB, SMULWT, SMULWB, SMUSD, SSAT16, SSAX, SSUB16, SSUB8, SXTAB, SXTAB16, SXTAH, SXTB16, UADD16, UADD8, UASX, UHADD16, UHADD8, UHASX, UHSAX, UHSUB16, UHSUB8, UMAAL, UQADD16, UQADD8, UQASX, UQSAX, UQSUB16, UQSUB8, USAD8, USADA8, USAT16, USAX, USUB16, USUB8, UXTAB, UXTAB16, UXTAH, UXTB16 | Optional | Optional | ||||||||
| SP Float | 32 | VABS, VADD, VCMP, VCMPE, VCVT, VCVTR, VDIV, VLDM, VLDR, VMLA, VMLS, VMOV, VMRS, VMSR, VMUL, VNEG, VNMLA, VNMLS, VNMUL, VPOP, VPUSH, VSQRT, VSTM, VSTR, VSUB | Optional | Optional | Optional | Optional | Optional | Optional | Optional | |||
| DP Float | 32 | VCVTA, VCVTM, VCVTN, VCVTP, VMAXNM, VMINNM, VRINTA, VRINTM, VRINTN, VRINTP, VRINTR, VRINTX, VRINTZ, VSEL | Optional | Optional | Optional | Optional | ||||||
| Acquire/Release | 32 | LDA, LDAB, LDAH, LDAEX, LDAEXB, LDAEXH, STL, STLB, STLH, STLEX, STLEXB, STLEXH | ||||||||||
| TrustZone | 16 | BLXNS, BXNS | - | - | - | - | Optional | Optional | Optional | Optional | Optional | - |
| TrustZone | 32 | SG, TT, TTT, TTA, TTAT | - | - | - | - | Optional | Optional | Optional | Optional | Optional | - |
| Co-processor | 32 | CDP, CDP2, MCR, MCR2, MCRR, MCRR2, MRC, MRC2, MRRC, MRRC2 | Optional | Optional | Optional | Optional | Optional | |||||
| ACI | 32 | CX1, CX1A, CX2, CX2A, CX3, CX3A, CX1D, CX1DA, CX2D, CX2DA, CX3D, CX3DA, VCX1, VCX1A, VCX2, VCX2A, VCX3, VCX3A | Optional | Optional | Optional | Optional | ||||||
| PACBTI | 32 | AUT, AUTG, BTI, BXAUT, PAC, PACBTI, PACG | Optional | Optional |
- Note: MOVW is an alias that means 32-bit "wide" MOV instruction.
- Note: B.W is a long-distance unconditional branch.
- Note: For Cortex-M1, WFE / WFI / SEV instructions exist, but execute as a NOP instruction.
- Note: The half-precision FPU instructions are valid in the Cortex-M52 / M55 / M85 only when the HP FPU option exists in the silicon.
- Note: The single-precision FPU instructions are valid in the Cortex-M4 / M7 / M33 / M35P / M52 / M55 / M85 only when the SP FPU option exists in the silicon.
- Note: The double-precision FPU instructions are valid in the Cortex-M7 / M52 / M55 / M85 only when the DP FPU option exists in the silicon.