NVM Express


NVM Express (NVMe) or Non-Volatile Memory Host Controller Interface Specification (NVMHCIS) is an open, logical-device interface specification for accessing a computer's non-volatile storage media, usually attached via the PCI Express bus. The initialism NVM stands for non-volatile memory, which is often NAND flash memory that comes in several physical form factors, including solid-state drives (SSDs), PCIe add-in cards, and M.2 cards, the successor to mSATA cards. NVM Express, as a logical-device interface, has been designed to capitalize on the low latency and internal parallelism of solid-state storage devices.
Architecturally, the logic for NVMe is physically stored within and executed by the NVMe controller chip that is physically co-located with the storage media, usually an SSD. Version changes for NVMe, e.g., 1.3 to 1.4, are incorporated within the storage media, and do not affect PCIe-compatible components such as motherboards and CPUs.
By its design, NVM Express allows host hardware and software to fully exploit the levels of parallelism possible in modern SSDs. As a result, NVM Express reduces I/O overhead and brings various performance improvements relative to previous logical-device interfaces, including multiple long command queues, and reduced latency. The previous interface protocols like AHCI were developed for use with far slower hard disk drives where a very lengthy delay exists between a request and data transfer, where data speeds are much slower than RAM speeds, and where disk rotation and seek time give rise to further optimization requirements.
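The queueing difference can be made concrete with a small model. The sketch below is illustrative only: the class and function names are hypothetical, and the limits are the commonly cited maximums (AHCI: one queue of 32 commands; NVMe: up to 65,535 I/O queues of up to 65,536 commands each), which are not stated in the text above.

```python
from collections import deque
from dataclasses import dataclass, field

# Commonly cited interface limits (assumed here, not taken from the article text).
AHCI_QUEUES, AHCI_DEPTH = 1, 32
NVME_MAX_QUEUES, NVME_MAX_DEPTH = 65_535, 65_536


@dataclass
class QueuePair:
    """Toy submission/completion queue pair with a fixed depth (hypothetical model)."""
    depth: int
    submission: deque = field(default_factory=deque)
    completion: deque = field(default_factory=deque)

    def submit(self, command: str) -> bool:
        """Queue a command; return False if the submission queue is full."""
        if len(self.submission) >= self.depth:
            return False
        self.submission.append(command)
        return True

    def process_one(self) -> None:
        """Simulate the controller completing the oldest queued command."""
        if self.submission:
            self.completion.append(self.submission.popleft())


def max_outstanding(queues: int, depth: int) -> int:
    """Upper bound on commands in flight across all queues."""
    return queues * depth


# One queue pair per CPU core lets cores submit I/O without sharing a queue
# (a hypothetical 4-core host with very short queues, for illustration).
per_core = [QueuePair(depth=4) for _ in range(4)]
for core, qp in enumerate(per_core):
    qp.submit(f"read issued by core {core}")

print(max_outstanding(AHCI_QUEUES, AHCI_DEPTH))
print(max_outstanding(NVME_MAX_QUEUES, NVME_MAX_DEPTH))
```

In this model each core owns its own queue pair, which is one way the parallelism described above could be exposed to software; real drivers and controllers negotiate the actual number and depth of queues.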
NVM Express devices are chiefly available in the miniature M.2 form factor, while standard-sized PCI Express expansion cards and 2.5-inch form-factor devices that provide a four-lane PCI Express interface through the U.2 connector are also available.

Specifications

Specifications for NVMe released to date include:
  • 1.0e
  • 1.1b, which adds standardized Command Sets for better compatibility across different NVMe devices, a Management Interface that provides standardized tools for managing NVMe devices and simplifies administration, and Transport Specifications that define how NVMe commands are transported over various physical interfaces, enhancing interoperability.
  • 1.2
  • * 1.2a
  • * 1.2b
  • * 1.2.1, which introduces the following new features over version 1.1b: Multi-Queue, which supports multiple I/O queues to enhance data throughput and performance; Namespace Management, which allows namespaces to be dynamically created, deleted, and resized, providing greater flexibility; and Endurance Management, which monitors and manages SSD wear levels to optimize performance and extend drive life.
  • 1.3
  • * 1.3a
  • * 1.3b
  • * 1.3c
  • * 1.3d, which, compared to version 1.2.1, adds Namespace Sharing, which allows multiple hosts to access a single namespace and facilitates shared storage environments; Namespace Reservation, which provides mechanisms for hosts to reserve namespaces, preventing conflicts and ensuring data integrity; and Namespace Priority, which sets priority levels for different namespaces to optimize performance for critical workloads.
  • 1.4
  • * 1.4a
  • * 1.4b
  • * 1.4c, which has the following new features compared to 1.3d: IO Determinism, which ensures consistent latency and performance by isolating workloads; Namespace Write Protect, which prevents data corruption or unauthorized modifications; Persistent Event Log, which stores event logs in non-volatile memory to aid diagnostics and troubleshooting; and the Verify Command, which checks the integrity of data.
  • 2.0
  • * 2.0a
  • * 2.0b
  • * 2.0c
  • * 2.0d, which, compared to 1.4c, introduces Zoned Namespaces, which organize data into zones for efficient write operations, reducing write amplification and improving SSD longevity; Key Value, for efficient storage and retrieval of key-value pairs directly on the NVMe device, bypassing traditional file systems; and Endurance Group Management, which manages groups of SSDs based on their endurance, optimizing usage and extending lifespan.
  • * 2.0e
  • 2.1, which introduces Live Migration, for maintaining service availability during migration; Key Per I/O, for applying encryption keys at a per-operation level; NVMe-MI High Availability Out of Band Management, for managing NVMe devices outside of regular data paths; and NVMe Network Boot / UEFI, for booting NVMe devices over a network.
  • 2.2
  • 2.3

Background

Historically, most SSDs used buses such as SATA, SAS, or Fibre Channel for interfacing with the rest of a computer system. Since SSDs became available in mass markets, SATA has become the most typical way of connecting SSDs in personal computers; however, SATA was designed primarily for interfacing with mechanical hard disk drives, and it became increasingly inadequate for SSDs, which improved in speed over time. For example, within about five years of mainstream mass-market adoption, many SSDs were already held back by data rates that had been designed around hard drives: unlike hard disk drives, some SSDs are limited by the maximum throughput of SATA.
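The SATA bottleneck can be illustrated with simple arithmetic. The sketch below assumes the commonly published line rates and encodings, which are not given in the text above: SATA 3.0 at 6 Gbit/s with 8b/10b encoding, and PCI Express 3.0 at 8 GT/s per lane with 128b/130b encoding. It ignores protocol overhead, so real-world throughput is lower than these figures.

```python
def usable_bytes_per_second(line_rate_gbps, payload_bits, total_bits, lanes=1):
    """Raw line rate minus line-encoding overhead, in bytes per second.

    line_rate_gbps: signalling rate per lane in Gbit/s (or GT/s)
    payload_bits / total_bits: encoding ratio, e.g. 8/10 or 128/130
    """
    return line_rate_gbps * 1e9 * lanes * payload_bits / total_bits / 8


# Assumed figures (published interface parameters, not from the article text).
sata3 = usable_bytes_per_second(6.0, 8, 10)
pcie3_x4 = usable_bytes_per_second(8.0, 128, 130, lanes=4)

print(f"SATA 3.0:    {sata3 / 1e6:.0f} MB/s")     # 600 MB/s
print(f"PCIe 3.0 x4: {pcie3_x4 / 1e6:.0f} MB/s")  # 3938 MB/s
print(f"Ratio:       {pcie3_x4 / sata3:.1f}x")    # 6.6x
```

Under these assumptions, a four-lane PCI Express 3.0 link offers roughly six to seven times the encoded bandwidth of a single SATA 3.0 port.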
High-end SSDs had been made using the PCI Express bus before NVMe, but using non-standard specification interfaces, using a SAS to PCIe bridge or by emulating a hardware RAID controller. By standardizing the interface of SSDs, operating systems only need one common device driver to work with all SSDs adhering to the specification. It also means that each SSD manufacturer does not have to design specific interface drivers. This is similar to how USB mass storage devices are built to follow the USB mass-storage device class specification and work with all computers, with no per-device drivers needed.
NVM Express devices are also used as the building block of burst buffer storage in many leading supercomputers, such as the Fugaku, Summit, and Sierra supercomputers.

History

The first details of a new standard for accessing non-volatile memory emerged at the Intel Developer Forum 2007, when NVMHCI was shown as the host-side protocol of a proposed architectural design that had the Open NAND Flash Interface Working Group on the memory-chip side. An NVMHCI working group led by Intel was formed that year. The NVMHCI 1.0 specification was completed in April 2008 and released on Intel's web site.
Technical work on NVMe began in the second half of 2009. The NVMe specifications were developed by the NVM Express Workgroup, which consists of more than 90 companies; Amber Huffman of Intel was the working group's chair. Version 1.0 of the specification was released on 1 March 2011, and in June 2011 a Promoter Group led by seven companies was formed. Version 1.1 of the specification was released on 11 October 2012. Major features added in version 1.1 are multi-path I/O and arbitrary-length scatter-gather I/O; at the time, future revisions were expected to significantly enhance namespace management. Because of its feature focus, NVMe 1.1 was initially called "Enterprise NVMHCI". An update to the base NVMe specification, called version 1.0e, was released in January 2013.
The first commercially available NVMe chipsets were released by Integrated Device Technology in August 2012. The first NVMe drive, Samsung's XS1715 enterprise drive, was announced in July 2013; according to Samsung, this drive supported 3 GB/s read speeds, six times faster than its previous enterprise offerings. The LSI SandForce SF3700 controller family, released in November 2013, also supports NVMe. A Kingston HyperX "prosumer" product using this controller was showcased at the Consumer Electronics Show 2014 and promised similar performance. In June 2014, Intel announced its first NVM Express products, the Intel SSD data center family, which interfaces with the host through the PCI Express bus and includes the DC P3700, DC P3600, and DC P3500 series. NVMe drives are commercially available.
In March 2014, the group incorporated to become NVM Express, Inc., which as of 2014 consists of more than 65 companies from across the industry. NVM Express specifications are owned and maintained by NVM Express, Inc., which also promotes industry awareness of NVM Express as an industry-wide standard. NVM Express, Inc. is directed by a thirteen-member board of directors selected from the Promoter Group, which includes Cisco, Dell, EMC, HGST, Intel, Micron, Microsoft, NetApp, Oracle, PMC, Samsung, SanDisk and Seagate.
In September 2016, the CompactFlash Association announced that it would be releasing a new memory card specification, CFexpress, which uses NVMe.
The NVMe Host Memory Buffer (HMB) feature was added in version 1.2 of the NVMe specification. HMB allows SSDs to use the host's DRAM, which can improve I/O performance for DRAM-less SSDs. For example, the SSD controller can use HMB to cache the FTL table or to temporarily hold data while it is being written to flash memory. NVMe 2.0 added the optional Zoned Namespaces (ZNS) and Key-Value (KV) features, as well as support for rotating media such as hard disk drives. ZNS and KV allow data to be mapped directly to its physical location in flash memory, giving the host direct access to data on an SSD. ZNS and KV can also decrease write amplification of flash media.
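The zoned model can be illustrated with a toy example. The sketch below is a simplified, hypothetical model rather than the NVMe command set: it assumes each zone must be written sequentially at a write pointer and can only be rewritten after a reset, which is the general idea behind zoned storage. The class and method names are invented for illustration.

```python
class Zone:
    """Toy zone: sequential writes only, tracked by a write pointer (hypothetical model)."""

    def __init__(self, start_lba: int, capacity: int):
        self.start_lba = start_lba      # first logical block of the zone
        self.capacity = capacity        # number of writable blocks
        self.write_pointer = start_lba  # next block that may be written

    def write(self, lba: int, num_blocks: int) -> None:
        """Accept a write only if it starts exactly at the write pointer."""
        if lba != self.write_pointer:
            raise ValueError("zone must be written sequentially at the write pointer")
        if lba + num_blocks > self.start_lba + self.capacity:
            raise ValueError("write exceeds zone capacity")
        self.write_pointer += num_blocks

    def reset(self) -> None:
        """Discard the zone's contents and rewind the write pointer."""
        self.write_pointer = self.start_lba


zone = Zone(start_lba=0, capacity=8)
zone.write(0, 4)          # sequential write at the write pointer: accepted
try:
    zone.write(0, 1)      # overwrite in place: rejected in this model
except ValueError as err:
    print("rejected:", err)
zone.reset()              # the whole zone is reclaimed at once
```

Because the host in this model only appends and resets whole zones, the device has less need to relocate live data internally, which is one plausible reading of how zoned writes reduce write amplification.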

Form factors

NVMe solid-state drives come in many form factors, including AIC, U.2, U.3, and M.2.

AIC (Add-in Card)

Almost all early NVMe solid-state drives were HHHL (half-height, half-length) or FHHL (full-height, half-length) PCI Express cards, with a PCIe 2.0 or 3.0 interface. An HHHL NVMe solid-state drive card is easy to insert into a PCIe slot of a server.

SATA Express, U.2 and U.3 (SFF-8639)

SATA Express allows the use of two PCI Express 2.0 or 3.0 lanes and two SATA 3.0 ports through the same host-side SATA Express connector. SATA Express supports NVMe as the logical device interface for attached PCI Express storage devices. It is electrically compatible with MultiLink SAS, so a backplane can support both at the same time.
U.2, formerly known as SFF-8639, uses the same physical port as SATA Express but allows up to four PCI Express lanes. Some available servers can combine up to 48 U.2 NVMe solid-state drives.
U.3 is built on the U.2 spec and uses the same SFF-8639 connector. A single "tri-mode" backplane receptacle can handle all three types of connections (SATA, SAS, and NVMe); the controller automatically detects the type of connection used. This is unlike U.2, where users need to use separate controllers for SATA/SAS and NVMe. U.3 devices are required to be backwards-compatible with U.2 hosts, but U.2 drives are not compatible with U.3 hosts.