AMD 10h


The AMD Family 10h, or K10, is a microprocessor microarchitecture by AMD based on the K8 microarchitecture. The first third-generation Opteron products for servers were launched on September 10, 2007, followed by the Phenom processors for desktops on November 11, 2007, as the immediate successors to the K8 series of processors.

Nomenclature

AMD appears to have stopped using K-nomenclature after the codename K8, which designated the AMD K8 or Athlon 64 processor family: no K-nomenclature beyond K8 has appeared in official AMD documents and press releases since the beginning of 2005.
The name "K8L" was first coined in 2005 by Charlie Demerjian, at the time a writer at The Inquirer, and was used by the wider IT community as a convenient shorthand, while official AMD documents termed the processor family "AMD Next Generation Processor Technology".
The microarchitecture has also been referred to as Stars, as the codenames for the desktop line of processors were taken from stars or constellations.
In a video interview, Giuseppe Amato confirmed that the codename is K10.
The Inquirer itself later revealed that the codename "K8L" referred to a low-power version of the K8 family, later named Turion 64, and that K10 was the official codename for the microarchitecture.
AMD refers to it as Family 10h Processors, as it is the successor of the Family 0Fh Processors. 10h and 0Fh refer to the processor family value reported by the CPUID x86 processor instruction. In hexadecimal numbering, 0Fh equals the decimal number 15, and 10h equals decimal 16.
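The family number is not held in a single CPUID field. On AMD processors it is assembled from a 4-bit base family and an 8-bit extended family in the EAX value returned by CPUID function 1, which is why the step from 0Fh to 10h is possible at all. The following is a minimal sketch of that decoding, assuming AMD's convention that the extended fields apply only when the base family is 0Fh. The function name and the sample EAX values are illustrative, not taken from AMD documentation.

```python
def decode_cpuid_signature(eax: int) -> dict:
    """Decode family, model and stepping from a CPUID function 1 EAX value.

    Follows the AMD convention: the extended family and extended model
    fields are only used when the 4-bit base family is saturated at 0Fh.
    """
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    family = base_family
    model = base_model
    if base_family == 0xF:
        family = base_family + ext_family      # 0Fh + 00h = 0Fh, 0Fh + 01h = 10h
        model = (ext_model << 4) | base_model  # extended model forms the high nibble

    return {"family": family, "model": model, "stepping": stepping}


# Illustrative (hypothetical) EAX values: extended family 00h and 01h.
for sample in (0x00000F00, 0x00100F00):
    info = decode_cpuid_signature(sample)
    print(f"EAX={sample:08X}h -> family {info['family']:02X}h = decimal {info['family']}")
```

With extended family 00h the result is family 0Fh (15), and with extended family 01h it is family 10h (16), matching the two names used above.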

Schedule of launch and delivery

Timeline

Historical information

In 2003, AMD outlined the features for upcoming generations of microprocessors after the K8 family of processors in various events and analyst meetings, including the Microprocessor Forum 2003. The outlined features to be deployed by the next-generation microprocessors are as follows:
  • Threaded architectures.
  • Chip level multiprocessing.
  • Huge scale MP machines.
  • 10 GHz operation.
  • Much higher performance superscalar, out-of-order CPU core.
  • Huge caches.
  • Media/vector processing extensions.
  • Branch and memory hints.
  • Security and virtualization.
  • Enhanced Branch Predictors.
  • Static and dynamic power management.
In a June 2006 interview with DigiTimes, AMD executive vice president Henri Richard commented on the upcoming processor developments.

Live demonstrations

On November 30, 2006, AMD gave the first public live demonstration of its native quad-core chip known as "Barcelona", running Windows Server 2003 64-bit Edition. AMD claimed 70% scaling of performance in real-world loads, and better performance than the Intel Xeon 5355 processor codenamed Clovertown.
On January 24, 2007, AMD Executive Vice President Randy Allen claimed that in live tests across a wide variety of workloads, "Barcelona" was able to demonstrate a 40% performance advantage over the comparable Intel Xeon codenamed Clovertown dual-processor quad-core processors. The expected floating-point performance per core would be approximately 1.8 times that of the K8 family at the same clock speed.
On May 10, 2007, AMD held a private event demonstrating the upcoming processors codenamed Agena FX and chipsets, with one demonstrated system being an AMD Quad FX platform with one Radeon HD 2900 XT graphics card on the upcoming RD790 chipset. The system was also shown converting a 720p video clip into another, undisclosed format in real time while all 8 cores were loaded to 100% by other tasks.

Sister microarchitecture

At the December 2006 analyst day, Executive Vice President Marty Seyer announced a new mobile core codenamed Griffin, to be launched in 2008, with power optimization technologies inherited from the K10 microarchitecture but based on a K8 design.

TLB bug

In November 2007, AMD stopped delivery of Barcelona processors after a bug was discovered in the translation lookaside buffer of stepping B2 that could, in rare cases, lead to a race condition and thus a system lockup. A patch in BIOS or software worked around the bug by disabling caching for page tables, but at the cost of a 5 to 20% performance penalty. Kernel patches that almost completely avoid this penalty were published for Linux. In April 2008, AMD brought the new stepping B3 to market, including a fix for the bug plus other minor enhancements.

Features

Fabrication technology

AMD introduced the microprocessors on a 65 nm process using silicon-on-insulator technology, since the release of K10 coincided with the volume ramp of this manufacturing process.

Supported DRAM standards

The K8 family was known to be particularly sensitive to memory latency, since its design gains performance by minimizing latency through an on-die memory controller; increased latency in the external modules negates the usefulness of the feature. DDR2 RAM introduces some additional latency over DDR RAM, since the DRAM is internally driven by a clock at one quarter of the external data frequency, as opposed to one half for DDR. However, since the command clock rate in DDR2 is doubled relative to DDR and other latency-reducing features have been introduced, common comparisons based on CAS latency alone are not sufficient. For example, Socket AM2 processors are known to demonstrate similar performance using DDR2 SDRAM as Socket 939 processors that use DDR-400 SDRAM. K10 processors support DDR2 SDRAM rated up to DDR2-1066.
While some desktop K10 processors use Socket AM2+ and support only DDR2, an AM3 K10 processor supports both DDR2 and DDR3. A few AM3 motherboards have both DDR2 and DDR3 slots, but most have only DDR3.
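The clock relationships described above can be worked through numerically. The sketch below derives the I/O clock and internal DRAM clock from the data rate, and converts a CAS latency from cycles into nanoseconds. The helper names are hypothetical, and the CAS values (CL3 for DDR-400, CL5 for DDR2-800) are assumed example timings rather than figures from the text.

```python
def clocks_mhz(data_rate_mts: float, generation: str) -> dict:
    """Return the I/O (command) clock and internal DRAM clock in MHz."""
    # Both DDR and DDR2 transfer data on both edges of the I/O clock.
    io_clock = data_rate_mts / 2
    # Internal clock: one half of the data rate for DDR, one quarter for DDR2.
    divisor = {"DDR": 2, "DDR2": 4}[generation]
    return {"io_clock": io_clock, "core_clock": data_rate_mts / divisor}


def cas_latency_ns(cas_cycles: int, data_rate_mts: float) -> float:
    """Convert a CAS latency counted in I/O clock cycles into nanoseconds."""
    io_clock_mhz = data_rate_mts / 2
    return cas_cycles * 1000 / io_clock_mhz


# Assumed example modules: (name, generation, data rate in MT/s, CAS cycles).
modules = [("DDR-400", "DDR", 400, 3), ("DDR2-800", "DDR2", 800, 5)]

for name, generation, rate, cl in modules:
    c = clocks_mhz(rate, generation)
    print(f"{name}: I/O clock {c['io_clock']:.0f} MHz, "
          f"internal clock {c['core_clock']:.0f} MHz, "
          f"CL{cl} = {cas_latency_ns(cl, rate):.1f} ns")
```

Under these assumed timings both modules run the DRAM internally at 200 MHz, and the DDR2 module's higher cycle count (CL5 against CL3) still corresponds to a shorter absolute delay, which illustrates why comparing CAS cycle counts alone can mislead.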
Lynx desktop processors only support DDR3, as they use the FM1 socket.

Microarchitecture characteristics

Characteristics of the microarchitecture include the following:
  • Form factors
  • * Socket AM2+ with DDR2 for the 65 nm Phenom and Athlon 7000 Series
  • * Socket AM3 with either DDR2 or DDR3 for Semprons and the 45 nm Phenom II and Athlon II Series. They can also be used on AM3+ motherboards with DDR3. Note that, while all K10 Phenom processors are backwards compatible with Socket AM2+ and Socket AM2, some 45 nm Phenom II processors are only available for Socket AM2+. Lynx processors use neither AM2+ nor AM3.
  • * Socket FM1 with DDR3 for Lynx processors.
  • * Socket F with DDR2, DDR3 with Shanghai and later Opteron processors
  • Instruction set additions and extensions
  • * New bit-manipulation instructions ABM: Leading Zero Count and Population Count
  • * New SSE instructions named SSE4a: combined mask-shift instructions and scalar streaming store instructions. These instructions are not found in Intel's SSE4
  • * Support for unaligned SSE load-operation instructions
  • Execution pipeline enhancements
  • * 128-bit wide SSE units
  • * Wider L1 data cache interface allowing for two 128-bit loads per cycle
  • * Lower integer divide latency
  • * 512-entry indirect branch predictor and a larger return stack and branch target buffer
  • * Side-Band Stack Optimizer, dedicated to incrementing and decrementing the stack pointer register
  • * Fastpathed CALL and RET-Imm instructions as well as MOVs from SIMD registers to general purpose registers
  • Integration of new technologies onto CPU die:
  • * Four processor cores
  • * Split power planes for the CPU cores and the memory controller/northbridge for more effective power management, first dubbed Dynamic Independent Core Engagement (D.I.C.E.) by AMD and now known as Enhanced PowerNow!, allowing the cores and northbridge to scale power consumption up or down independently.
  • * Shutting down portions of the circuits in a core when not under load, named "CoolCore" Technology.
  • Improvements in the memory subsystem:
  • * Improvements in access latency:
  • ** Support for re-ordering loads ahead of other loads and stores
  • ** More aggressive instruction prefetching: 32-byte instruction prefetch as opposed to 16 bytes in K8
  • ** DRAM prefetcher for buffering reads
  • ** Buffered burst writeback to RAM in order to reduce contention
  • * Changes in memory hierarchy:
  • ** Prefetch directly into L1 cache as opposed to L2 cache with K8 family
  • ** 32-way set associative L3 victim cache sized at least 2 MB, shared between processing cores on a single die, with a sharing-aware replacement policy.
  • ** Extensible L3 cache design, with 6 MB planned for 45 nm process node, with the chips codenamed Shanghai.
  • * Changes in address space management:
  • ** Two independent 64-bit memory controllers, each with its own physical address space; this provides an opportunity to better utilize the available bandwidth when random memory accesses occur in heavily multi-threaded environments. This approach is in contrast to the previous "interleaved" design, where the two 64-bit data channels were bound to a single common address space.
  • ** Larger Translation Lookaside Buffers; support for 1 GB page entries and a new 128-entry 2 MB page TLB
  • ** 48-bit memory addressing to allow for 256 TB memory subsystems
  • ** Memory mirroring, data poisoning support and Enhanced RAS
  • ** AMD-V Nested Paging for improved MMU virtualization, claimed to decrease world switch time by 25%.
  • Improvements in system interconnect:
  • * HyperTransport retry support
  • * Support for HyperTransport 3.0, with HyperTransport Link unganging which creates 8 point-to-point links per socket.
  • Platform-level enhancements with additional functionality:
  • * Five P-states allowing for automatic clock rate modulation
  • * Increased clock gating
  • * Official support for coprocessors via HTX slots and vacant CPU sockets through HyperTransport (the Torrenza initiative).
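The two ABM operations listed under the instruction set additions, Leading Zero Count and Population Count, can be illustrated with a software model. This is only a sketch of what the operations compute for a given operand width, not of how the hardware implements them. The function names and the default 64-bit width are choices made for this example.

```python
def lzcnt(value: int, width: int = 64) -> int:
    """Count the zero bits above the most significant set bit.

    For a zero input the result is the operand width, unlike the older
    BSR instruction, whose result is undefined for zero.
    """
    value &= (1 << width) - 1          # truncate to the operand width
    return width - value.bit_length()


def popcnt(value: int, width: int = 64) -> int:
    """Count the bits set to 1 within the operand width."""
    value &= (1 << width) - 1
    return bin(value).count("1")


print(lzcnt(1))            # only bit 0 set in a 64-bit operand
print(lzcnt(0x00F0, 16))   # 16-bit operand with bits 4..7 set
print(lzcnt(0))            # zero input yields the operand width
print(popcnt(0xFF))        # eight bits set
```

Typical uses of these operations include computing integer logarithms, normalizing values, and counting members in bitmap sets.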
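Several of the memory-subsystem figures in the list follow from simple arithmetic, sketched below: the size of a 48-bit address space, the amount of memory a 128-entry TLB of 2 MB pages can map, and the number of sets in a 2 MB, 32-way set-associative cache. The 64-byte cache line size is an assumption made for this example and is not stated in the text. The helper names are likewise hypothetical.

```python
KIB, MIB, GIB, TIB = 2**10, 2**20, 2**30, 2**40


def address_space_bytes(address_bits: int) -> int:
    """Number of bytes addressable with the given number of address bits."""
    return 2 ** address_bits


def tlb_reach_bytes(entries: int, page_size: int) -> int:
    """Memory covered when every TLB entry maps one page of the given size."""
    return entries * page_size


def cache_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets in a set-associative cache of the given geometry."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("cache size is not a multiple of ways * line size")
    return sets


print(address_space_bytes(48) // TIB, "TB addressable with 48 bits")
print(tlb_reach_bytes(128, 2 * MIB) // MIB, "MB mapped by 128 entries of 2 MB pages")
# Assumption: 64-byte cache lines.
print(cache_sets(2 * MIB, ways=32, line_bytes=64), "sets in a 2 MB 32-way cache")
```

The first result reproduces the 256 TB figure quoted for 48-bit addressing. The other two show how the TLB and L3 parameters translate into coverage and geometry under the stated line-size assumption.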