Graphics processing unit

A graphics processing unit is a specialized electronic circuit designed for digital image processing and to accelerate computer graphics, being present either as a component on a discrete graphics card or embedded on motherboards, mobile phones, personal computers, workstations, and game consoles. GPUs are increasingly being used for artificial intelligence processing due to linear algebra acceleration which is also used extensively in graphics processing.
Although there is no single definition of the term, and it may be used to describe any video display system, in modern use a GPU includes the ability to internally perform the calculations needed for various graphics tasks, like rotating and scaling 3D images, and often the additional ability to run custom programs known as shaders. This contrasts with earlier graphics controllers known as video display controllers which had no internal calculation capabilities, or blitters, which performed only basic memory movement operations. The modern GPU emerged during the 1990s, adding the ability to perform operations like drawing lines and text without CPU help, and later adding 3D functionality.
Graphics functions are generally independent and this lends these tasks to being implemented on separate calculation engines. Modern GPUs include hundreds, or thousands, of calculation units. This made them useful for non-graphic calculations involving embarrassingly parallel problems due to their parallel structure. The ability of GPUs to rapidly perform vast numbers of calculations has led to their adoption in diverse fields including artificial intelligence where they excel at handling data-intensive and computationally demanding tasks. Other non-graphical uses include the training of neural networks and cryptocurrency mining.

Fandom microsoft:Graphics_proces...

History

1970s

Arcade system boards have used specialized graphics circuits since the 1970s. In early video game hardware, RAM for frame buffers was expensive, so video chips composited data together as the display was being scanned out on the monitor.
A specialized barrel shifter circuit helped the CPU animate the framebuffer graphics for various 1970s arcade video games from Midway and Taito, such as Gun Fight, Sea Wolf, and Space Invaders. The Namco Galaxian arcade system in 1979 used specialized graphics hardware that supported RGB color, multi-colored sprites, and tilemap backgrounds. The Galaxian hardware was widely used during the golden age of arcade video games, by game companies such as Namco, Centuri, Gremlin, Irem, Konami, Midway, Nichibutsu, Sega, and Taito.
The Atari 2600 in 1977 used a video shifter called the Television Interface Adaptor. Atari 8-bit computers had ANTIC, a video processor which interpreted instructions describing a "display list"—the way the scan lines map to specific bitmapped or character modes and where the memory is stored. 6502 machine code subroutines could be triggered on scan lines by setting a bit on a display list instruction. ANTIC also supported smooth vertical and horizontal scrolling independent of the CPU.

1980s

The NEC μPD7220 was the first implementation of a personal computer graphics display processor as a single large-scale integration integrated circuit chip. This enabled the design of low-cost, high-performance video graphics cards such as those from Number Nine Visual Technology. It became the best-known GPU until the mid-1980s. It was the first fully integrated VLSI metal–oxide–semiconductor graphics display processor for PCs, supported up to 1024×1024 resolution, and laid the foundations for the PC graphics market. It was used in a number of graphics cards and was licensed for clones such as the Intel 82720, the first of Intel's graphics processing units. The Williams Electronics arcade games Robotron: 2084, Joust, Sinistar, and Bubbles, all released in 1982, contain custom blitter chips for operating on 16-color bitmaps.
In 1984, Hitachi released the ARTC HD63484, the first major CMOS graphics processor for personal computers. The ARTC could display up to 4K resolution when in monochrome mode. It was used in a number of graphics cards and terminals during the late 1980s. In 1985, the Amiga was released with a custom graphics chip including a blitter for bitmap manipulation, line drawing, and area fill. It also included a coprocessor with its own simple instruction set, that was capable of manipulating graphics hardware registers in sync with the video beam, or driving the blitter. In 1986, Texas Instruments released the TMS34010, the first fully programmable graphics processor. It could run general-purpose code but also had a graphics-oriented instruction set. During 1990–1992, this chip became the basis of the Texas Instruments Graphics Architecture Windows accelerator cards.
In 1987, the IBM 8514 graphics system was released. It was one of the first video cards for IBM PC compatibles that implemented fixed-function 2D primitives in electronic hardware. Sharp's X68000, released in 1987, used a custom graphics chipset with a 65,536 color palette and hardware support for sprites, scrolling, and multiple playfields. It served as a development machine for Capcom's CP System arcade board. Fujitsu's FM Towns computer, released in 1989, had support for a 16,777,216 color palette. In 1988, the first dedicated polygonal 3D graphics boards were introduced in arcades with the Namco System 21 and Taito Air System.
IBM introduced its proprietary Video Graphics Array display standard in 1987, with a maximum resolution of 640×480 pixels. In November 1988, NEC Home Electronics announced its creation of the Video Electronics Standards Association to develop and promote a Super VGA computer display standard as a successor to VGA. Super VGA enabled graphics display resolutions up to 800×600 pixels, a 56% increase.

1990s

In 1991, S3 Graphics introduced the S3 86C911, which its designers named after the Porsche 911 as an indication of the performance increase it promised. The 86C911 spawned a variety of imitators: by 1995, all major PC graphics chip makers had added 2D acceleration support to their chips. Fixed-function Windows accelerators surpassed expensive general-purpose graphics coprocessors in Windows performance, and such coprocessors faded from the PC market.
In the early- and mid-1990s, real-time 3D graphics became increasingly common in arcade, computer, and console games, which led to increasing public demand for hardware-accelerated 3D graphics. Early examples of mass-market 3D graphics hardware can be found in arcade system boards such as the Sega Model 1, Namco System 22, and Sega Model 2, and the fifth-generation video game consoles such as the Saturn, PlayStation, and Nintendo 64. Arcade systems such as the Sega Model 2 and SGI Onyx-based Namco Magic Edge Hornet Simulator in 1993 were capable of hardware T&L years before appearing in consumer graphics cards. Another early example is the Super FX chip, a RISC-based on-cartridge graphics chip used in some SNES games, notably Doom and Star Fox. Some systems used DSPs to accelerate transformations. Fujitsu, which worked on the Sega Model 2 arcade system, began working on integrating T&L into a single LSI solution for use in home computers in 1995; the Fujitsu Pinolite, the first 3D geometry processor for personal computers, announced in 1997. The first hardware T&L GPU on home video game consoles was the Nintendo 64's Reality Coprocessor, released in 1996. In 1997, Mitsubishi released the 3Dpro/2MP, a GPU capable of transformation and lighting, for workstations and Windows NT desktops; ATi used it for its FireGL 4000 graphics card, released in 1997.
The term "GPU" was coined by Sony in reference to the 32-bit Sony GPU in the PlayStation video game console, released in 1994.

2000s

In October 2002, with the introduction of the ATI Radeon 9700, the world's first Direct3D 9.0 accelerator, pixel and vertex shaders could implement looping and lengthy floating point math, and were quickly becoming as flexible as CPUs, yet orders of magnitude faster for image-array operations. Pixel shading is often used for bump mapping, which adds texture to make an object look shiny, dull, rough, or even round or extruded.
With the introduction of the Nvidia GeForce 8 series and new generic stream processing units, GPUs became more generalized computing devices. Parallel GPUs are making computational inroads against the CPU, and a subfield of research, dubbed GPU computing or GPGPU for general purpose computing on GPU, has found applications in fields as diverse as machine learning, oil exploration, scientific image processing, linear algebra, statistics, 3D reconstruction, and stock options pricing. GPGPUs were the precursors to what is now called a compute shader and actually abused the hardware to a degree by treating the data passed to algorithms as texture maps and executing algorithms by drawing a triangle or quad with an appropriate pixel shader. This entails some overheads since units like the scan converter are involved where they are not needed.
Nvidia's CUDA platform, first introduced in 2007, was the earliest widely adopted programming model for GPU computing. OpenCL is an open standard defined by the Khronos Group that allows for the development of code for both GPUs and CPUs with an emphasis on portability. OpenCL solutions are supported by Intel, AMD, Nvidia, and ARM, and according to a report in 2011 by Evans Data, OpenCL had become the second most popular HPC tool.

2010s

In 2010, Nvidia partnered with Audi to power their cars' dashboards, using the Tegra GPU to provide increased functionality to cars' navigation and entertainment systems. Advances in GPU technology in cars helped advance self-driving technology. AMD's Radeon HD 6000 series cards were released in 2010, and in 2011 AMD released its 6000M Series discrete GPUs for mobile devices. The Kepler line of graphics cards by Nvidia were released in 2012 and were used in the Nvidia 600 and 700 series cards. A feature in this GPU microarchitecture included GPU boost, a technology that adjusts the clock-speed of a video card to increase or decrease according to its power draw. Kepler also introduced NVENC video encoding acceleration technology.
The PS4 and Xbox One were released in 2013; they both used GPUs based on AMD's Radeon HD 7850 and 7790. Nvidia's Kepler line of GPUs was followed by the Maxwell line, manufactured on the same process. Nvidia's 28 nm chips were manufactured by TSMC in Taiwan using the 28 nm process. Compared to the 40 nm technology from the past, this manufacturing process allowed a 20 percent boost in performance while drawing less power. Virtual reality headsets have high system requirements; manufacturers recommended the GTX 970 and the R9 290X or better at the time of their release. Cards based on the Pascal microarchitecture were released in 2016. The GeForce 10 series of cards are of this generation of graphics cards. They are made using the 16 nm manufacturing process which improves upon previous microarchitectures.
In 2018, Nvidia launched the RTX 20 series GPUs that added ray tracing cores to GPUs, improving their performance on lighting effects. Polaris 11 and Polaris 10 GPUs from AMD are fabricated by a 14 nm process. Their release resulted in a substantial increase in the performance per watt of AMD video cards. AMD also released the Vega GPU series for the high end market as a competitor to Nvidia's high end Pascal cards, also featuring HBM2 like the Titan V.
In 2019, AMD released the successor to their Graphics Core Next microarchitecture/instruction set. Dubbed RDNA, the first product featuring it was the Radeon RX 5000 series of video cards. The company announced that the successor to the RDNA microarchitecture would be incremental. AMD unveiled the Radeon RX 6000 series, its RDNA 2 graphics cards with support for hardware-accelerated ray tracing. The product series, launched in late 2020, consisted of the RX 6800, RX 6800 XT, and RX 6900 XT. The RX 6700 XT, which is based on Navi 22, was launched in early 2021.
The PlayStation 5 and Xbox Series X and Series S were released in 2020; they both use GPUs based on the RDNA 2 microarchitecture with incremental improvements and different GPU configurations in each system's implementation.

2020s

In the 2020s, GPUs have been increasingly used for calculations involving embarrassingly parallel problems, such as training of neural networks on enormous datasets that are needed for artificial intelligence large language models. Specialized processing cores on some modern workstation's GPUs are dedicated for deep learning since they have significant FLOPS performance increases, using 4×4 matrix multiplication and division, resulting in hardware performance up to 128 TFLOPS in some applications. These tensor cores are expected to appear in consumer cards, as well.

GPU companies

Many companies have produced GPUs under a number of brand names. In 2009, Intel, Nvidia, and AMD/ATI were the market share leaders, with 49.4%, 27.8%, and 20.6% market share respectively. In addition, Matrox produces GPUs. Chinese companies such as Jingjia Micro have also produced GPUs for the domestic market although in terms of worldwide sales, they lag behind market leaders.

Computational functions

Several factors of GPU construction affect the performance of the card for real-time rendering, such as the size of the connector pathways in the semiconductor device fabrication, the clock signal frequency, and the number and size of various on-chip memory caches. Performance is also affected by the number of streaming multiprocessors for NVidia GPUs, or compute units for AMD GPUs, or Xe cores for Intel Xe based GPUs, which describe the number of on-silicon processor core units within the GPU chip that perform the core calculations, typically working in parallel with other SM/CUs on the GPU. GPU performance is typically measured in floating point operations per second ; GPUs in the 2010s and 2020s typically deliver performance measured in teraflops. This is an estimated performance measure, as other factors can affect the actual display rate.

2D graphics APIs

An earlier GPU may support one or more 2D graphics APIs for 2D acceleration, such as GDI and DirectDraw.

GPU forms

Terminology

In the 1970s, the term "GPU" originally stood for graphics processor unit and described a programmable processing unit working independently from the CPU that was responsible for graphics manipulation and output. In 1994, Sony used the term in reference to the PlayStation console's Toshiba-designed Sony GPU. The term was popularized by Nvidia in 1999, who marketed the GeForce 256 as "the world's first GPU". It was presented as a "single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines". Rival ATI Technologies coined the term "visual processing unit" with the release of the Radeon 9700 in 2002. The AMD Alveo MA35D features dual VPUs, each using the 5 nm process in 2023.
In personal computers, there are two main forms of GPUs: dedicated graphics and integrated graphics, or unified memory architecture ).

Dedicated graphics processing unit

Dedicated graphics processing units use RAM that is dedicated to the GPU rather than relying on the computer's main system memory. This RAM is usually specially selected for the expected serial workload of the graphics card, such as GDDR SDRAM. Sometimes systems with dedicated discrete GPUs were called "DIS" systems as opposed to "UMA" systems.
Technologies such as Scan-Line Interleave by 3dfx, Scalable Link Interface and NVLink by Nvidia and CrossFire by AMD allow multiple GPUs to draw images simultaneously for a single screen, increasing the processing power available for graphics. These technologies, however, are increasingly uncommon; most games do not fully use multiple GPUs, as most users cannot afford them. Multiple GPUs are still used on supercomputers ; on workstations to accelerate video and 3D rendering; for visual effects ; general purpose graphics processing unit workloads and for simulations, and in AI to expedite training, as is the case with Nvidia's lineup of DGX workstations and servers, Tesla GPUs, and Intel's Ponte Vecchio GPUs.

Integrated graphics processing unit

Integrated graphics processing units, also called integrated graphics, shared graphics solutions, integrated graphics processors, or unified memory architectures use a portion of a computer's system RAM rather than dedicated graphics memory. IGPs can be integrated onto a motherboard as part of its northbridge chipset, or on the same die (integrated circuit) with the CPU, such as Accelerated Processing Unit or Intel HD Graphics. On certain motherboards, AMD's IGPs can use dedicated sideport memory: a separate fixed block of high performance memory that is dedicated for use by the GPU., computers with integrated graphics account for about 90% of all PC shipments. They are less costly to implement than dedicated graphics processing, but tend to be less capable. Historically, integrated processing was considered unfit for 3D games or graphically intensive programs but could run less intensive programs such as Adobe Flash. Examples of such IGPs would be offerings from SiS and VIA circa 2004. However, modern integrated graphics processors such as AMD Accelerated Processing Unit and Intel Graphics Technology, such as HD, UHD, Iris, Iris Pro, Iris Plus, and Xe-LP, can handle 2D graphics or low-stress 3D graphics.
Because GPU computations are memory-intensive, integrated processing may compete with the CPU for relatively slow system RAM, as it has minimal or no dedicated video memory. IGPs use system memory with bandwidth up to a current maximum of 128 gigabytes per second, whereas a discrete graphics card may have a bandwidth of more than 1000 gigabytes per second between its video random access memory and GPU core. This memory bus bandwidth can limit the performance of the GPU, though multi-channel memory can mitigate this deficiency. Older integrated graphics chipsets lacked hardware transform and lighting, but newer ones include it.
On systems with "Unified Memory Architecture", including modern AMD processors with integrated graphics, modern Intel processors with integrated graphics, Apple processors, the PS5 and Xbox Series, the CPU cores and the GPU block share the same pool of RAM and memory address space.

Stream processing and general purpose GPUs (GPGPU)

It is common to use a general purpose graphics processing unit as a modified form of stream processor or a vector processor, running compute kernels. This turns the massive computational power of a modern graphics accelerator's shader pipeline into general-purpose computing power. In certain applications requiring massive vector operations, this can yield several orders of magnitude higher performance than a conventional CPU. The two largest discrete GPU designers, AMD and Nvidia, are pursuing this approach with an array of applications. Nvidia and AMD collaborated with Stanford University to create a GPU-based client for the Folding@home distributed computing project for protein folding calculations. In certain circumstances, the GPU calculates forty times faster than the CPUs traditionally used by such applications.
GPU-based high performance computers play a significant role in large-scale modelling. Three of the ten most powerful supercomputers in the world take advantage of GPU acceleration.
Since 2005 there has been interest in using the performance offered by GPUs for evolutionary computation in general, and for accelerating the fitness evaluation in genetic programming in particular. Most approaches compile linear or tree programs on the host PC and transfer the executable to the GPU to be run. Typically a performance advantage is only obtained by running the single active program simultaneously on many example problems in parallel, using the GPU's single instruction, multiple data architecture. Substantial acceleration can also be obtained by not compiling the programs, and instead transferring them to the GPU, to be interpreted there.

External GPU (eGPU)

A GPU can be attached to some external bus of a notebook. PCI Express is the only bus used for this purpose. The port may be, for example, an ExpressCard or mPCIe port, a Thunderbolt 1, 2, or 3 port, a USB4 port with Thunderbolt compatibility, or an OCuLink port. Those ports are only available on certain notebook systems. eGPU enclosures include their own power supply, because powerful GPUs can consume hundreds of watts.

Sales

In 2013, 438.3 million GPUs were shipped globally and the forecast for 2014 was 414.2 million. However, by the third quarter of 2022, shipments of PC GPUs totaled around 75.5 million units, down 19% year-over-year.

Hardware

List of AMD graphics processing units
List of Nvidia graphics processing units
List of Intel graphics processing units
List of discrete and integrated graphics processing units
Intel GMA
Larrabee
Nvidia PureVideo – the bit-stream technology from Nvidia used in their graphics chips to accelerate video decoding on hardware GPU with DXVA.
SoC
UVD (Unified Video Decoder) – the video decoding bit-stream technology from ATI to support hardware decode with DXVA

APIs

OpenGL API
OpenCL API
OpenVX API
TensorFlow Lite
Mantle (API)
Metal (API)
* Core ML
Vulkan (API)
Direct3D
* DirectX Video Acceleration (DxVA) API for Microsoft Windows operating-system.
* DirectML
Direct2D
* DirectDraw
* DirectWrite
Video Acceleration API (VA API)
VDPAU (Video Decode and Presentation API for Unix)
X-Video Bitstream Acceleration (XvBA), the X11 equivalent of DXVA for MPEG-2, H.264, and VC-1
X-Video Motion Compensation – the X11 equivalent for MPEG-2 video codec only

Applications

GPU cluster
Mathematica – includes built-in support for CUDA and OpenCL GPU execution
Molecular modeling on GPU
Deeplearning4j – open-source, distributed deep learning for Java