Deep learning super sampling (DLSS) is a technology developed by Nvidia that uses deep learning to produce an image that looks like a higher-resolution version of an original lower-resolution image. The technology is advertised as delivering a much higher output resolution without as much work from the video card. As of April 2020, it is only available on GeForce RTX 20 series GPUs.
History
Nvidia advertised DLSS as a key feature of the GeForce RTX 20 series GPUs when they launched in September 2018. At that time, the results were limited to a few video games, because the algorithm had to be trained specifically on each game to which it was applied, and the results were usually not as good as simple resolution upscaling. In 2019, the video game Control shipped with ray tracing and an improved version of DLSS that did not use deep learning. In April 2020, Nvidia advertised and shipped with driver version 445.75 an improved version of DLSS named DLSS 2.0, which was available for a few existing games, including Control, and would later be available for upcoming games. This time Nvidia said that it used machine learning again, but that the network did not need to be trained specifically on each game. A side effect of DLSS 2.0 is that it does not seem to work well with anti-aliasing techniques such as MSAA or TSAA: performance is strongly degraded if these techniques are enabled on top of DLSS. As of April 2020, DLSS 2.0 must still be integrated on a per-game basis by the game developers.
Algorithm
DLSS 1.0
Nvidia explained that DLSS 1.0 worked, for each target game image, by generating a "perfect frame" using traditional supersampling and then training the neural network on the resulting images. In a second step, the model was trained to recognize aliased inputs in the initial result.
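The "perfect frame" idea can be illustrated with a minimal sketch of traditional supersampling: each output pixel is the average of several samples taken inside it. This is only an illustration, not Nvidia's training pipeline. The `render` callback, the `diagonal_edge` toy scene and the sampling pattern are all assumptions made for this example.

```python
def supersample(render, width, height, factor=4):
    """Average factor x factor samples per pixel (box filter).

    `render(x, y)` is a hypothetical callback returning brightness
    (0.0 to 1.0) at continuous image coordinates.
    """
    image = []
    for py in range(height):
        row = []
        for px in range(width):
            total = 0.0
            for sy in range(factor):
                for sx in range(factor):
                    # Sample at the centre of each sub-pixel cell.
                    x = px + (sx + 0.5) / factor
                    y = py + (sy + 0.5) / factor
                    total += render(x, y)
            row.append(total / (factor * factor))
        image.append(row)
    return image


def diagonal_edge(x, y):
    """Toy scene (assumed): white where y > x, black elsewhere."""
    return 1.0 if y > x else 0.0


aliased = supersample(diagonal_edge, 4, 4, factor=1)    # 1 sample per pixel
reference = supersample(diagonal_edge, 4, 4, factor=4)  # 16 samples per pixel
```

With one sample per pixel, the pixels along the edge are either fully black or fully white, which produces the familiar staircase. With 16 samples, those same pixels take intermediate grey values, giving the smoother kind of image that could serve as a training target.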
DLSS 2.0
DLSS 2.0 works as follows:
The neural network is trained by Nvidia on supercomputers, using "ideal" ultra-high-resolution images of video games together with low-resolution images of the same games. The result is stored in the video card driver. Nvidia reportedly uses DGX-1 servers to train the network.
The neural network stored in the driver compares the actual low-resolution image with the reference and produces a full high-resolution result. The inputs used by the trained neural network are the low-resolution aliased images rendered by the game engine and the low-resolution motion vectors for the same images, also generated by the game engine. The motion vectors tell the network in which direction objects in the scene are moving from frame to frame, so that it can estimate what the next frame will look like.
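To make the role of motion vectors more concrete, the following sketch warps a previous frame along per-pixel motion vectors to predict where its content should appear in the next frame. It does not reproduce the DLSS network. The function name `reproject`, the whole-pixel vectors and the convention that a vector stores the displacement since the previous frame are assumptions chosen for this example.

```python
def reproject(prev_frame, motion, fill=0.0):
    """Warp the previous frame using per-pixel motion vectors.

    motion[y][x] = (dx, dy) is the whole-pixel displacement, since the
    previous frame, of whatever is visible at (x, y) now (an assumed
    convention for this sketch).
    """
    height, width = len(prev_frame), len(prev_frame[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = motion[y][x]
            src_x, src_y = x - dx, y - dy  # where this pixel came from
            if 0 <= src_x < width and 0 <= src_y < height:
                out[y][x] = prev_frame[src_y][src_x]
            # Otherwise the source lies off-screen and the fill value stays.
    return out


# A bright pixel in column 1 moves one pixel to the right.
prev_frame = [[0.0, 1.0, 0.0, 0.0]]
motion = [[(1, 0)] * 4]
predicted = reproject(prev_frame, motion)  # [[0.0, 0.0, 1.0, 0.0]]
```

Pixels whose source falls outside the previous frame have no history to draw on, which is one reason why motion vectors alone are not sufficient to reconstruct a frame.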
Architecture
DLSS is only available on GeForce RTX 20 series GPUs, where it runs on dedicated AI accelerators called Tensor Cores. Tensor Cores have been available since the Nvidia Volta GPU microarchitecture, which was first used in the Tesla V100 line of products. Their distinguishing feature is that each Tensor Core operates on 4 x 4 matrices of 16-bit floating-point values. They seem to be designed to be used at the CUDA C++ level, and even at the compiler level. The Tensor Cores use CUDA Warp-Level Primitives on 32 parallel threads to take advantage of their parallel architecture. A warp is a set of 32 threads that are configured to execute the same instruction.
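The basic operation on 4 x 4 half-precision matrices can be emulated in software. The sketch below assumes the fused multiply-add form D = A x B + C that Nvidia describes for Volta Tensor Cores. It rounds the inputs A and B to 16-bit floating point using Python's `struct` module, and accumulates in ordinary Python floats, which is more precise than the hardware. The helper names `to_fp16` and `mma_4x4` are hypothetical, and the code says nothing about how work is distributed across the 32 threads of a warp.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def mma_4x4(a, b, c):
    """Emulate D = A x B + C on 4 x 4 matrices with FP16 inputs A and B.

    Accumulation uses Python floats, so this is an approximation of the
    arithmetic, not a model of the hardware.
    """
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    return [
        [c[i][j] + sum(a16[i][k] * b16[k][j] for k in range(4))
         for j in range(4)]
        for i in range(4)
    ]


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
a = [[float(4 * i + j) for j in range(4)] for i in range(4)]
zeros = [[0.0] * 4 for _ in range(4)]

d = mma_4x4(a, identity, zeros)  # equals `a`: small integers are exact in FP16
```

Small integers survive the conversion unchanged, but a value such as 0.1 is rounded to 0.0999755859375, which shows the limited precision of the 16-bit input format.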