Deep learning super sampling


Deep learning super sampling (DLSS) is a technology developed by Nvidia that uses deep learning to produce, from a lower-resolution image, an image that looks like a higher-resolution version of it. This technology is advertised as delivering image quality comparable to rendering at a much higher resolution, without as much work from the video card.
As of April 2020, this technology is only available on GeForce RTX 20 series GPUs.

History

Nvidia advertised DLSS as a key feature of the GeForce RTX 20 series GPUs when they launched in September 2018. At that time, the results were limited to a few video games, because the algorithm had to be trained specifically on each game to which it was applied, and the results were usually not as good as simple resolution upscaling.
In 2019, the video game Control shipped with ray tracing and an improved version of DLSS, one which, however, did not use deep learning.
In April 2020, Nvidia advertised and shipped, with driver version 445.75, an improved version of DLSS named DLSS 2.0. It was available for a few existing games, including Control, and would later be available for upcoming games. This time, Nvidia said that DLSS 2.0 used machine learning again, but no longer needed to be trained specifically on each game.
A side effect of DLSS 2.0 is that it seems not to work very well with anti-aliasing techniques such as MSAA or TSAA: performance is very negatively impacted when these techniques are enabled on top of DLSS.
As of April 2020, DLSS 2.0 must still be integrated into each game by its developers, on a per-game basis.

Algorithm

DLSS 1.0

Nvidia explained that DLSS 1.0 worked, for each target game, by first generating "perfect frames" using traditional supersampling, then training the neural network on these resulting images. In a second step, the model was trained to recognize aliased inputs on the initial result.
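Traditional supersampling here means rendering at a resolution higher than the target and averaging the result down. A minimal box-filter downsample, with invented names and a single-channel image format chosen purely for illustration, could produce such a "perfect frame" target:

    #include <vector>

    // A high-resolution render is averaged down by an integer factor to
    // produce the kind of supersampled "perfect frame" described above.
    // The function name and image format are illustrative only.
    std::vector<float> downsample(const std::vector<float> &hi,
                                  int hi_w, int hi_h, int factor) {
        int lo_w = hi_w / factor, lo_h = hi_h / factor;
        std::vector<float> lo(lo_w * lo_h, 0.0f);
        for (int y = 0; y < lo_h; ++y)
            for (int x = 0; x < lo_w; ++x) {
                float sum = 0.0f;
                for (int dy = 0; dy < factor; ++dy)      // average each
                    for (int dx = 0; dx < factor; ++dx)  // factor x factor block
                        sum += hi[(y * factor + dy) * hi_w + (x * factor + dx)];
                lo[y * lo_w + x] = sum / (factor * factor);
            }
        return lo;
    }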

DLSS 2.0

DLSS 2.0 works in two stages. First, a neural network is trained by Nvidia using "ideal" ultra-high-resolution images of video games together with low-resolution images of the same games, and the result is shipped with the video card driver. Then, at run time, the trained network takes as inputs the low-resolution aliased image rendered by the game engine and the low-resolution motion vectors for the same image, and produces a high-resolution output image.
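The following is a minimal sketch of that per-frame data flow, not Nvidia's implementation: all names are invented for illustration, and the trained network is replaced by a trivial nearest-neighbour upscale purely to keep the example compilable.

    #include <vector>

    // Illustrative stand-in for the images described above; none of these
    // names come from Nvidia's API.
    struct Frame {
        int width, height;
        std::vector<float> pixels;  // single channel, row-major
    };

    // The game engine supplies a low-resolution aliased frame and its
    // low-resolution motion vectors; DLSS 2.0 produces a high-resolution
    // frame. Here the trained network is replaced by nearest-neighbour
    // sampling purely as a placeholder.
    Frame upscale(const Frame &low_res, const Frame &motion_vectors,
                  int out_w, int out_h) {
        Frame out{out_w, out_h, std::vector<float>(out_w * out_h)};
        for (int y = 0; y < out_h; ++y)
            for (int x = 0; x < out_w; ++x) {
                int sx = x * low_res.width / out_w;
                int sy = y * low_res.height / out_h;
                out.pixels[y * out_w + x] =
                    low_res.pixels[sy * low_res.width + sx];
            }
        (void)motion_vectors;  // the real network uses these for temporal
                               // accumulation across frames
        return out;
    }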
DLSS is only available on GeForce RTX 20 series GPUs, where it runs on dedicated AI accelerators called Tensor Cores.
Tensor Cores have been available since the Nvidia Volta GPU microarchitecture, which was first used in the Tesla V100 line of products. Their specificity is that each Tensor Core operates on 4 x 4 matrices of 16-bit floating-point numbers, and that they seem designed to be used at the CUDA C++ level, even at the compiler level.
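As a concrete illustration of programming Tensor Cores at the CUDA C++ level, the sketch below uses the warp matrix (WMMA) API from the <mma.h> header, which exposes the hardware's 4 x 4 operations as larger warp-wide tiles (here 16 x 16). The kernel name and tile sizes are chosen for illustration; compiling requires a GPU of compute capability 7.0 (Volta) or later.

    #include <mma.h>
    #include <cuda_fp16.h>
    using namespace nvcuda;

    // One warp multiplies two 16 x 16 half-precision matrices on Tensor
    // Cores and accumulates the result in 32-bit floating point.
    __global__ void wmma_16x16(const half *a, const half *b, float *c) {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

        wmma::fill_fragment(c_frag, 0.0f);       // C := 0
        wmma::load_matrix_sync(a_frag, a, 16);   // load the A tile
        wmma::load_matrix_sync(b_frag, b, 16);   // load the B tile
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // C := A * B + C
        wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
    }

    // Launched with exactly one warp: wmma_16x16<<<1, 32>>>(a, b, c);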
The Tensor Cores use CUDA Warp-Level Primitives across 32 parallel threads to take advantage of their parallel architecture. A warp is a set of 32 threads configured to execute the same instruction.
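For example, a warp-level primitive such as __shfl_down_sync lets the 32 threads of a warp exchange register values directly. The reduction below is a generic illustration of the mechanism, not code from DLSS:

    // All 32 threads of a warp cooperate to sum their values using the
    // warp-level primitive __shfl_down_sync, without any shared memory.
    __device__ float warp_reduce_sum(float val) {
        // 0xffffffff: every lane of the warp participates.
        for (int offset = 16; offset > 0; offset /= 2)
            val += __shfl_down_sync(0xffffffff, val, offset);
        return val;  // lane 0 now holds the sum over the whole warp
    }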