Intel released an explainer video about its upcoming XeSS AI upscaling technology and demonstrated how the technology works on its near-release Arc Alchemist GPUs. Intel uses the fastest Arc A770 for the demos, although it’s hard to tell how performance will compare to the best graphics cards based on the limited performance details shown.
If you’re at all familiar with Nvidia’s DLSS, which has been around for four years now in various incarnations, the video should evoke an acute sense of déjà vu. Tom Petersen, who used to work for Nvidia and gave some of the old DLSS presentations, goes over the basics of XeSS. In short, XeSS sounds a lot like a mirrored version of Nvidia’s DLSS, except that it’s designed to run on Intel’s XMX deep learning cores rather than Nvidia’s Tensor cores. The technology can also work on other GPUs via a DP4a mode, which could make it an interesting alternative to AMD’s FSR 2.0 upscaler.
In the demos shown by Intel, XeSS seemed to work well. Of course, it’s hard to say for sure when the output video is a 1080p compressed version of the actual content, but we’ll save the detailed image quality comparisons for another time. The performance boost looks similar to what we’ve seen with DLSS, with over 100% frame rate increases in some situations when using XeSS Performance mode.
How it works
If you already know how DLSS works, Intel’s solution is largely the same, with some minor changes. XeSS is an AI-accelerated upscaling algorithm designed to increase frame rates in video games.
It starts with training, the first step in most deep learning algorithms. The AI network takes lower-resolution sample frames from a game and processes them, generating what should be an upscaled output image. The network then compares the result to the desired target image and back-propagates weight adjustments to try to correct any “errors”. The resulting images won’t look very good at first, but the algorithm slowly learns from its mistakes. After thousands (or more) of training images, the network eventually converges on the weights that will “magically” generate the desired results.
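The loop described above can be sketched in miniature. Here’s a toy example in Python, assuming a single linear “network” that learns to upscale two samples to four by comparing its output to a ground-truth target and back-propagating weight corrections. Everything here (patch sizes, data, learning rate) is illustrative, not anything from XeSS itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth upscaling operator we want the network to discover:
# simple linear interpolation from 2 low-res samples to 4 high-res samples.
true_W = np.array([[1.00, 0.00],
                   [0.75, 0.25],
                   [0.25, 0.75],
                   [0.00, 1.00]])

W = rng.normal(scale=0.1, size=(4, 2))    # randomly initialized weights
lr = 0.1                                  # learning rate

for step in range(2000):
    x = rng.normal(size=(2, 32))          # a batch of low-res "frames"
    target = true_W @ x                   # desired high-res output
    pred = W @ x                          # the network's upscaled guess
    err = pred - target                   # the "errors" to correct
    grad = err @ x.T / x.shape[1]         # gradient of the mean squared error
    W -= lr * grad                        # back-propagate the weight adjustment

# After enough samples, the weights converge on the ideal operator.
print(np.allclose(W, true_W, atol=1e-3))  # → True
```

A real upscaling network is a deep convolutional model with millions of weights rather than a 4×2 matrix, but the train-compare-correct cycle is the same.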
Once the algorithm is fully trained using samples from many different games, it can in theory take any input image from any video game and scale it up almost perfectly. As with DLSS (and FSR 2.0), the XeSS algorithm also takes on the role of anti-aliasing and replaces classical solutions such as temporal AA.
Again, nothing particularly noteworthy so far. DLSS and FSR 2.0 and even the standard temporal AA algorithms have much of the same basic functionality – minus the AI stuff for FSR and TAA. Games will integrate XeSS into their rendering pipeline, usually after the main rendering and initial effects are done, but before the post-processing effects and GUI/HUD elements are drawn. In this way, the user interface remains clear while the difficult task of 3D rendering is performed at a lower resolution.
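The pipeline ordering just described can be sketched as follows. This is a hypothetical illustration of where an upscaler slots into a frame, not a real engine API; every stage name here is made up.

```python
# Heavy 3D rendering runs at a lower resolution, the upscaler runs next, and
# post-processing plus GUI/HUD are drawn at full resolution so the interface
# stays crisp. Stage names are illustrative only.

def render_frame(render_res, display_res):
    stages = []
    stages.append(("geometry+lighting", render_res))  # main 3D work, low res
    stages.append(("xess_upscale", display_res))      # XeSS/DLSS/FSR-style step
    stages.append(("post_processing", display_res))   # bloom, tone mapping, etc.
    stages.append(("gui_hud", display_res))           # UI drawn last, never upscaled
    return stages

stages = render_frame((1706, 960), (2560, 1440))
print([name for name, _ in stages])
# → ['geometry+lighting', 'xess_upscale', 'post_processing', 'gui_hud']
```

Because the HUD and post-processing stages only ever see the full display resolution, text and menus never pass through the upscaler.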
XeSS works with Intel’s Arc XMX cores, but can also work with other GPUs in a slightly different mode. The DP4a instructions are basically four INT8 (8-bit integer) calculations done using a single 32-bit register, which you normally access through a GPU shader core. Meanwhile, XMX cores natively support INT8 and can handle 128 values at once.
This may seem very one-sided, but as an example, the Arc A380 has 1024 shader cores, each of which can perform four INT8 operations simultaneously. Alternatively, the A380 has 128 XMX units, each of which can perform 128 INT8 operations. That makes XMX throughput four times that of DP4a, but obviously DP4a mode should still be sufficient for some level of XeSS goodness.
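The 4x figure is simple arithmetic on the per-clock INT8 counts quoted above for the A380 (clock speed cancels out, since both paths run on the same GPU):

```python
# Back-of-the-envelope check of the A380 throughput comparison:
# 1024 shader cores doing 4 INT8 ops each via DP4a, versus
# 128 XMX matrix units doing 128 INT8 ops each.

shader_cores, dp4a_ops_per_core = 1024, 4
xmx_units, xmx_ops_per_unit = 128, 128

dp4a_throughput = shader_cores * dp4a_ops_per_core   # 4096 INT8 ops/clock
xmx_throughput = xmx_units * xmx_ops_per_unit        # 16384 INT8 ops/clock

print(xmx_throughput // dp4a_throughput)  # → 4
```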
Note that DP4a appears to use a different trained network that is arguably less computationally intensive. How this will translate into real-world performance and image quality remains to be seen, and it looks like game developers will have to explicitly include support for both XMX and DP4a modes if they want to support non-Arc GPUs.
Intel XeSS performance expectations
Intel showed several benchmarks for games running on XeSS, including a development build of Shadow of the Tomb Raider and a new 3DMark benchmark specifically created for XeSS. The video also closed with short clips of Arcadegeddon, Redout II, Ghostwire Tokyo, The DioField Chronicle, Chivalry II, Naraka Bladepoint, and Super People running with and without XeSS.
In Shadow of the Tomb Raider, running on an Arc A770 graphics card at 2560×1440 with near-max settings, including ray-traced shadows, XeSS delivers anywhere from about a 25% performance boost with the Ultra Quality setting to more than a 100% increase in frame rates with the Performance setting. The Quality and Balanced settings fall in between, improving performance by about 50% and 75%, respectively.
These gains will naturally vary depending on the game engine, settings and base performance. The more demanding the game and the lower the frame rate, the more useful XeSS will be. Using Performance mode, Intel showed typical gains of 40% to 110% at 1440p, while Balanced mode provided improvements ranging from about 25% to 75%.
3DMark will also add the Intel XeSS Feature Test for its Advanced edition, which includes a benchmark mode, as well as a Frame Inspector that allows users to view benchmark images by zooming in to check for differences in visual quality. It seems much easier to use than Nvidia’s ICAT utility, although of course it’s also limited to providing frames from a single synthetic benchmark.
Since 3DMark uses its demanding Port Royal ray-tracing scene for the XeSS Feature Test, the performance improvements can be particularly impressive. At 1440p, the benchmark saw a 145% increase in FPS with XeSS in Performance mode, a 109% increase with Balanced mode, 81% with Quality mode, and 49% with Ultra Quality mode.
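For reference, those percentages express relative frame-rate gain over the native-resolution run. A minimal helper shows the arithmetic; the base FPS value below is made up for illustration, since only the percentages were quoted.

```python
# Relative FPS gain of an upscaled run over the native-resolution run.

def fps_gain_pct(native_fps, upscaled_fps):
    return (upscaled_fps / native_fps - 1.0) * 100.0

native = 20.0                              # hypothetical native 1440p FPS
print(round(fps_gain_pct(native, 49.0)))   # a 145% gain would mean 49 FPS here
```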
Frame Inspector also showed some good results, with XeSS reconstructing the image very well, to the point where Intel’s Tom Petersen claimed that the XeSS image actually looked better than native with TAA. Of course, you should take this with a grain of salt, and images from a single canned sequence will probably not fully represent the actual gaming experiences.
XeSS SDK and more than 20 games in development
Intel will provide an easy-to-use SDK for implementing XeSS in a game engine. The interface and requirements will be very similar to TAA implementations as well as DLSS and FSR 2.0, so it should be a relatively easy addition for any modern graphics engine.
Like TAA, FSR 2.0 and DLSS, XeSS needs motion vectors along with the current frame, and it keeps its own collection of previous frames. All of these are fed into the AI network to eventually generate a good result. XeSS also uses camera jitter to help eliminate aliasing in the scene.
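The inputs just described can be sketched as a small class: the upscaler consumes the current low-resolution frame plus motion vectors, maintains its own rolling history of previous frames, and tracks the per-frame camera jitter. This is a hedged illustration of the data flow, not the XeSS SDK; all names here are hypothetical.

```python
from collections import deque

class TemporalUpscaler:
    def __init__(self, history_frames=4):
        # Rolling window of previous frames the algorithm keeps internally.
        self.history = deque(maxlen=history_frames)

    def upscale(self, color_frame, motion_vectors, jitter_offset):
        # A real implementation would reproject the history with the motion
        # vectors, blend it with the current frame, and remove the jitter;
        # here we only record the inputs to show the data flow.
        self.history.append((color_frame, motion_vectors, jitter_offset))
        return {"frames_blended": len(self.history), "jitter": jitter_offset}

up = TemporalUpscaler()
for i in range(6):  # after 6 frames, the window is capped at 4
    out = up.upscale(f"color{i}", f"mv{i}", ((i % 2) * 0.5, 0.0))
print(out["frames_blended"])  # → 4
```

The bounded history is the point of the sketch: a temporal upscaler accumulates detail across a handful of jittered frames rather than storing the whole stream.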
Intel currently has more than 20 XeSS games planned for release in the coming months. Some of them may fall through the cracks or slip, but this is at least a decent start for the newcomer. Meanwhile, AMD just announced eight more games that have recently added or will soon add FSR 2.0, and Nvidia has over 100 games shipping with DLSS 2.0 or later. How many game developers will be willing to add all three alternatives, giving gamers a choice of the best algorithm? We suspect that many games will only support one or two of the possible upscaling options.
XeSS will officially launch when Intel releases its Arc Alchemist GPUs globally, presumably in the near future. The Arc A380 has effectively launched at this point, and Intel is already teasing the A750 and A770. We hope to experience XeSS, in both XMX and DP4a modes, in the not too distant future. For now, adoption remains far behind the competition from AMD and Nvidia.