5/17/2016

Nvidia’s GTX 1080 redefines high-end gaming performance



Nvidia formally announced the GTX 1080 in Austin just over a week ago, but it held back on the GPU deep dive at the initial presentation. While the GTX 1080 isn’t scheduled to launch until May 27, Nvidia has now lifted the curtain on the GPU’s performance and technical advances. We’ve already covered some of the card’s improvements in VR and overall positioning, plus new technologies like Ansel that give gamers more artistic freedom, so we’ll focus here on areas where Nvidia hadn’t disclosed as much information. Unfortunately, Nvidia was unable to supply us with a GTX 1080 in time for launch, so we’ll have to defer benchmarks and performance comparisons for another day.


Pascal is an evolutionary step forward from Maxwell, and many of the technologies that debuted in the GeForce 9xx family have been refined, improved, and enhanced for the GTX 1080. The base GPU packs 2560 cores with 20 SM blocks and 128 cores per block. There are 160 texture units and 64 ROPs, which is interesting: one common theory was that AMD and Nvidia would both ship substantially more ROPs this year to ensure that they didn’t become fill-rate limited in VR or at higher resolutions. Nvidia chose to deal with this another way, which we’ll explore shortly.



The GTX 1080 packs 25% more cores and 25% more texture units than the GTX 980 it replaces, along with a much higher base clock (1.61GHz vs. 1.1GHz for Maxwell) and significantly faster RAM (320GB/s of memory bandwidth, compared to 224GB/s for Maxwell). On paper, the 1080 looks much more like the 980 Ti. In practice, it often outperforms that card.
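For readers who want to check the percentages, here is a minimal sketch of the arithmetic. The GTX 1080 figures come from the paragraphs above; the GTX 980 core and texture unit counts (2048 and 128) are that card's published specifications rather than numbers quoted in this article, and the dictionary keys and function names are purely illustrative.

```python
# Spec-sheet figures. GTX 1080 values are quoted in the article; the GTX 980
# core and texture unit counts are that card's published specs (assumption:
# not stated in the article itself).
GTX_1080 = {"cores": 2560, "texture_units": 160, "base_clock_ghz": 1.61, "bandwidth_gbs": 320}
GTX_980 = {"cores": 2048, "texture_units": 128, "base_clock_ghz": 1.1, "bandwidth_gbs": 224}


def percent_gain(new, old):
    """Relative increase of new over old, in percent."""
    return (new - old) / old * 100.0


def compare(new_card, old_card):
    """Percent gain for every spec the two cards share, rounded to 0.1."""
    return {key: round(percent_gain(new_card[key], old_card[key]), 1) for key in new_card}


gains = compare(GTX_1080, GTX_980)
for spec, gain in gains.items():
    print(f"{spec}: +{gain}%")
```

The core and texture unit gains both come out at 25%, and the bandwidth gain rounds to the 43% figure used later in this piece.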

Memory bandwidth, memory compression
Pascal uses GDDR5X to increase its memory bandwidth, but Nvidia chose to stick with a 256-bit memory bus rather than the 384-bit bus that the GTX 980 Ti and GTX Titan X use. The secret sauce in Nvidia’s recipe? A more advanced version of the same delta color compression techniques that Maxwell used.
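To see how a 256-bit bus can still deliver 320GB/s, it helps to work through the usual peak-bandwidth formula: bus width times per-pin data rate, divided by eight to convert bits to bytes. The sketch below assumes per-pin rates of 10Gbit/s for the GTX 1080's GDDR5X and 7Gbit/s for the GDDR5 on the GTX 980 and 980 Ti; those rates are not stated in this article, so treat them as assumptions.

```python
def peak_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Theoretical peak memory bandwidth in GB/s.

    bus_width_bits: width of the memory interface in bits
    data_rate_gbps: per-pin data rate in Gbit/s (assumed values, see below)
    """
    return bus_width_bits * data_rate_gbps / 8


# Per-pin data rates are assumptions, not figures from the article.
cards = {
    "GTX 1080 (256-bit GDDR5X @ 10 Gbit/s)": (256, 10),
    "GTX 980 (256-bit GDDR5 @ 7 Gbit/s)": (256, 7),
    "GTX 980 Ti (384-bit GDDR5 @ 7 Gbit/s)": (384, 7),
}

for name, (width, rate) in cards.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.0f} GB/s")
```

Under those assumptions the faster memory nearly closes the gap with the wider 384-bit bus, which is why compression matters for making up the rest.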



When Nvidia launched Maxwell, it claimed that Maxwell reduced its memory bandwidth needs by 19-30% over Kepler, depending on the game in question. Pascal delivers a further 11-28% improvement over Maxwell, again depending on the game in question. Considering that the GTX 1080 already offers 43% more bandwidth than the GTX 980, the added color compression is icing on the cake — or a smart way to ensure the card can scale to 4K or even beyond, depending on your point of view.
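One rough way to think about those savings is as "effective" bandwidth. If compression trims memory traffic by a fraction s, the same physical bus behaves as though it were 1 / (1 - s) times wider. The sketch below applies that simplified model to Nvidia's 11-28% range. The model itself is our assumption, not something Nvidia published, and real games will not scale this cleanly.

```python
def effective_bandwidth_gbs(raw_gbs, traffic_savings):
    """Bandwidth an uncompressed design would need to match raw_gbs.

    traffic_savings is the fraction of memory traffic removed by compression
    (0.11 means 11%). This is a simplified model: it assumes every saved byte
    translates directly into usable headroom.
    """
    if not 0.0 <= traffic_savings < 1.0:
        raise ValueError("traffic_savings must be in [0, 1)")
    return raw_gbs / (1.0 - traffic_savings)


RAW_GTX_1080_GBS = 320  # quoted in the article

low = effective_bandwidth_gbs(RAW_GTX_1080_GBS, 0.11)
high = effective_bandwidth_gbs(RAW_GTX_1080_GBS, 0.28)
print(f"Effective bandwidth: roughly {low:.0f} to {high:.0f} GB/s")
```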



The outrageously pink image above shows the difference between Maxwell and Pascal’s color compression. You can see that Maxwell already does a pretty good job of compressing the frame, but there are still a significant number of areas where Maxwell couldn’t compress the data.



Here’s Pascal’s version of the same frame. If you’re thinking “Wow, that’s really pink,” you’re on the right track. What this means is that Pascal can extract memory compression savings from a much larger percentage of the frame than Maxwell could. In the long run there’s an inevitable diminishing marginal return to saving bandwidth via compression, but the feature worked extremely well in Maxwell and should give Pascal an additional edge in 4K gaming.
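The general idea behind delta color compression can be shown with a toy example. This is emphatically not Nvidia's algorithm: the tile size, the "anchor pixel plus small deltas" scheme, and the bit widths are all invented for illustration. The point is only that a scheme which tolerates larger deltas can compress more of the frame, which is what the pinker image suggests.

```python
def tile_is_compressible(tile, delta_bits=4):
    """True if every pixel's difference from the tile's first pixel fits in a
    signed integer of delta_bits bits (a hypothetical compression rule)."""
    lowest = -(1 << (delta_bits - 1))
    highest = (1 << (delta_bits - 1)) - 1
    anchor = tile[0]
    return all(lowest <= pixel - anchor <= highest for pixel in tile)


def compressible_fraction(tiles, delta_bits=4):
    """Share of tiles that the toy rule can compress."""
    if not tiles:
        return 0.0
    hits = sum(tile_is_compressible(tile, delta_bits) for tile in tiles)
    return hits / len(tiles)


# Made-up 8-pixel tiles of 8-bit values standing in for parts of a frame.
smooth_sky = [120, 121, 122, 121, 120, 119, 121, 122]
gentle_gradient = [60, 62, 64, 66, 68, 70, 72, 74]
noisy_foliage = [12, 200, 45, 180, 90, 33, 250, 7]
frame = [smooth_sky, gentle_gradient, noisy_foliage]

print(f"4-bit deltas: {compressible_fraction(frame, 4):.0%} of tiles compress")
print(f"5-bit deltas: {compressible_fraction(frame, 5):.0%} of tiles compress")
```

In this toy frame the stricter rule compresses only the sky tile, while the more permissive one also catches the gradient. The noisy tile resists both, which mirrors the diminishing returns mentioned above.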

Display support, media encode/decode, and SLI bridges
Nvidia has announced that the GTX 1080 will support a maximum resolution of 7680×4320 at 60Hz if two DisplayPort 1.3 connectors are used to drive the display. The GPU is only certified for DP 1.2 but is listed as DP 1.3 and 1.4 “ready.” HDMI 2.0b and HDCP 2.2 are both supported as well. Media standard support has a few new bells and whistles that previous Maxwell cards lacked. Pascal now supports full encode and decode in both H.265 and 10-bit H.265. 12-bit H.265 (decode-only) is also supported, as is hardware decode for Google’s VP9 codec.
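A quick back-of-the-envelope calculation shows why 7680×4320 at 60Hz takes two connectors. The sketch below assumes 24 bits per pixel, ignores blanking intervals and any stream compression, and uses 25.92Gbit/s as the usable payload of a single DisplayPort 1.3 link (four HBR3 lanes at 8.1Gbit/s each, less 8b/10b encoding overhead). That payload figure comes from the DisplayPort specification, not from Nvidia's briefing.

```python
import math

# Usable payload of one DisplayPort 1.3 link: 4 lanes x 8.1 Gbit/s raw,
# with 8b/10b encoding leaving 80% for data (assumption from the DP spec).
DP13_PAYLOAD_GBPS = 4 * 8.1 * 0.8


def pixel_data_rate_gbps(width, height, refresh_hz, bits_per_pixel=24):
    """Raw pixel data rate in Gbit/s, ignoring blanking and compression."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


def links_needed(rate_gbps, payload_gbps=DP13_PAYLOAD_GBPS):
    """Minimum number of links whose combined payload covers rate_gbps."""
    return math.ceil(rate_gbps / payload_gbps)


rate_8k60 = pixel_data_rate_gbps(7680, 4320, 60)
print(f"8K at 60Hz needs about {rate_8k60:.2f} Gbit/s")
print(f"DisplayPort 1.3 links required: {links_needed(rate_8k60)}")
```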

Those of you who are familiar with multi-GPU configurations are also aware that Nvidia has previously used SLI bridges to connect one GPU to another. AMD abandoned this approach back in 2013 when it launched Hawaii; AMD GPUs now connect directly over the PCI Express 3.0 bus. Nvidia is still sticking with bridges for Pascal and GP104, but this time it’s introducing a new, higher-bandwidth bridge standard for modern GPUs. Existing bridges should function well up to 2560×1440 @ 60Hz, but if you want >60Hz refresh rates or to run SLI in 4K or 5K mode, you’ll see top performance if you use newer bridges (Nvidia did note that its LED bridges are still rated for anything up to 5K). It’s not entirely clear if older “stiff” bridges are limited the same way as the older “floppy” bridges (cross-GPU bandwidth was lower on the flexible bridges than on their “stiff” counterparts).



This slide shows the difference in Shadow of Mordor between the old and new bridge. It’s important to note that this dramatic difference was captured in 4K Surround mode, which means three 4K displays running the same game for a total resolution of 11520×2160. While the new bridges are much smoother than the old ones, the game itself doesn’t maintain a playable frame rate at these resolutions and detail settings. Lower resolutions and detail settings might not show the same gains.

Asynchronous compute
Asynchronous compute has been a hotly debated topic ever since Ashes of the Singularity debuted and showed AMD holding an advantage over Nvidia, ostensibly due to this particular capability. While that situation is rather more nuanced and game-specific, there are going to be a number of questions regarding how Pascal stacks up to the competition.

According to Nvidia, GP104 improves on Maxwell in some significant ways. Maxwell was only capable of performing draw-level preemption and could only switch to a different workload at a draw call boundary. What this meant practically was that there were significant penalties to running a mixed compute + graphics workload, and we saw that reflected in Maxwell’s performance when significant asynchronous workloads were running.



Unlike Maxwell, Pascal can perform much finer-grained preemption. In graphics workloads, it can preempt at the pixel level, flush the shader pipeline, and switch to compute. In compute workloads it can swap at the instruction level and return to doing graphics work. Nvidia claims that this takes 100 microseconds or less, and while the company didn’t offer competing figures for Maxwell, it should be significantly faster than what we saw last generation.
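To get a feel for why preemption granularity matters, consider a toy model in which the GPU can only switch tasks at fixed "boundaries." With draw-level preemption, the only boundary is the end of the draw call; with finer-grained preemption, boundaries come far more often. Every number below (a 5,000-microsecond draw call, a boundary every 50 microseconds, a request arriving at 1,230 microseconds) is made up for illustration, and the model ignores the time needed to flush the pipeline and save state.

```python
def switch_points(total_us, step_us):
    """Times (in microseconds) at which the GPU is allowed to switch tasks."""
    return list(range(step_us, total_us + 1, step_us))


def preemption_wait_us(request_time_us, boundaries_us):
    """How long a newly arrived task waits for the next allowed switch point."""
    for boundary in boundaries_us:
        if boundary >= request_time_us:
            return boundary - request_time_us
    raise ValueError("request arrives after the last boundary")


# Hypothetical workload: one long draw call and an urgent compute request.
DRAW_CALL_US = 5000
REQUEST_AT_US = 1230

draw_level = switch_points(DRAW_CALL_US, DRAW_CALL_US)  # only at the end
fine_grained = switch_points(DRAW_CALL_US, 50)           # every 50 us

print("Draw-level wait:", preemption_wait_us(REQUEST_AT_US, draw_level), "us")
print("Fine-grained wait:", preemption_wait_us(REQUEST_AT_US, fine_grained), "us")
```

In this made-up scenario the compute task waits 3,770 microseconds under draw-level preemption and 20 microseconds under the finer-grained scheme. Real hardware adds switching overhead on top, which is where Nvidia's sub-100-microsecond claim comes in.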

Asynchronous compute isn’t a feature most games rely on yet (Ashes of the Singularity is something of an exception), and we can’t deep dive into the question until we’ve got hardware. What I can say is that while Pascal significantly improves on Maxwell’s capabilities, it doesn’t offer the same set of compute capabilities that GCN does. The larger question is whether or not the difference between what the two companies support will have an impact on future DirectX 12 titles. The fact that Nvidia holds an estimated 75-80% of the gaming market is itself a powerful argument that developers should focus on building engines that cater to Nvidia’s architectures and GPU capabilities more so than AMD’s. At the same time, however, some developers have predicted that game engines may shift workloads towards compute engines no matter what — and that could potentially work in AMD’s favor in future DX12 titles.

I’ve always recommended evaluating GPUs based on the games you’re playing now, not the titles you might be playing in 12-24 months, and it’s difficult to predict how game engines might change in the next few years. At minimum, the changes Nvidia has made to Pascal should significantly reduce any asynchronous compute penalty relative to Maxwell. At best, we should see Pascal picking up some performance improvements from async compute, including in games where Maxwell took a performance hit.




The graph above shows how far the GTX 1080 has come relative to its predecessor, but there’s one caveat worth mentioning. According to Oxide, asynchronous compute is disabled on Nvidia cards by default, which means these test results may not tell us if Pascal actually benefits from async compute just yet. One final note: When Nvidia demoed its async compute capability in Austin, it did so using DirectX 11, not DX12. We weren’t able to discover more information on why it chose to demo using the older API, or what the performance ramifications were for that scenario.

Wrapping it all up
Pascal is a significant leap forward for Nvidia, thanks to a combination of higher clocks, increased core counts, and improved efficiency. The company is forecasting significant gains over and above GTX 980 in both traditional gaming and VR, with particularly impressive boosts arriving for VR titles. While we can’t speak to that specifically just yet, the on-paper gains are substantial.



One difference about this launch, however, is that AMD and Nvidia are taking very different approaches to the market. Nvidia has chosen to launch high-end parts first, with the GTX 1080 and 1070 taking over for the 980 Ti, 980, and GTX 970. AMD, in contrast, will launch an efficiency-focused GPU first, with Polaris 10 and 11 targeting the budget and mainstream segments in both mobile and desktop. This is the first time in a long while that the two companies have taken this approach, and it’ll be interesting to see how they compare in their respective brackets. It’s still not clear if Pascal’s VR performance gains will require substantial optimization or not, but VR enthusiasts who held off buying a new GPU when Oculus and Vive launched should be well-rewarded for their patience.

Current reviews show the GTX 1080 outperforming the GTX 980 by 25-35%, which is in line with our expectations. This launch is going to put serious pressure on AMD to reduce the price of its Fury products: the non-Founders variant of the 1080 will sell for $500, which puts it head-to-head against Fury and Nano.