Triangle throughput
Given the advances NVIDIA has made in geometry processing, we wanted to take a closer look at the subject. We first looked at triangle throughput in three situations: when all triangles are drawn, when half of them are removed by back-face culling (because they aren’t facing the camera) and when all of them are removed:
The GeForce GTX 480 is very fast here and exceeds one triangle per cycle. When it comes to rejecting triangles via culling, no other GPU gets anywhere near it. The GeForce GTX 460 is close to one triangle drawn per cycle and is also very fast at removing triangles via culling.
We are, however, quite some distance from the theoretical maximums of 4 triangles per cycle for the GF100 and 2 triangles per cycle for the GF104. Something is limiting them but we don’t exactly know what. We do know however that this limitation doesn’t exist on the GF100 Quadro derivatives.
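For readers unfamiliar with the culling cases used in the first test, here is a minimal sketch of a back-face test based on the signed area of a screen-space triangle. It is only an illustration, not our test tool: the function names, the counter-clockwise front-face convention and the sample triangles are all assumptions made for the example.

```python
def signed_area(a, b, c):
    """Return twice the signed area of the 2D triangle (a, b, c)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def is_back_facing(a, b, c):
    # Assumption: counter-clockwise winding (positive area, y axis up)
    # means the triangle faces the camera.
    return signed_area(a, b, c) < 0


def culled_fraction(triangles):
    """Fraction of the triangles that back-face culling would remove."""
    culled = sum(1 for tri in triangles if is_back_facing(*tri))
    return culled / len(triangles)


# Hypothetical sample triangles: same vertices, opposite winding.
front = ((0, 0), (1, 0), (0, 1))  # counter-clockwise, drawn
back = ((0, 0), (0, 1), (1, 0))   # clockwise, culled

all_drawn = [front] * 4          # no triangle removed
half_culled = [front, back] * 2  # half the triangles removed
all_culled = [back] * 4          # every triangle removed
```

The three lists mirror the three situations measured above. A GPU that rejects triangles quickly spends very little time on the culled ones, which is why throughput rises as the culled fraction grows.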
We then carried out a similar test, this time using tessellation. The test tool hasn’t yet been finalised or fully optimised to extract the best possible throughput. It can, however, already be used to compare the GPUs with one another:
The advantage of the GeForces over the Radeons is there for all to see. The Radeons seem to be limited to 1 triangle every 3 cycles when tessellation is used. AMD told us that this wasn’t always the case and that the tessellation unit was capable of outputting one triangle per cycle. This is something we haven’t yet managed to reproduce as the Radeons are very quickly left behind when too many triangles are generated. Note however that at 270 million triangles per second, you can already envisage some pretty complex scenes with the Radeons!
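The link between a rate per cycle and a rate per second can be checked with a little arithmetic. The sketch below assumes a core clock of 850 MHz, which is the reference clock of the Radeon HD 5870 but is not given in this section, so treat it as an assumption. The function names are ours and purely illustrative.

```python
def triangles_per_second(clock_hz, triangles_per_cycle):
    """Theoretical throughput for a given clock and per-cycle rate."""
    return clock_hz * triangles_per_cycle


def triangles_per_cycle(measured_per_second, clock_hz):
    """Per-cycle rate implied by a measured throughput."""
    return measured_per_second / clock_hz


# Assumption: 850 MHz core clock (Radeon HD 5870 reference clock),
# not stated in the text above.
RADEON_CLOCK_HZ = 850e6

# One triangle every three cycles, as observed with tessellation.
peak = triangles_per_second(RADEON_CLOCK_HZ, 1 / 3)

# Per-cycle rate implied by the 270 million triangles per second figure.
rate = triangles_per_cycle(270e6, RADEON_CLOCK_HZ)
```

Under that clock assumption, one triangle every three cycles gives a ceiling of roughly 283 million triangles per second, so the 270 million measured sits close to it and corresponds to a little under a third of a triangle per cycle.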
AMD and NVIDIA have very different approaches. While the Radeons all give identical performance here, the GeForces vary card by card. We have also noted enormous gains with the GeForces when the GPUs have to load several vertices per primitive. The GF100 and the GF104 continue to run at full speed when loading 2 or 3 vertices, while other GPUs see their speeds go into freefall because they can only load one vertex per clock.
We tested tessellation with an AMD demo that is part of Microsoft’s DirectX SDK. This demo allows us to compare bump mapping, parallax occlusion mapping (the most advanced bump mapping technique used in gaming) and displacement mapping that uses tessellation.
Basic bump mapping.
Parallax occlusion mapping.
Displacement mapping with adaptive tessellation.
By creating true additional geometry, displacement mapping delivers clearly superior quality. Here we activated the adaptive algorithm, which avoids generating useless geometry and too many small triangles that will not fill any quads and waste a lot of resources.
We also measured the performance obtained with the different techniques:
It is interesting to note that tessellation improves not only rendering quality but also performance! Parallax occlusion mapping is in fact very resource-heavy, as it uses a complex algorithm that attempts to simulate geometry realistically. Unfortunately, it also generates a lot of aliasing, which is noticeable on the edges of objects or surfaces that use it.
Note however that in the present case the displacement mapping algorithm is helped by the fact that it is dealing with a flat surface. If it has to smooth geometry contours and apply displacement mapping at the same time the demands are of course much higher.
The GeForce GTX 400s cope much better with the tessellation load here than the Radeon HD 5000s. With extreme tessellation levels, the GeForce GTX 460 is almost twice as fast as the Radeon HD 5870 in this test. Using an adaptive algorithm, which adjusts the tessellation level to how much detail each area needs, depending on distance or screen resolution, gives significant gains across the board and is more representative of what developers will put in place. The gap between the GeForces and the Radeons then narrows, but the GeForce GTX 400s retain a significant advantage.
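To give an idea of what such an adaptive scheme can look like, here is a minimal sketch that picks a tessellation factor from the distance to the camera and from the on-screen length of a patch edge. It is not the algorithm used by the SDK demo. The function name, the near and far distances and the target of 8 pixels per generated edge are hypothetical values chosen for the example. Only the upper bound of 64 comes from Direct3D 11, which caps tessellation factors at that value.

```python
def adaptive_tess_factor(distance, edge_pixels, *, max_factor=64.0,
                         near=1.0, far=50.0, target_pixels=8.0):
    """Pick a tessellation factor for one patch edge (illustrative only).

    distance      -- distance from the camera to the patch (hypothetical units)
    edge_pixels   -- length of the patch edge once projected on screen
    target_pixels -- desired on-screen length of each generated segment
    """
    # Distance criterion: full detail at `near`, none beyond `far`.
    t = (distance - near) / (far - near)
    t = min(max(t, 0.0), 1.0)
    by_distance = max_factor + t * (1.0 - max_factor)

    # Screen-space criterion: never split an edge into segments
    # shorter than `target_pixels`, which avoids tiny triangles.
    by_screen = edge_pixels / target_pixels

    # Keep the more conservative of the two, clamped to the valid range.
    factor = min(by_distance, by_screen)
    return min(max(factor, 1.0), max_factor)


close_large = adaptive_tess_factor(1.0, 512.0)   # near and large on screen
close_small = adaptive_tess_factor(1.0, 16.0)    # near but small on screen
far_large = adaptive_tess_factor(50.0, 512.0)    # at the far limit
```

With these assumed settings, a nearby patch that covers many pixels gets the maximum factor, while a distant patch or one that is only a few pixels wide is barely subdivided. This is where the gains come from: geometry is only generated where it can actually be seen.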