HDR, as seen in Far Cry or Splinter Cell and which should be more widely used in the future, relies on 64-bit rendering, or more precisely on four FP16 components. The scene, along with some other elements, is rendered into a 64-bit buffer (RT, Render Target), and these parts are then processed (treated as textures) to assemble the final result. For all of this to work in a simple way, the GPU must support FP16 filtering and blending. This is the case with the GeForce 6 and 7 series, but not with the Radeon X800. Accessing FP16 textures may be a bit costly, but it is important to note that filtering isn't: it requires extra transistors in the GPU, but doesn't reduce performance in practice. FP16 blending, on the other hand, consumes a lot of memory bandwidth, twice as much as standard rendering.
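As a rough illustration (and not code taken from any of the tests here), the sketch below shows what a Direct3D 9 HDR renderer has to verify and create before it can use such a 64-bit render target. The helper name SetupFp16RenderTarget is hypothetical; the d3d and device objects are assumed to have been created elsewhere, and the format and usage flags are the standard D3D9 ones.

```cpp
// Minimal D3D9 sketch: verify FP16 filtering/blending support, then create
// a 64-bit (4 x FP16) render target. `d3d` (IDirect3D9*) and `device`
// (IDirect3DDevice9*) are assumed to exist already.
#include <d3d9.h>

bool SetupFp16RenderTarget(IDirect3D9* d3d, IDirect3DDevice9* device,
                           UINT width, UINT height,
                           IDirect3DTexture9** outRT)
{
    const D3DFORMAT fp16 = D3DFMT_A16B16G16R16F;   // 4 x FP16 = 64 bits per pixel

    // Can the GPU filter FP16 textures? (GeForce 6/7: yes, Radeon X800: no)
    if (FAILED(d3d->CheckDeviceFormat(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL,
                                      D3DFMT_X8R8G8B8, D3DUSAGE_QUERY_FILTER,
                                      D3DRTYPE_TEXTURE, fp16)))
        return false;

    // Can it blend into an FP16 render target?
    if (FAILED(d3d->CheckDeviceFormat(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL,
                                      D3DFMT_X8R8G8B8,
                                      D3DUSAGE_RENDERTARGET |
                                      D3DUSAGE_QUERY_POSTPIXELSHADER_BLENDING,
                                      D3DRTYPE_TEXTURE, fp16)))
        return false;

    // Create the 64-bit render target the scene is drawn into, before being
    // re-read as a texture to assemble the final image.
    return SUCCEEDED(device->CreateTexture(width, height, 1,
                                           D3DUSAGE_RENDERTARGET, fp16,
                                           D3DPOOL_DEFAULT, outRT, NULL));
}
```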
We measured performance with several combinations of texture and RT formats, with and without blending:
In 32-bit with blending, the GeForce 7800 GTX's advantage reaches 20% whereas its memory bandwidth advantage is less than 10%. Blending therefore seems to be handled more efficiently, probably because the hardware manages to perform it less often than is actually requested. With the 64-bit RT, performance levels are very close, and the limitation here is clearly memory bandwidth. Once 64-bit textures are also used, the 7800's performance increases sharply. Is this due to the additional pipelines? That is what we try to determine with the next test.
How do the different graphics cards perform with different texture formats and sizes?
With standard 32-bit textures, the graphics cards perform similarly and fillrate decreases progressively as texture size grows. The two NVIDIA cards are slightly ahead of the Radeon and can access 4096x4096 textures, whereas the Radeon is limited to 2048x2048. In practice this is not a troublesome limitation, as textures of this size aren't used.
In FP16, results are quite different. The two NVIDIA cards filter the texture, but this has a negligible effect on overall performance. While the Radeon's fillrate decreases progressively, the fillrate of the two GeForces only drops from 512x512 textures onward, which is probably where the texture cache reaches its efficiency limit. With smaller textures, the GeForce 7800 uses its additional pixel shading pipelines to be 50% more efficient than the 6800. With huge textures, the GeForce 7800 GTX is 2.7 times more efficient than the previous model, which shows that the texture cache has been improved and copes better under these conditions. The Radeon nonetheless remains twice as efficient as the best GeForce with 2048x2048 textures.
In FP32, results are close to those in FP16, except that the 7800's additional pipelines are no longer useful because of the memory bandwidth limitation. The Radeon is once again in first place with 2048x2048 textures, and the GeForce 7800 remains much more efficient than the 6800.
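The limits discussed above (maximum texture size, ability to filter a given format) are exposed directly by the Direct3D 9 API, and a test along these lines would typically begin with checks of the following kind. This is a hedged sketch rather than the tool actually used for these measurements; ReportTextureCaps is a hypothetical helper and the d3d and device objects are assumed to exist.

```cpp
// D3D9 sketch: query the limits behind these fillrate tests.
// `d3d` (IDirect3D9*) and `device` (IDirect3DDevice9*) are assumed to exist.
#include <d3d9.h>
#include <stdio.h>

void ReportTextureCaps(IDirect3D9* d3d, IDirect3DDevice9* device)
{
    // Maximum texture size: 4096x4096 on GeForce 6/7, 2048x2048 on Radeon X800.
    D3DCAPS9 caps;
    device->GetDeviceCaps(&caps);
    printf("Max texture size: %lux%lu\n", caps.MaxTextureWidth, caps.MaxTextureHeight);

    // Which formats can be filtered in hardware?
    const D3DFORMAT formats[] = {
        D3DFMT_A8R8G8B8,        // standard 32-bit
        D3DFMT_A16B16G16R16F,   // 4 x FP16 (64-bit)
        D3DFMT_A32B32G32R32F    // 4 x FP32 (128-bit)
    };
    for (int i = 0; i < 3; ++i) {
        HRESULT hr = d3d->CheckDeviceFormat(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL,
                                            D3DFMT_X8R8G8B8, D3DUSAGE_QUERY_FILTER,
                                            D3DRTYPE_TEXTURE, formats[i]);
        printf("Format %d: filtering %s\n", i,
               SUCCEEDED(hr) ? "supported" : "not supported");
    }
}
```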
Texture upload and download
These tests were made with Serious Magic (D3D) for download from the graphics card and Texbench (OGL) for upload to it. When reading data back from the graphics card, the two GeForces are 30% faster than the Radeon. This is somewhat surprising for the 6800, which uses a PCI Express / AGP bridge and which we would have expected to lag slightly behind.
For upload, the situation is the opposite, with the Radeon dominating. The 6800 Ultra is slightly behind because of the bridge it uses.
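To make clear what "download" and "upload" mean here, the sketch below shows roughly the kind of path such tests exercise in Direct3D 9: reading a render target back into system memory, and pushing a system-memory texture into video memory. This is only an illustrative sketch, not the code of Serious Magic or Texbench (which is OpenGL-based); the surfaces and textures are assumed to have been created elsewhere with matching sizes and formats.

```cpp
// D3D9 sketch of the two transfer directions being benchmarked.
// All resources (`device`, `rtSurface`, `sysmemSurface`, `sysmemTex`,
// `vidmemTex`) are assumed to be created elsewhere with matching formats.
#include <d3d9.h>

// Download: copy the current render target from video memory to a
// system-memory surface (the readback direction).
HRESULT DownloadRenderTarget(IDirect3DDevice9* device,
                             IDirect3DSurface9* rtSurface,
                             IDirect3DSurface9* sysmemSurface)
{
    return device->GetRenderTargetData(rtSurface, sysmemSurface);
}

// Upload: copy a system-memory texture into a D3DPOOL_DEFAULT texture in
// video memory (the direction Texbench measures, shown here in its D3D9
// form for consistency with the rest of the example).
HRESULT UploadTexture(IDirect3DDevice9* device,
                      IDirect3DTexture9* sysmemTex,   // D3DPOOL_SYSTEMMEM
                      IDirect3DTexture9* vidmemTex)   // D3DPOOL_DEFAULT
{
    return device->UpdateTexture(sysmemTex, vidmemTex);
}
```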
Here we have fillrate results in Texbench with a scene rendered using different quantities of textures, from 160 to 320 MB. The 6800 Ultra 512 MB is in the lead thanks to its extra memory, which allows it to store more textures locally. As usual with this test, the ATI cards are slightly behind for an unknown reason. The 7800 GTX takes advantage of its native PCI Express support to clearly beat the GeForce 6800 Ultra once more than 240 MB of textures are used.