Texture access performance
We measured texture access performance with several texture formats under bilinear and trilinear filtering. We kept the results for classic 32-bit (4x INT8), 64-bit "HDR" (4x FP16) and 128-bit (4x FP32) textures. For comparison, we added performance in 32-bit RGB9E5, a new HDR format introduced with DirectX 10 that stores HDR textures in 32 bits at the cost of a few compromises (three 9-bit mantissas sharing a single 5-bit exponent, and no alpha channel). These tests were carried out with a tool provided by our colleagues and friends at Beyond 3D.
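To show where the compromises of RGB9E5 come from, here is a minimal Python sketch of packing an RGB triple into 32 bits. It follows the shared-exponent conversion described in the OpenGL EXT_texture_shared_exponent specification. The bit layout (red in the low bits, exponent in the top five bits) is our assumption based on DXGI_FORMAT_R9G9B9E5_SHAREDEXP, and `pack_rgb9e5` / `unpack_rgb9e5` are hypothetical helper names.

```python
import math

N_BITS = 9      # mantissa bits per channel
EXP_BIAS = 15   # exponent bias
# Largest encodable value: 511/512 * 2^16 = 65408.0
MAX_VALUE = (2**N_BITS - 1) / 2**N_BITS * 2.0 ** (31 - EXP_BIAS)

def _floor_log2(x):
    # floor(log2(x)) for x > 0, computed exactly with frexp
    return math.frexp(x)[1] - 1

def pack_rgb9e5(r, g, b):
    """Pack three floats into one 32-bit word (hypothetical helper)."""
    # No sign bit: negatives clamp to 0, large values clamp to MAX_VALUE
    rc, gc, bc = (min(max(c, 0.0), MAX_VALUE) for c in (r, g, b))
    max_c = max(rc, gc, bc)
    # The brightest channel chooses the exponent shared by all three
    log2_max = _floor_log2(max_c) if max_c > 0 else -EXP_BIAS - 1
    exp_p = max(-EXP_BIAS - 1, log2_max) + 1 + EXP_BIAS
    max_s = math.floor(max_c / 2.0 ** (exp_p - EXP_BIAS - N_BITS) + 0.5)
    # Rounding can overflow the 9-bit mantissa; bump the exponent if so
    exp_shared = exp_p + 1 if max_s == 2**N_BITS else exp_p
    scale = 2.0 ** (exp_shared - EXP_BIAS - N_BITS)
    rs, gs, bs = (math.floor(c / scale + 0.5) for c in (rc, gc, bc))
    return rs | (gs << 9) | (bs << 18) | (exp_shared << 27)

def unpack_rgb9e5(word):
    """Inverse of pack_rgb9e5: recover the three quantized floats."""
    exp_shared = word >> 27
    scale = 2.0 ** (exp_shared - EXP_BIAS - N_BITS)
    return tuple(((word >> shift) & 0x1FF) * scale for shift in (0, 9, 18))

print(unpack_rgb9e5(pack_rgb9e5(1000.0, 0.001, 0.0)))  # (1000.0, 0.0, 0.0)
```

The last line illustrates the main trade-off: because all three channels share one exponent, a channel much dimmer than the brightest one loses its precision entirely.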

First of all, with bilinear filtering you will notice the obvious difference between the GeForce 8800 Ultra and the GeForce 9800 GTX: the latter filters 32-bit textures twice as fast because it has more texture address units. The GeForce GTX 280 is well ahead of the GeForce 9800 GTX, even though their theoretical rates are very close, at 48.2 GTexels/s and 43.2 GTexels/s respectively. In other words, Nvidia has indeed improved the efficiency of its texturing units: measured throughput rises from 78% to 98% of the theoretical peak. Not bad.
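The theoretical rates quoted above are simply the number of texture units multiplied by the core clock, and efficiency is the measured rate divided by that peak. This sketch assumes the commonly published specifications (64 texture units at 675 MHz for the GeForce 9800 GTX, 80 at 602 MHz for the GeForce GTX 280); these figures are not taken from our tests.

```python
# Assumed published specs: (texture units, core clock in MHz)
SPECS = {
    "GeForce 9800 GTX": (64, 675),
    "GeForce GTX 280": (80, 602),
}

def peak_texel_rate(units, clock_mhz):
    """Theoretical bilinear rate in GTexels/s: one filtered texel per unit per clock."""
    return units * clock_mhz / 1000.0

def efficiency(measured_gtexels, units, clock_mhz):
    """Share of the theoretical peak actually reached (0.0 to 1.0)."""
    return measured_gtexels / peak_texel_rate(units, clock_mhz)

for name, (units, clock) in SPECS.items():
    print(f"{name}: {peak_texel_rate(units, clock):.1f} GTexels/s")
```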

Next, we move on to trilinear filtering and the second table. Here the doubling of texture address units is of no use, since trilinear throughput is limited by the filtering units, although the results remain very good. These figures were therefore no surprise. Note that the test did not return correct results for the Radeon HD 3870, but its speeds should be roughly half of those measured with bilinear filtering.
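Why trilinear runs at roughly half the speed of bilinear: a trilinear sample is two bilinear samples taken on adjacent mipmap levels and then blended, so each filtered texel costs twice the filtering work. A minimal software sketch in Python follows; clamp-to-edge addressing and the texel-center convention are our assumptions, and real hardware samplers expose many more options.

```python
import math

def bilinear(tex, s, t):
    """Bilinear sample of a 2D grid (list of rows) at normalized coords s, t in [0, 1].

    Four texel fetches, weighted by the sample position between texel centers.
    Clamp-to-edge addressing is assumed.
    """
    h, w = len(tex), len(tex[0])
    u = min(max(s * w - 0.5, 0.0), w - 1.0)  # texel-center convention
    v = min(max(t * h - 0.5, 0.0), h - 1.0)
    x0, y0 = int(u), int(v)                   # u, v >= 0, so int() is floor
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = u - x0, v - y0
    top = tex[y0][x0] * (1 - fx) + tex[y0][x1] * fx
    bottom = tex[y1][x0] * (1 - fx) + tex[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def trilinear(mips, s, t, lod):
    """Trilinear sample: two bilinear samples on adjacent mip levels, then a lerp."""
    lod = min(max(lod, 0.0), len(mips) - 1.0)
    l0 = int(math.floor(lod))
    l1 = min(l0 + 1, len(mips) - 1)
    frac = lod - l0
    # Twice the bilinear work of a plain bilinear lookup
    return bilinear(mips[l0], s, t) * (1 - frac) + bilinear(mips[l1], s, t) * frac
```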
ROP performance
The GeForce GTX 280 has 32 ROPs, versus 24 for the GeForce 8800 Ultra and 16 for the GeForce 9800 GTX. As a reminder, ROPs are the units that handle the last steps of pixel processing (color blending, anti-aliasing, compression and writing data to memory). This increase is partly tied to the wider memory bus, since the ROPs are grouped in partitions, each attached to a 64-bit memory controller.
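The link between ROP count and bus width can be made concrete. This sketch assumes that each ROP partition on these Nvidia GPUs groups 4 ROPs with one 64-bit memory controller, and that the three cards use 256-, 384- and 512-bit buses; under those assumptions it reproduces the 16, 24 and 32 ROPs quoted above.

```python
ROPS_PER_PARTITION = 4    # assumption: 4 ROPs per partition
PARTITION_BUS_BITS = 64   # assumption: one 64-bit memory controller per partition

def rop_count(bus_width_bits):
    """ROPs implied by the memory bus width under the partition assumptions above."""
    partitions = bus_width_bits // PARTITION_BUS_BITS
    return partitions * ROPS_PER_PARTITION

for card, bus in [("GeForce 9800 GTX", 256), ("GeForce 8800 Ultra", 384), ("GeForce GTX 280", 512)]:
    print(f"{card}: {bus}-bit bus -> {rop_count(bus)} ROPs")
```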
You may remember that, not content with simply adding more ROPs, Nvidia also made them more efficient for Z-only passes on the GeForce 8. AMD is far behind in this area:

The GeForces are very fast here, significantly more so than the Radeon HD 3870, at least up to 4x anti-aliasing. In 8x mode, the Radeon HD 3870 keeps a similar speed whereas the GeForces slow down, probably because of insufficient memory bandwidth. The 512-bit bus of the GeForce GTX 280, however, lets it stay in the lead.
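The bandwidth argument can be quantified: peak memory bandwidth is the bus width in bytes multiplied by the effective transfer rate. The memory data rates below are the commonly cited reference values, which we assume here; they are not measurements from this test, and partner cards may differ.

```python
def bandwidth_gbs(bus_width_bits, data_rate_mts):
    """Peak memory bandwidth in GB/s: bytes per transfer times millions of transfers per second."""
    return bus_width_bits / 8 * data_rate_mts / 1000.0

# Assumed reference memory settings: (bus width in bits, effective rate in MT/s)
CARDS = {
    "GeForce GTX 280": (512, 2214),
    "GeForce 8800 Ultra": (384, 2160),
    "GeForce 9800 GTX": (256, 2200),
    "Radeon HD 3870": (256, 2250),
}

for name, (bus, rate) in CARDS.items():
    print(f"{name}: {bandwidth_gbs(bus, rate):.1f} GB/s")
```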
Next, we again use a tool provided by our colleagues at Beyond 3D to test the speed of the ROPs when writing pixels to memory, first with plain writes and then with color blending, which is notably used for transparency effects.

Except for a lower than expected speed for the GeForce GTX 280 in FP32x1, the results are logical and consistent with the number of ROPs. 64-bit writes run at half the speed of 32-bit writes, and 128-bit writes at half the speed of 64-bit ones. As for 32-bit "FP10", it is handled the same way as FP16 and, unfortunately, is no faster.
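The scaling described above can be written as a simple model: peak pixel rate is ROPs times clock, then divided by two for 64-bit formats (including FP10, which behaves like FP16 in our tests) and by four for 128-bit formats. The format labels and the 602 MHz ROP clock for the GeForce GTX 280 in the example are our assumptions; FP32x1 is left out because the GeForce GTX 280 did not match expectations there.

```python
# Relative ROP write rate per format, following the behavior observed in the test:
# 32-bit full speed, 64-bit half speed, 128-bit quarter speed,
# and 32-bit FP10 processed like FP16 (half speed).
RELATIVE_RATE = {"INT8x4": 1.0, "FP10": 0.5, "FP16x4": 0.5, "FP32x4": 0.25}

def expected_fill_rate(rops, clock_mhz, fmt):
    """Expected write rate in Gpixels/s for a given pixel format."""
    return rops * clock_mhz / 1000.0 * RELATIVE_RATE[fmt]

# GeForce GTX 280: 32 ROPs, assumed to run at 602 MHz
for fmt in RELATIVE_RATE:
    print(f"{fmt}: {expected_fill_rate(32, 602, fmt):.2f} Gpixels/s")
```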

With blending enabled, the GeForce GTX 280 shows a clear gain, as it performs this operation at full speed.
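Blending is a read-modify-write operation: the ROP reads the pixel already in the framebuffer, combines it with the incoming color and writes the result back. Below is a minimal sketch of the common source-alpha / one-minus-source-alpha blend used for transparency; we assume this standard equation for illustration (the Beyond 3D tool may exercise other blend modes), and `alpha_blend` is a hypothetical name.

```python
def alpha_blend(src_rgba, dst_rgb):
    """Classic transparency blend: result = src * alpha + dst * (1 - alpha).

    The ROP must read dst from memory before writing the result back,
    which is what makes blending more expensive than a plain write.
    """
    *src_rgb, alpha = src_rgba
    return tuple(s * alpha + d * (1.0 - alpha) for s, d in zip(src_rgb, dst_rgb))

# Half-transparent red drawn over blue
print(alpha_blend((1.0, 0.0, 0.0, 0.5), (0.0, 0.0, 1.0)))  # (0.5, 0.0, 0.5)
```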