Product review: The Nvidia GeForce GTX 280 & 260 - BeHardware
>> Graphic cards
Written by Damien Triolet
Published on June 16, 2008
URL: http://www.behardware.com/art/lire/723/
Page 1
Introduction
Finally! After a few very long months, a new "big" GPU has arrived. Nvidia’s GT200 pushes the GeForce 8 architecture further and promises to replace the GeForce 9800 GX2 with a card equipped with a single GPU.
An enormous GPU: not easy
Jen Hsun Huang has regularly reaffirmed Nvidia’s commitment to the continued development of high end monolithic graphics solutions, in other words, cards equipped with a single large GPU and not 2 more modest chips. However, this discourse seems to be contradicted by products such as the GeForce 9800 GX2, Radeon HD 3870 X2, etc. While the dual-GPU strategy appears to be suited to AMD, which unfortunately has only limited resources, Nvidia sees things differently.
 Jen Hsun Huang, Nvidia’s CEO, isn't afraid of financing the development of enormous GPUs. A big GPU is profitable for Nvidia, first of all, because the GeForce brand has the wind in its sails and is selling well. Also, Nvidia can use the new GPUs on other very profitable markets with the Quadro and (if everything goes well) with Tesla cards – or at least this is Nvidia’s wager. For these reasons, why should they deny themselves?
The GT200 is thus an example of immoderation and is in fact the biggest GPU ever made, with its 1.4 billion transistors etched on a 65 nanometer process. It is also a challenge for TSMC: the chip should measure around an unprecedented 600 mm², while the previous monster in terms of sheer size, the G80 of the first GeForce 8s, wasn’t far from 500 mm².
The GT200’s die is enormous.
Developing such a GPU isn't easy, and for this reason there were delays. Jen Hsun Huang had indirectly announced it for late 2007, which means a 6 month delay before the card was ready to be commercialized. The GT200 is therefore not a reaction to AMD’s RV770, as some might have said, even if it arrives at exactly the right time to prevent AMD from taking the lead. The GT200’s delay can partly explain the disorder in Nvidia’s GeForce 8 and 9 lines, as it seems evident that its absence posed a problem.
For this reason, Nvidia intends to wipe the slate clean with regard to the names of its previous GPUs. Moreover, the GT200’s codename changed several times (you will notice that "G200" is written on the chip, and no, it’s not a Matrox chip). Graphics cards that integrate it will be named GeForce GTX 200, starting with the 280 and 260.
So, what can be done with 1.4 billion transistors and GeForce 8 based architecture?
Page 2
Architecture: SIMT vs. MIMD, GeForce 8 and 9
SIMT vs. SIMD vs. MIMD
With the GeForce 8, Nvidia introduced an architecture that made a complete break from the past. Gone were the enormous MIMD vector units, from which it was sometimes difficult to extract the maximum; scalar units were chosen instead. At the implementation level these are 256 bit wide (8 x 32 bits) SIMD units (like SSE), but at the functional level an instruction does not apply eight 32 bit operations to 1 thread/element per cycle; it applies one 32 bit operation to 8 threads/elements. For this reason, in practice and seen from the outside, these units behave as scalar units.
To highlight the difference with SIMD (Single Instruction Multiple Data), Nvidia speaks of SIMT (Single Instruction Multiple Threads). While the units are similar, SIMT naturally maximizes the use of the units if the task is massively parallel, as happens to be the case in 3D rendering. The appeal of SIMT is that the programmer doesn’t have to do anything for this to be the case, while with SIMD the programmer and compiler have to strive to fill the vector unit, which isn’t always easy. MIMD (Multiple Instructions Multiple Data), as used by AMD in the Radeon HD 2000/3000, suffers from the same problem even if it is more flexible.
Of course, SIMT isn't the ultimate solution because there are always compromises. It is more efficient, but it also costs more transistors, more die area and higher power consumption because more complex control logic is required. SIMD and MIMD, on the other hand, allow more math units to be placed in the GPU, though to the detriment of efficiency.
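The difference in how the two approaches fill their units can be illustrated with a small sketch. It is a deliberately simplified model, not a description of either vendor's compiler: it assumes each shader operation occupies one whole vector instruction slot (no packing of independent operations into spare lanes), and the function names and the example mix of operations are hypothetical.

```python
def vector_utilization(op_widths, lanes=5):
    """Fraction of vector lanes doing useful work.

    Assumption: every operation takes one full instruction slot on a
    `lanes`-wide unit, so a 3-component operation leaves 2 lanes idle.
    """
    if not op_widths:
        return 0.0
    for width in op_widths:
        if not 1 <= width <= lanes:
            raise ValueError("operation width must be between 1 and %d" % lanes)
    return sum(op_widths) / (lanes * len(op_widths))


def scalar_simt_utilization(op_widths):
    """With scalar units, each component is its own instruction applied to a
    group of threads, so no lane is wasted as long as enough threads exist."""
    return 1.0 if op_widths else 0.0


# Hypothetical shader: a 3-component colour op, a scalar op,
# a 4-component op and another scalar op.
example_ops = [3, 1, 4, 1]
print(vector_utilization(example_ops))       # lanes kept busy on a vec5 unit
print(scalar_simt_utilization(example_ops))  # lanes kept busy with SIMT
```

For this example mix the 5-wide unit keeps fewer than half of its lanes busy, which is why a compiler for such hardware has to work hard at packing, while the scalar organisation stays fully occupied provided enough threads are in flight.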
The GeForce 8 was thus created with only 128 scalar units, while the Radeon HD 3870 has 64 vec5 units, or the equivalent of 320 scalar units. The higher efficiency of SIMT is of course not sufficient to compensate for this difference. However, Nvidia has managed to implement double-pumped math units, in other words units that run at double the speed of the scheduler. This largely compensates, though at the expense of branching efficiency (which we will address in its own section).
GeForce 8/9 and GTX 200 architecture
The GeForce 8 and 9 are equipped with a certain number of blocks of processing units, or partitions. Each partition has 2 multiprocessors and one block of texturing units. A multiprocessor is composed of a scheduler, a register space of 8192 32 bit registers, a SIMT unit composed of 8 FMAD-type scalar processors (floating point multiplication and addition in a single cycle) and 2 units (SFU) dedicated to special functions such as sin, cos, log, etc. (which are therefore 4x slower than simple instructions). These two units can also function like an extra SIMT processor composed of 8 FMUL-type scalar processors.
The two groups of units (FMAD and SFU/FMUL) can work in parallel because the GeForce 8/9 have dual issue support. On the other hand, there seem to be some limitations and it is not always easy to use the FMAD and FMUL units at the same time. In fact, it is difficult for the compiler and scheduler to use them simultaneously because independent instructions are needed, as well as simultaneous access to all the required registers. For this reason, most of the time it’s the FMAD unit which handles FMUL instructions, and only special functions are executed in dual issue. We should add that this has evolved with newer drivers.
On the texturing units level, in the beginning there were 4 address units and 8 filtering units per partition therefore capable of processing 4 texels in bilinear, trilinear or 2x anisotropic filtering. However, after the G80, Nvidia slightly updated its chips by adding 4 more address units. For this reason, the partitions of succeeding GeForce 8 and 9s are capable of processing 8 texels in bilinear filtering or 4 texels in trilinear or 2x anisotropic filtering.
Page 3
Architecture: the GeForce GTX 200
The GeForce GTX 200
With the GT200 that equips the GeForce GTX 200, Nvidia of course set itself the objective of offering a higher performance GPU. So what could be done based on the GeForce 8 architecture? Put two G80s on the same chip for a total of 256 scalar processors and 128 texturing units? Sounds simple, right?
It’s never that easy. Doubling what we have rarely results in doubled performance. This is all the more true given that drawbacks such as power consumption and the accompanying heat can be such that frequencies have to be reduced, and the performance gains with them.
So Nvidia first of all wanted to know what was the limiting factor on the GeForce 8/9. And then they tried to guess what it would be in the future. The conclusion evidently was that more calculation power and registers were needed and that the number of texturing units didn’t necessarily have to increase much.
 The GT200’s partitions received an additional multiprocessor compared to the GeForce 8 and 9 bringing their total number to 3.
For this reason, Nvidia added one multiprocessor per partition, each of which now contains 3. The number of registers in each multiprocessor was also doubled to reach 16,384. More registers give the compiler more flexibility to produce an optimal series of instructions and let the GPU more efficiently mask the various latencies, for example in texture accesses. You may recall that the GPU handles a very large number of threads (pixels, vertices, etc.) to mask latency and keep the execution units busy, and the data of these threads has to stay in the registers. Next, Nvidia increased the number of partitions from 8 to 10 for a total of 240 scalar processors. In terms of general registers across the entire GPU, we go from 131,072 to 491,520 x 32 bits (10 partitions x 3 multiprocessors x 16,384)!
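The register totals follow directly from the counts given above (partitions, multiprocessors per partition, registers per multiprocessor). The short sketch below simply multiplies them out; the function names are illustrative, and the only added assumption is that each 32 bit register occupies 4 bytes when converting to a size.

```python
def total_registers(partitions, multiprocessors_per_partition,
                    registers_per_multiprocessor):
    """Number of 32 bit general registers across the whole GPU."""
    return (partitions * multiprocessors_per_partition
            * registers_per_multiprocessor)


def register_file_kib(registers, bytes_per_register=4):
    """Register file size in KiB, assuming 4 bytes per 32 bit register."""
    return registers * bytes_per_register / 1024


geforce8 = total_registers(8, 2, 8192)     # 8 partitions x 2 multiprocessors
gt200 = total_registers(10, 3, 16384)      # 10 partitions x 3 multiprocessors
gt200_8part = total_registers(8, 3, 16384) # same chip with only 8 partitions active

print(geforce8, register_file_kib(geforce8))
print(gt200, register_file_kib(gt200))
print(gt200_8part, register_file_kib(gt200_8part))
```

With the per-multiprocessor figures quoted above, the full 10-partition GT200 comes to 491,520 registers, while a configuration with 8 active partitions comes to 393,216.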
The GT200’s architecture.
A supplementary unit was placed in each of the GT200’s multiprocessors: a 64 bit FMAD. This unit enables the GT200 to support 64 bit floating point calculations. Given that there is a single unit, its speed is an eighth of that of the SIMT unit composed of eight 32 bit scalar processors. In addition, a 64 bit value occupies two 32 bit registers, which limits performance a bit more. This support is therefore not intended to be as efficient as possible; it is there first and foremost for developers who need it with CUDA.
There were no changes in texturing units which simply benefit from the transition to 10 partitions. On the other hand, Nvidia says that it has improved the scheduler and a few other details in order to maximize the use of these units.
Moreover, small changes of this nature were numerous. Dual issue was improved and it is now easier to use FMADs and FMULs in parallel. ROPs are now capable of blending at full speed with 32 bit (4x 8 bits) formats while it was done at half speed before.
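As a rough illustration of what working dual issue is worth, peak arithmetic throughput can be estimated by counting an FMAD as 2 floating point operations and an FMUL as 1. The sketch below assumes a shader clock of 1.296 GHz for the GeForce GTX 280, a value that is not given in the text above; with it, the dual issue figure matches the 933.12 Gflops maximum quoted in the shader tests of this article. Function and constant names are illustrative.

```python
# Assumption (not stated in the article text): reference shader clock in GHz.
GTX280_SHADER_CLOCK_GHZ = 1.296
GTX280_SCALAR_PROCESSORS = 240   # 10 partitions x 3 multiprocessors x 8

FLOPS_FMAD = 2   # one multiplication plus one addition
FLOPS_FMUL = 1   # one multiplication


def peak_gflops(scalar_processors, shader_clock_ghz, flops_per_cycle):
    """Theoretical peak in Gflops for a given number of flops per cycle."""
    return scalar_processors * shader_clock_ghz * flops_per_cycle


fmad_only = peak_gflops(GTX280_SCALAR_PROCESSORS,
                        GTX280_SHADER_CLOCK_GHZ, FLOPS_FMAD)
dual_issue = peak_gflops(GTX280_SCALAR_PROCESSORS,
                         GTX280_SHADER_CLOCK_GHZ, FLOPS_FMAD + FLOPS_FMUL)

print(round(fmad_only, 2))    # FMAD units alone
print(round(dual_issue, 2))   # FMAD and FMUL issued together
```

Under these assumptions the FMUL path accounts for a third of the theoretical peak, which is why how often it can really be used matters so much in practice.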
The output buffer of geometry shaders was enlarged and is now six times bigger. You may recall that this was one of the weak points of the GeForce 8 and 9 whose performances plummeted when a geometry shader was used to create a significant amount of geometry, for example, in tessellation.
Just like the Radeon HD 2000 and 3000, the GT200 has a processor dedicated to management of PCI Express transfers. It can therefore send and receive data at the same time it is working on 3D rendering or some program with CUDA.
Finally, the memory bus was extended to 512 bits with eight 64 bit controllers, something that enables giving the GPU significant bandwidth without having to use very expensive memory.
On the other hand, where Nvidia has not innovated is in its continued refusal to support DirectX 10.1. As we have explained on several occasions, part of this choice is strategic: it avoids lessening the value of its other GPUs compared to the competition. Another aspect is that while Nvidia supports some parts of DirectX 10.1, such as direct access to depth buffers when anti-aliasing is used, and helps developers to circumvent DirectX 10 and 9 to access them, other points require more significant changes. For example, Nvidia does not have programmable grids for the position of samples in multisampling, which is necessary to support DirectX 10.1 and would require an in-depth rework of the anti-aliasing part of its GPUs.
Page 4
Pixel, Vertex and Geometry Shader performances
Pixel Shader performances
We tested two relatively simple lighting shaders which represent a good compromise between theoretical and actual speeds:
 The GeForce GTX 280 is 40 to 50% faster here.
Of course, we wanted to know more about performance in much more complex situations. Our tests mostly involved analysing dual issue, and we were able to see that it is indeed more easily usable. For example, we noted a throughput of 1.5 FMUL per cycle, which means that the FMAD and FMUL units function well in parallel. On the other hand, the latter seem to be used in only 1 out of 2 cycles and we aren’t sure why; perhaps this is a limitation related to register access. The maximum calculation power we obtained, with a shader composed of 2,000 FMADs and 2,000 FMULs, was 664.4 Gflops, or roughly 70% of the maximum announced calculation power of 933.12 Gflops.
Vertex Shader performances
We tested performances in T&L, VS 1.1, VS 2.0 and VS 3.0 in RightMark:
Unified architecture enables recent GPUs to allocate all resources to the processing of vertex shaders which can mean a significant gain. Moreover, the gain could be even bigger but it is limited by the GPU’s triangle processing speed which on all the GeForces tested here is 1 triangle per cycle. On the other hand, this is 0.5 per cycle for the Radeon HD 3870 while it was 1 per cycle for the Radeon HD 2900 XT. The higher frequency of the GeForce 9800 GTX therefore is to its advantage making this GPU the most powerful we have seen in terms of (simple) geometrical processing.
Geometry Shader performances
Contrary to Nvidia, AMD has integrated a generalized cache for reads/writes in memory from the shader core. It can be used in a classic manner for Stream Output which consists, as required by DirectX 10, of being able to write data that comes out of the shader core without having to go through the ROPs. It also enables the virtualization of general registers which can thus be unlimited.
Another use is to go through this cache to video memory in order to temporarily store the potentially enormous mass of data created by geometry shaders during the amplification of geometry; without this, the calculation units could be blocked because the result cannot be placed in the general registers. This could otherwise cause problems, and theoretically a crash, because geometry data has to remain in the correct order. For example, imagine triangles 1 and 2 being deconstructed. Triangle 1 should be rendered before triangle 2. Given that a GPU works in parallel, these 2 triangles could be processed at the same time by the geometry shader, which will deconstruct them into a series of smaller triangles. At the output, all the pixels stemming from triangle 1 should be rendered before the others. This is fine if there is enough memory to store everything: we only have to wait for everything to be finished and make sure that rendering is done in the correct order. However, if the GPU runs out of memory while triangles 1 and 2 are still being deconstructed, it is stuck.
AMD thus avoids this problem. Nvidia of course has to avoid it as well, but its approach is very different. It takes the problem from the other end: instead of providing more memory to store data before putting things back in order, it reduces the number of elements processed in parallel so that there is always enough memory in the GPU. In other words, instead of using 128 or 240 processors to process a geometry shader, Nvidia reduces this number if it detects that there could be a problem. We do not know exactly at what point Nvidia has to reduce parallel processing, but this is obviously a very big difference between Nvidia and AMD, with an advantage for the latter. This is true even if developers are careful not to use geometry shaders in problematic cases.
To compensate for this, Nvidia has made the output cache of its geometry shaders six times larger. What does this mean in practice? We observed performances in a tessellation demo based on geometry shaders provided by AMD at the launch of the Radeon HD 2900 XT:
 As you can see, even if the Radeon HDs hold their edge, the GeForce GTX 200 significantly improves performances compared to the GeForce 8 and 9. Nvidia says that it has increased cache in relation to what developers use and will use in the mid-term.
Page 5
Texturing and ROP performances
Texture access performances
Performances were measured in the access of textures of different formats with bilinear and trilinear filtering. We kept the results in classic 32 bits (4x INT8), 64 bit "HDR" (4x FP16) and 128 bits (4x FP32). For comparison, we added performances in 32 bit RGB9E5, a new HDR format introduced by DirectX 10, which enables storing HDR textures in 32 bits with a few compromises. These tests were carried out with a tool provided by our colleagues and friends at Beyond 3D.
 First of all, with bilinear filtering you will notice the obvious difference between the GeForce 8800 Ultra and GeForce 9800 GTX. The latter is capable of filtering 32 bit textures twice as fast thanks to the presence of more address units. The GeForce GTX 280 is largely ahead of the GeForce 9800 GTX, while when looking at theoretical speeds, they are very close at a respective 43.2 GTexels/s and 48.2 GTexels/s. In other words, Nvidia has indeed improved the efficiency of its texturing units as we now go from 78% to 98%. Not bad.
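The theoretical rates quoted above are simply the number of texture address units multiplied by the GPU clock. The sketch below reproduces them under two assumptions that are not stated in the text: reference core clocks of 675 MHz for the GeForce 9800 GTX and 602 MHz for the GeForce GTX 280. The "implied" rates are derived from the 78% and 98% efficiency figures and are therefore only approximate, not measured values.

```python
def theoretical_gtexels(partitions, address_units_per_partition, core_clock_mhz):
    """Bilinear rate in GTexels/s: one texel per address unit per clock."""
    return partitions * address_units_per_partition * core_clock_mhz / 1000.0


def implied_rate(theoretical, efficiency):
    """Rate implied by an efficiency figure (fraction of the theoretical peak)."""
    return theoretical * efficiency


# Core clocks are assumptions (reference clocks), not taken from the article.
gf9800gtx = theoretical_gtexels(8, 8, 675)    # 8 partitions, 8 address units each
gtx280 = theoretical_gtexels(10, 8, 602)      # 10 partitions, 8 address units each

print(round(gf9800gtx, 1), round(implied_rate(gf9800gtx, 0.78), 1))
print(round(gtx280, 1), round(implied_rate(gtx280, 0.98), 1))
```

Seen this way, the small gain in theoretical rate is less important than the gain in efficiency, which is what separates the two cards in the bilinear results.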
Next, we move on to trilinear filtering and the second table. Here the doubling of texture address units is of no use although results are still very good. For this reason, we weren’t surprised by these performances. Note that the test didn’t give correct results for the Radeon HD 3870 but speeds are supposed to be more or less half of those of bilinear filtering.
ROP performances
The GeForce GTX 280 has 32 ROPs versus 24 for the GeForce 8800 Ultra and the 16 of the GeForce 9800 GTX. As a reminder, ROPs are units devoted to the last step in processing pixels (mixing colors, anti aliasing, compression and writing data to memory). The size of the memory bus is partly related to this increase.
You may remember that not just happy with increasing the quantity, Nvidia improved efficiency on the GeForce 8 for Z-only passes. AMD is very far behind in terms of speed in this area:
 GeForces are very fast here, significantly more so than the Radeon HD 3870 – at least up to 4x anti-aliasing. In 8x mode, the Radeon HD 3870 has a similar speed while it is lower on the GeForce, probably due to a lack of memory bandwidth. The 512 bit bus of the GeForce GTX 280 enables it however to stay in the lead.
Next, again we use a tool provided by our colleagues at Beyond 3D in order to test the speed of ROPs when writing pixels in memory first in a classic manner and then with a mix of colors (blending), notably used for transparency effects.
With the exception of a lower than expected speed for the GeForce GTX 280 in FP32x1, results are logical and consistent with the number of ROPs. 64 bits runs at half the speed of 32 bits, and 128 bits in turn at half the speed of 64 bits. As for 32 bit "FP10", it is handled in the same way as FP16 and, unfortunately, does not have a higher speed.
 Once blending is used, we noticed a net gain for the GeForce GTX 280 which benefits from the implementation at full speed of this function.
Page 6
Branching performances
One of the main innovations that was introduced with the evolution of GPU programmability was dynamic branching. This allowed writing some shaders more easily and to increase the efficiency of others by avoiding the calculations on parts that don’t need it. For example, why apply a very performance costly filter to soften the border of a shadow to a pixel in the middle of the shadow? Dynamic branching can help to determine if the pixel needs that or not.
 However, the situation is not that rosy as this only applies to very specific cases. Branching has the reputation of being difficult to manage and this is particularly the case in CPUs that have to predict the branching result to mask calculation latency. In a GPU, pixels are processed by groups of 10s, 100s or even 1000s, and this allows the automatic masking of this latency. This problem, therefore, doesn’t really exist for GPUs. There is another one, however. For efficient branching with GPUs, all pixels of a working group have to take the same branch or else both branches have to be calculated for all pixels with masks in order to only write the result of the required branch for each pixel.
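The constraint described above, where a group that disagrees on the branch has to go through both sides under a mask, can be sketched as follows. This is an illustrative model rather than actual GPU behaviour in detail: the shadow functions and the penumbra test are hypothetical stand-ins for a costly filter and a cheap path, and `passes` simply counts how many branch bodies the whole group has to step through.

```python
def run_group(pixels, condition, expensive, cheap):
    """Execute an if/else for one group of pixels the way a SIMT unit would.

    Returns (results, passes). If every pixel agrees on the branch, only one
    body is executed; otherwise both bodies are executed for the group and a
    per-pixel mask selects which result is written.
    """
    mask = [condition(p) for p in pixels]
    results = [None] * len(pixels)
    passes = 0
    if any(mask):                      # at least one pixel needs the "if" side
        passes += 1
        for i, p in enumerate(pixels):
            if mask[i]:                # masked-off pixels write nothing
                results[i] = expensive(p)
    if not all(mask):                  # at least one pixel needs the "else" side
        passes += 1
        for i, p in enumerate(pixels):
            if not mask[i]:
                results[i] = cheap(p)
    return results, passes


# Hypothetical shaders: soften only pixels near the shadow border.
def in_penumbra(p):
    return 0.4 < p < 0.6

def soft_shadow(p):
    return round(p, 2)                 # stand-in for a costly filter

def hard_shadow(p):
    return 0.0 if p < 0.5 else 1.0


print(run_group([0.1] * 8, in_penumbra, soft_shadow, hard_shadow)[1])
print(run_group([0.1, 0.45, 0.55, 0.9], in_penumbra, soft_shadow, hard_shadow)[1])
```

A group entirely inside the shadow pays for one branch only, while a group straddling the border pays for both, which is the cost the rest of this page tries to measure.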
In the case of the GeForce 8, 9 and GTX 200, the GPU works on groups of 16 or 32 threads (vertices, pixels, etc.). Why these two possibilities? First of all, because these are 8-way SIMT units, which require groups of at least 8 threads. Next, you may recall that the calculation units are double pumped and function at twice the scheduler’s frequency. Thus, seen from the calculation units’ point of view, only one instruction can be sent every other cycle. Working on groups of 16 threads gives the calculation units enough work so that they do not have to wait for the slower scheduler. Finally, working on 32 threads allows dual issue: the scheduler alternately sends an instruction to the 8-way SIMT unit and then an instruction to the special units. It can alternate between these two operations at full speed thanks to groups of 32 threads.
Nvidia can configure its GPUs for 16 or 32 threads. In the first case, branching performances are improved and in the second calculation power is improved thanks to dual issue. Groups of 16 are activated for vertex and geometry shaders while groups of 32 are activated for pixel shaders and CUDA.
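The reasoning behind the two group sizes can be written as a simple product. The sketch below is only a restatement of the explanation above under its own simplifying assumption: a group must contain enough threads to keep a unit busy until the scheduler can send it its next instruction. The function and parameter names are illustrative.

```python
def min_group_size(simt_width=8, clock_ratio=2, units_alternated=1):
    """Threads needed per instruction to keep a calculation unit busy.

    simt_width       -- threads processed per calculation-unit cycle
    clock_ratio      -- calculation-unit cycles per scheduler cycle
    units_alternated -- number of unit groups the scheduler alternates
                        between (2 when dual issue is used)
    """
    return simt_width * clock_ratio * units_alternated


# Single issue: one instruction every scheduler cycle to the SIMT unit.
print(min_group_size())                    # vertex and geometry shaders
# Dual issue: the scheduler alternates between the SIMT unit and the
# special units, so each receives an instruction every other scheduler cycle.
print(min_group_size(units_alternated=2))  # pixel shaders and CUDA
```

The trade-off then reads directly from the two results: the smaller group diverges less often when branching, the larger one leaves room for dual issue.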
We developed a small test that allows us to change branching granularity (the number of consecutive pixels that take the same branch). We create virtual screen columns inside the pixel shader applied to moving triangles. We specify the branch to take per pixel column. One column out of two has to display a complex shader while the other can skip this part of rendering. Average sized triangles in motion are displayed on the monitor and across these virtual areas that use different branches. The triangle size, their position and the column size have an influence on branching efficiency. We think this test is quite close to real situations.
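The effect of column width on branching efficiency can be approximated with a one-dimensional model. This is a sketch under stated assumptions, not the test program itself: pixels are grouped as consecutive runs along a single 1920 pixel row (real GPUs group pixels in 2D blocks and depend on triangle coverage), even-numbered columns take the complex branch, and a group has to execute the complex part as soon as one of its pixels needs it.

```python
def complex_branch_fraction(column_width, group_size, screen_width=1920):
    """Fraction of pixel groups that must execute the complex branch.

    0.5 is the ideal (half the pixels skip the complex shader); 1.0 means
    branching saves nothing because every group contains at least one
    pixel from a "complex" column.
    """
    groups = screen_width // group_size
    busy = 0
    for g in range(groups):
        start = g * group_size
        pixels = range(start, start + group_size)
        # Even-numbered columns use the complex shader, odd ones skip it.
        if any((x // column_width) % 2 == 0 for x in pixels):
            busy += 1
    return busy / groups


for width in (8, 32, 64):
    print(width,
          complex_branch_fraction(width, 32),   # groups of 32 pixels
          complex_branch_fraction(width, 64))   # groups of 64 pixels
```

In this model, columns narrower than the group size save nothing, and a GPU working on groups of 32 starts to benefit at a column width where one working on groups of 64 still executes the complex branch everywhere.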
 With narrow columns, GPUs can’t use branching to avoid the complex part for half of the pixels, but they do have to process branching instructions. This reduces performances instead of increasing them - at least for the GeForce 8, 9 and GTX 280. All of these GPUs have a special unit devoted to branching, which functions in parallel with pixel shading and texturing pipelines, masking the cost of branching instructions. The Radeon HD 3870, however, seems to be the only one to completely mask branching latency.
The size of groups of pixels on the GeForce 8800 is 32 versus 64 for the Radeon HD 3870, which enables Nvidia chips to take the lead. We noted a surprising difference between the GeForce 9800 GTX and the GeForce GTX 280, which is much more efficient with columns of 8 pixels. It is probable that triangles are broken down into pixels in a way that better groups neighboring pixels together (which are thus more likely to take the same branch), which is beneficial in this case.
Page 7
Specifications, the cards
Specifications
 Note once again that the dual-GPU cards tested here are the equivalent of a single 512 MB card and not a 1 GB model like the GeForce GTX 280 !
The GeForce GTX 260 and 280 differ mainly in the number of active partitions. While the entire GPU is functional on the GTX 280, only 8 partitions out of 10 are active on the 260. Its memory bus is reduced to 448 bits with 28 ROPs, versus 512 bits and 32 ROPs for the new ultra high end. The difference in price between the two cards is noticeable: while the GeForce GTX 280 was launched at a classic 550€, Nvidia announces 310€ for the GeForce GTX 260. Is there a good deal on the horizon?
The cards
For this test we received a reference GeForce GTX 280 and 260 from Nvidia. These cards have an entirely closed design like the GeForce 9800 GX2, although the cooling system is similar to that of the GeForce 9800 GTX. A black or white Nvidia logo differentiates the GeForce GTX 280 and 260 respectively; both are similar in size to other high end cards.

Another small external difference is the power connectors. The GeForce GTX 260 has two 6 pins versus one 6 and one 8 pin for the GeForce GTX 280. So why didn’t they leave the 8 pin connector on the GeForce GTX 260 despite everything in order that it can be used by itself!?

 
You will notice the presence of a chip dedicated to the control of video outputs, the NVIO2, as was the case for the GeForce 8800 GTS, GTX and Ultra.
Given that the GeForce GTX 260’s memory bus is narrower at 448 bits, one (32 bit) memory chip disappears from each side of the PCB. Note that these are Hynix 0.8 ns chips on both cards.
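The cut-down GeForce GTX 260 configuration described on this page can be derived from the full GT200 by disabling units. The sketch below uses only figures given in the article (64 bit controllers, 32 bit memory chips, 32 ROPs and 1 GB on the full chip, 3 multiprocessors of 8 scalar processors per partition). The resulting memory size for the GTX 260 is an inference that assumes both cards use chips of the same capacity; the names are illustrative.

```python
CONTROLLER_WIDTH_BITS = 64    # eight 64 bit controllers on the full GT200
CHIP_WIDTH_BITS = 32          # each memory chip is 32 bits wide
FULL_CONTROLLERS = 8
FULL_ROPS = 32
FULL_MEMORY_MB = 1024         # GeForce GTX 280: 1 GB


def memory_config(active_controllers):
    """Bus width, ROP count, chip count and memory size for a given
    number of active memory controllers."""
    bus_bits = active_controllers * CONTROLLER_WIDTH_BITS
    rops = active_controllers * (FULL_ROPS // FULL_CONTROLLERS)
    chips = bus_bits // CHIP_WIDTH_BITS
    full_chips = FULL_CONTROLLERS * CONTROLLER_WIDTH_BITS // CHIP_WIDTH_BITS
    # Assumption: same chip capacity on both cards.
    memory_mb = chips * (FULL_MEMORY_MB // full_chips)
    return {"bus_bits": bus_bits, "rops": rops,
            "chips": chips, "memory_mb": memory_mb}


def scalar_processors(active_partitions, multiprocessors=3, width=8):
    """Scalar processors for a given number of active partitions."""
    return active_partitions * multiprocessors * width


print(memory_config(8), scalar_processors(10))   # GeForce GTX 280
print(memory_config(7), scalar_processors(8))    # GeForce GTX 260
```

Disabling one controller removes two 32 bit chips, which matches the one chip missing from each side of the PCB.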
Page 8
Power consumption, the test
Power consumption and noise
We evaluated the power consumption of the different cards. Measurements were taken at the wall socket. This is therefore the total power consumption of the power supply, in this case a Cooler Master Real Power M1000 (1000 watt).
 Use of the 55 nanometer process and PowerPlay in order to reduce consumption means the Radeon HD 3870 as well as the X2 are very economical in stand-by; however, in load PowerPlay no longer gives the Radeons this advantage.
The GeForce GTX 260 and 280 now use similar technology and are now very economical at rest. Of course, the size of the GPU and 65 nanometer process means the Radeons keep an advantage but Nvidia has improved in this area.
In load, a GeForce GTX 260’s consumption is close to that of a GeForce 9800 GTX’s; however, it is more powerful and has the same fabrication process. Not bad. Otherwise, the GeForce GTX 280 gobbles up 50 watts less than a GeForce 9800 GX2 with our system.
Note that the GeForce GTX 200, just like the GeForce 9800 GTX and GX2, is Hybrid Power compatible. These cards can therefore be entirely shut down when used in a system compatible with this technology, such as the future nForce 780a for AMD processors. We will have to wait until this summer to see a similar capability for the Core 2.
In terms of sound levels, while the GeForce GTX 260 and 280 are in the silent category, the GeForce 8800 Ultra is still the reference in this domain.
The test
In this test, we used ten games, four of which support DirectX 10. Tests were carried out only in 1920x1200, as a lower resolution generally isn’t suited to such a high end product. Anisotropic filtering, DirectX 10 and HDR were activated in all cases when available in the game. Finally, transparency/adaptive anti-aliasing was activated in multisampling mode.
All Windows Vista updates currently available, in addition to SP1, were installed.
Configuration
Intel Core 2 Extreme QX9770
Asus Striker II
4 GB DDR3 1066
Windows Vista SP1
Forceware 177.34
Catalyst 8.5
Page 9
HD video, Folding@Home
HD video
We didn’t test HD video decoding performance on these cards because there is nothing new. The GT200 integrates the VP2, which is already found in the latest Nvidia products. You may recall that it has complete support for H.264 video decoding but only partial support for VC-1 format videos.
We should specify that Nvidia has added an HD video profile to its energy savings system. The GPU therefore does not run at full speed when reading HD videos and limits itself to the required minimum in order to reduce consumption and guarantee the necessary performances.
For this test, Nvidia provided us with a beta version of Badaboom, a video conversion tool which uses the GPU via CUDA for extremely fast processing. We didn’t include any performance tests based on this program as it is still limited in its current beta version. On the other hand, there is no doubt that it is very fast and promising.
Folding@Home
Finally! The Folding@Home GPU client supports Nvidia GPUs from the GeForce 8 onward. This support is via CUDA, and while the client has not yet been made public, it shouldn’t be long. We tested a beta version, comparing the results with those obtained by the Radeon HD 3870 with its own client and Catalyst 8.3:
 The GeForce has largely superior performances. However, it is difficult to say if this is related to its architecture or CUDA which facilitates the proper use of the GPU. Most likely, it’s a bit of both.
Page 10
Enemy Territory : Quake Wars and Half Life 2 Episode 2
Enemy Territory : Quake Wars
 While Quake Wars is based on the Doom 3 engine, it has undergone some evolution such as megatexturing which facilitates the work of artists; however, there is the additional cost in terms of decoding and access to megatextures. In the end, Quake Wars is a little more resource heavy than Doom 3 or Quake 4.
We saved a demo in a sequence versus 4 bots. Given that artificial intelligence was not calculated in the timedemo, results were less affected by the CPU than in actual gameplay or at least in this case versus our bot adversaries.
All parameters were set to a maximum in the game including 16x anisotropic filtering. The patch 1.2 was used.
In this first game test, the GeForce GTX 280 places equivalent to the GeForce 9800 GX2 and is more or less 30% faster than the GeForce 8800 Ultra or Radeon HD 3870 X2.
Half Life 2 Episode 2
Still based on the Source Engine, Half Life 2 Episode 2 doesn’t really have anything new on the technological level. It simply optimizes and more heavily relies on the engine’s capabilities, making the game more resource heavy than its previous versions. We carry out a demo with all game options set to a maximum including anisotropic filtering which is in 16x.
 In Half Life 2 Episode 2, it’s the GeForce GTX 260 which is equal to the GeForce 9800 GX2 with the GTX 280 being slightly ahead. Without anti-aliasing, the gain with the new cards is modest compared to the GeForce 8800 Ultra and 9800 GTX, probably because the Half Life 2 Episode 2 engine relies on a significant amount of texture access.
Page 11
S.T.A.L.K.E.R. and Rainbow Six : Vegas
S.T.A.L.K.E.R.
We carry out an identical movement and measure the framerate with Fraps. The test was done in high quality, with complete dynamic lighting, maximum details (anisotropic filtering 16x) and foliage shadows. S.T.A.L.K.E.R. uses an engine based on deferred rendering, which is fundamentally incompatible with MSAA and makes the use of anti-aliasing impossible, or at least this is what we thought! Despite everything, Nvidia ended up finding a solution. The 1.00006 patch was used.
 Here, the GeForce GTX 280 is surpassed by the GeForce 9800 GX2 and places close to the Radeon HD 3870 X2 as multi-GPUs are obviously rather efficient for AMD in this game. However, the GeForce GTX 280 does 40% better than the GeForce 9800 GTX and 70% better than the GeForce 8800 Ultra. Not bad!
Once anti-aliasing is activated, cards equipped with only 512 MB are left behind. Moreover, the Radeons are not in the fight as AMD didn't make the effort to integrate its support here.
Rainbow Six : Vegas
The first PC game based on the Unreal Engine 3.0, Rainbow Six : Vegas is still a very resource heavy game. We measure performances in the introductory scene. The HDR mode is activated as it is more or less obligatory as without it banding is very noticeable. Shadows are set to “low” because a higher quality in this domain lowers performance too much in certain areas.
 Originally designed for the Xbox 360, this game seems to have a natural affinity for the Radeon HD which has a similar architecture to the game console’s graphic chip. It’s therefore the HD 3870 X2 which dominates even when compared to the new GeForces – at least without anti-aliasing. With this effect activated, the Radeons suffer from a big drop in performances.
You may recall that this game does not support anti-aliasing natively, but Nvidia and AMD have implemented it in their drivers.
Page 12
Oblivion and RaceDriver GRID
Oblivion
 We saved a specific movement in order for it to be always identical and the test reproducible. Of course, HDR was activated and a high level of detail was selected.
In Oblivion, dual GPU cards do rather well, especially the Radeon HD 3870 X2 which has very good performances with anti-aliasing. While the GeForce GTX 280 manages to do 40% better than the GeForce 8800 Ultra and 9800 GTX with anti-aliasing, the card is CPU limited without this filter.
RaceDriver GRID
To test Codemasters’ latest opus, we carry out a well defined movement in high quality mode. The game is based on an evolution of the engine in Colin McRae DIRT, doing away with some of the unnecessary complexity. Patch 1.1 was applied.
The GeForce 9800 GX2 finishes first here although the GeForce GTX 280 isn't too far off. The Radeon HD 3870s are left behind, a phenomenon amplified by the fact that multi-GPU rendering does not seem to be functional on the X2.
Page 13
Bioshock and Company of Heroes
Bioshock
The first game based on Unreal Engine 3.0 to support DirectX 10, Bioshock has great graphics even in DirectX 9 mode, while being less resource heavy than Rainbow Six: Vegas. We carry out a well-defined sequence of movements with all options pushed to the maximum and in DirectX 10.
Without anti-aliasing, the GeForce 9800 GX2 largely dominates, as UE3 is at ease with multi-GPU systems. With anti-aliasing, on the other hand, it is equivalent to the GeForce GTX 280. As in many other games, the GeForce GTX 260 is more or less 15% ahead of previous single-GPU GeForces.
Note that it is not yet possible to activate anti-aliasing in DirectX 10 mode on AMD cards.
Company of Heroes
Given that Company of Heroes received a DirectX 10 patch that brings a real improvement in graphics, we decided to add it to our test protocol. All graphics settings were pushed to the maximum.
We run the integrated test on version 1.72.
In this game’s DirectX 10 mode, computing power is very important, which lets the GeForce GTXs show their full potential. Even the GeForce 9800 GX2 cannot compete.
Page 14
World in Conflict and Crysis
World in Conflict
Very resource heavy and with nice graphics, World in Conflict is a natural addition to our test suite. We run the internal test with patch 1.0002. All game options are pushed to the maximum, which includes DirectX 10 mode and 16x anisotropic filtering.
In this game, where Radeons hardly shine, the GeForce GTXs benefit from their extra memory, especially with anti-aliasing activated. The game is very demanding in terms of memory, which poses a problem for the GeForce 9800 GX2 once anti-aliasing is activated.
Crysis
An absolute must in terms of gaming, Crysis was tested with its 1.21 patch (optimized for multi-GPU systems). We run our own demo recorded in "Harbor", in "High" mode and DirectX 10.
While the success of Crysis has been rather mixed, it is currently the most resource-heavy game and requires the most graphics power. Here, the GeForce GTX 280 does better than the GeForce 9800 GX2, especially with anti-aliasing, because the latter does not have enough memory. The GeForce GTX 260 also does very well. While the game is playable in 1920x1200 in High with the GTX 280, it is still limited in heavy scenes. Compared to previous GPUs, performance improves by almost 50%, which is already quite good.
Page 15
Recap of performances
Recap
Although individual game results are interesting, especially where multi-GPU systems are involved, we calculated a performance index based on all tests, with the same weight for each game. A score of 100 was given to the GeForce 9800 GTX in 1920x1200.
The index is a fair representation of the results obtained in the numerous games we tested. Without anti-aliasing, the GeForce GTX 280 is on average 40% faster than the GeForce 8800 Ultra and 9800 GTX, which puts it level with the GeForce 9800 GX2. The GeForce GTX 260, for its part, equals the Radeon HD 3870 X2 and is therefore a little more than 15% ahead of previous GeForces.
With anti-aliasing, the gaps are bigger and the new arrivals prove to be very much at ease thanks, among other things, to their large memory. The GeForce GTX 280 thus does 70% better than the GeForce 9800 GTX and 15% better than the GeForce 9800 GX2, the latter being neck and neck with the GeForce GTX 260.
Note that, for the anti-aliasing indexes only, results obtained in Bioshock and S.T.A.L.K.E.R. were not taken into account, as the Radeons do not support this filter in these games. You can consult a graph that does take these games into account here.
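To make the index more concrete, here is a minimal Python sketch of an equal-weight index normalized to a reference card. It is only an illustration: the card and game labels and the frame rates are hypothetical placeholders, not measurements from this test, and averaging the per-game ratios arithmetically is an assumption, since the exact averaging method is not detailed here.

```python
from statistics import mean


def performance_index(fps_by_card, reference_card):
    """Return an index per card, where the reference card scores 100.

    fps_by_card maps a card name to a {game: fps} dictionary.
    Each game carries the same weight: the card's fps is divided by the
    reference card's fps in that game, and the ratios are then averaged
    (arithmetic mean, which is an assumption of this sketch).
    """
    reference_fps = fps_by_card[reference_card]
    index = {}
    for card, fps_by_game in fps_by_card.items():
        ratios = [fps_by_game[game] / reference_fps[game] for game in reference_fps]
        index[card] = 100 * mean(ratios)
    return index


# Hypothetical frame rates for illustration only (not results from this review).
example_fps = {
    "reference_card": {"game_a": 40.0, "game_b": 20.0},
    "faster_card": {"game_a": 60.0, "game_b": 25.0},
}

index = performance_index(example_fps, "reference_card")
for card, score in index.items():
    print(f"{card}: {score:.1f}")
```

Normalizing game by game before averaging keeps a title that runs at very high frame rates from dominating the index, which is what giving each game the same weight implies.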
Page 16
Conclusion
With the launch of the GeForce GTX 200, Nvidia reaffirms its leadership in the high end and its intention to continue producing large GPUs. The GT200 found in these cards integrates no fewer than 1.4 billion transistors.
By lightly reworking the GeForce 8 architecture to fix its small problem areas, along with others that could soon have become limitations, Nvidia can now replace the GeForce 9800 GX2 with a card equipped with a single chip: the GeForce GTX 280. Of course, in some cases it is surpassed by the previous high end, but in others it is ahead, and it is more consistent because it does not suffer from the weaknesses of a multi-GPU system.
It also benefits from increased local memory, which finally reaches 1 GB. This is useful in certain complex situations, which should become more and more common. At a price of 550€, this card is without rival and therefore ought to find its place in the machines of extreme performance fanatics. The cards can also be combined in SLI or triple SLI for the more well-off.
Its smaller sibling, the GeForce GTX 260, is a notch below in terms of performance, but it is still ahead of the GeForce 9800 GTX. Its price, set at 309€, is particularly attractive and is just a taste of the competition to come with the Radeon HD 4870.
As you may have gathered, we still find this GPU attractive, even though the GT200 is late, the base architecture is unchanged, a 50% performance increase in almost two years is nothing extraordinary, and Nvidia’s continued refusal to support DirectX 10.1 is annoying. Indeed, it seems well balanced and ready for a future that should be an exciting one in the realm of graphics cards. As proof of this, software capable of benefiting from CUDA is arriving, such as Badaboom for video encoding and Folding@Home, as well as the GPU version of the PhysX API.
Copyright © 1997-2009 BeHardware. All rights reserved.