Review: NVIDIA GeForce GTX 680 - BeHardware
Written by Damien Triolet
Published on March 22, 2012
After three months of Radeon domination, NVIDIA owed it to itself to react! It has now done so with the GeForce GTX 680, introducing the first Kepler generation GPU and looking to surpass the Radeon HD 7970 without sending power consumption through the roof.
GK100? GK104? GK110? Where's the big one?
AMD and NVIDIA usually launch the biggest GPU of a new family first. This GPU then serves as the flagship release for the rest of the range. This launch, however, is an exception!
We'll probably never know exactly why this has happened, as NVIDIA isn't in the habit of speaking about unannounced products that have been cancelled or delayed. It does look very much, however, as if the GeForce designer was forced by some issue or other to review its initial plan and cancel or delay the release of the biggest GPU in the Kepler family, which should now see the light of day later in the year under the codename GK110.
We imagine that NVIDIA decided to be pragmatic about the situation and not make the same mistakes it made with Fermi, a generation on which the release of the biggest GPU, the GF100, caused quite a few problems. If they were having issues with the big GPU, it no doubt made more sense to focus on a smaller one, especially as this particular one seemed so promising. NVIDIA therefore decided to focus first on the GK104, successor to the GF114 used on the GeForce GTX 560 Ti.
The GK104 and its 3.5 billion transistors.
Does this mean NVIDIA has had to review its ambitions to take back top spot in terms of performance? This is probably what the company thought at first but after seeing how the Radeon HD 7970 performed, it will have become clear that although giving a very good showing, the Radeon didn't represent as big a jump on the previous generation as all that, with AMD rather focussing its resources on laying the foundations of an architecture looking towards the future, notably in terms of GPU Computing. NVIDIA will have thought that the HD 7970 could therefore be in reach of the GK104, designed simply to offer optimal efficiency in video games.
The company then presumably worked flat out to offer a GeForce GTX 680 based on a full and highly clocked version of this GPU. Along with additional work to optimise drivers, it would aim to put the Radeon HD 7970 under pressure, making for a tight battle in prospect!
GK104: Fermi on a diet
NVIDIA didn't start from zero on the Kepler generation because good, long-lasting foundations had already been laid with Fermi. Kepler is in fact a small development of the Fermi architecture designed to correct Fermi's major weak point: its rather poor energy efficiency. This hasn't so much been done for ecological reasons as to avoid slamming into a brick wall. Keeping the Fermi architecture as it was while changing over to the 28 nanometre fabrication process would have meant losing out on some of the benefits of this process, as the excessive energy consumption would have acted as a brake on increasing the complexity of the GPU.
While all the NVIDIA documentation compares the GK104 architecture with that of the GF100/110, this only serves to confuse the picture. In fact, with the GF104/114, introduced with the GeForce GTX 460, NVIDIA offered a variant of its Fermi architecture optimised for efficiency in games, whereas the big GPU was designed to offer a compromise that left more space for GPU Computing. It is of course this GF1x4 architecture that the GK104 should be compared to so as to properly understand what developments there have been. You can find our description of the differences between the GF1x4 and the GF100/110 here.
NVIDIA GPUs are based on fundamental blocks known as SMs or Streaming Multiprocessors. These SMs contain a certain number of processing and texturing units, cache memory and management logic. Each group of 4 SMs forms a GPC (Graphics Processing Cluster) and has its own rasterizer, allowing each cluster to process small triangles efficiently. On the GK104, the SM has evolved to become what's now called an SMX. Here's a representation of the development from the GF1x4 SM on the left to the GK104 SMX on the right:
As you can see, the SMX dwarfs the SM! The number of main processing units has gone up from 48 to 192 and the number of texturing units from 8 to 16. Is this a radical change? Not really if we look a little closer.
With the SMX NVIDIA has introduced a first energy optimisation: no more dual clocks for processing units (shader clock twice the core clock). Introduced with the G80 and the GeForce 8800 GTXs, running certain units at double the core clock allowed NVIDIA to do a lot more in terms of performance with relatively few processing units. Unfortunately, this approach comes at a cost, with higher energy demands for the units themselves as well as for the distribution of the clock signal.
Moving down to the 28 nanometre process, NVIDIA is less limited by the surface area taken up by the units than by the energy required to run them. It therefore no longer makes any sense to persist with the higher energy solution, and on the GK104 NVIDIA has dropped the higher shader clock and doubled the number of processing units to compensate, including the special function units or SFUs (but not the units that deal with double precision processing, whose throughput therefore drops to 1/24th of the single precision rate). So then, this explains half of the evolution from the SM to the SMX!
For the other half, we in fact have to see an SMX as two SMs stuck to one another so as to share the same L1 cache and reduce the overall cost of the cache, which isn't all that useful in games because the texturing units have their own dedicated caches. Remember that part of this L1 cache serves as shared memory, allowing various threads processed in parallel to communicate during GPU Computing usage.
Fermi GPUs could share their 64 KB between an L1 part of either 16 KB or 48 KB and a shared memory part of either 48 KB or 16 KB.
The GK104 also introduces a 32 KB / 32 KB mode, which is a better match for the DirectX 11 specifications. The cache bandwidth has also been doubled.
Apart from the cache, the two halves of an SMX are independent of each other. Thus the first two schedulers can only access the first half of the execution units and the two others the second half. Just like with the GF1x4, what we have here is a superscalar architecture as, for any given warp (group of 32 threads) to maximise use of the processing units, it must be possible to process at least 50% of the mathematical instructions as pairs. This isn't therefore strictly speaking a scalar architecture but the compiler's work remains relatively simple.
Each scheduler has its own registers (4096 x 32 bits) and its own group of four texturing units (each with its own little dedicated cache) and can issue two instructions per cycle but must share resources at this level with a second scheduler:
- SIMD0 32-way unit (the "cores"): 32 FMA FP32 or 4 FMA FP64
- SIMD1 32-way unit (the "cores"): 32 FMA FP32
- SIMD2 32-way unit (the "cores"): 32 FMA FP32
- SFU 16-way unit: 16 FP32 special functions or 32 interpolations
- Load/Store 16-way 64-bit unit
Note that this last point isn't very clear. NVIDIA says that the Load/Store capacity of an SMX is the same as a Fermi SM when it comes to 32-bit transactions but doubled for 64-bit. We therefore suppose that the diagram, which is a simplification of a very complex architecture, is partly wrong and that in fact the two halves of an SMX share these resources. 64-bit load/stores however cost no more than 32-bit ones, with NVIDIA stipulating that this first type of access is more often a limiting factor than the second.
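The 1/24 double precision rate mentioned earlier can be cross-checked against the unit list above. The following is a minimal sketch that simply sums the per-cycle FMA figures for one pair of schedulers and doubles them for a whole SMX; the dictionary layout and function name are illustrative, not anything NVIDIA publishes.

```python
from fractions import Fraction

# Per-cycle FMA throughput of the SIMD units shared by one pair of
# schedulers (half an SMX), as given in the list above.
HALF_SMX_UNITS = {
    "SIMD0": {"fp32": 32, "fp64": 4},
    "SIMD1": {"fp32": 32, "fp64": 0},
    "SIMD2": {"fp32": 32, "fp64": 0},
}

def fma_per_cycle(units, precision, halves=2):
    """Total FMA operations per cycle across `halves` scheduler pairs."""
    return halves * sum(unit[precision] for unit in units.values())

fp32_per_smx = fma_per_cycle(HALF_SMX_UNITS, "fp32")
fp64_per_smx = fma_per_cycle(HALF_SMX_UNITS, "fp64")
fp64_ratio = Fraction(fp64_per_smx, fp32_per_smx)

print(fp32_per_smx, fp64_per_smx, fp64_ratio)
```

The FP32 total comes out at the 192 processing units quoted for an SMX, and the FP64 to FP32 ratio reduces to 1/24.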
We now come to the second development designed to reduce the architecture's energy footprint. Fermi schedulers use scoreboarding to check constantly which registers are being used (and therefore possibly being written to) so as to determine which instruction can be issued on which group of data. Kepler still uses scoreboarding, which is important as there's very high latency on some instructions, but gets rid of it when it's not required.
Throughput and latency of mathematical instructions are deterministic, so the compiler can predict the exact behaviour of the mathematical instructions it issues and no longer needs to call on Fermi's complex hardware scheduling to process sequences of instructions within a warp (group of data). This means that Kepler only has recourse to such scheduling for instructions of indeterminate latency (texturing, load, store) as well as to determine which warp to start working on. This approach allows Kepler to reduce the energy consumption required by the processing units.
Note finally that as an SMX is basically two SMs, the pixel and triangle throughputs of an SMX are double those of an SM, namely one triangle (= one vertex fetch) every two cycles and four 32-bit pixels per cycle.
GK104: GF114 x2
It was important to first introduce the developments at the block level of the GK104 processing units so as to give an idea of the overall organization NVIDIA has chosen for its 1536 processing units! This number represents an enormous jump forward from the GF1x0's 512 processing units and the GF1x4's 384 units. We do have to take into account the fact that the shader clock (double the speed) has been shed, which nevertheless gives us double the processing and texturing units of the GF1x4!
The GK104 has 8 SMXs in 4 GPCs but just a 256-bit memory bus. Basically this GPU uses the same memory subsystem as the GF114 but with double the number of execution units and gives the same triangle and pixel throughput as the GF110. This is a relatively good balance but does suggest limitations in terms of GPU computing and a lack of memory bandwidth when it comes to performance with a high level of MSAA.
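As a rough cross-check of these figures, the sketch below multiplies the per-block numbers quoted in the text by the number of blocks. The GF114 values are derived rather than quoted: 8 SMs follows from 384 units at 48 per SM, and the per-SM triangle and pixel rates are taken as half those of an SMX. The dictionary keys and the `totals` helper are illustrative names.

```python
# Per-block figures. "shader_clock_x" is 2 when the processing units run
# at twice the core clock (Fermi) and 1 when they do not (Kepler).
GF114 = {"blocks": 8, "units": 48, "tmus": 8, "shader_clock_x": 2,
         "tri_per_cycle": 0.25, "pix_per_cycle": 2}
GK104 = {"blocks": 8, "units": 192, "tmus": 16, "shader_clock_x": 1,
         "tri_per_cycle": 0.5, "pix_per_cycle": 4}

def totals(gpu):
    """Whole-GPU figures per core clock cycle."""
    n = gpu["blocks"]
    return {
        "units": n * gpu["units"],
        # Units weighted by how many operations they issue per core cycle.
        "unit_ops_per_core_cycle": n * gpu["units"] * gpu["shader_clock_x"],
        "tmus": n * gpu["tmus"],
        "triangles_per_cycle": n * gpu["tri_per_cycle"],
        "pixels_per_cycle": n * gpu["pix_per_cycle"],
    }

gf114, gk104 = totals(GF114), totals(GK104)
for key in gk104:
    print(f"{key:24} GF114={gf114[key]:>6}  GK104={gk104[key]:>6}")
```

Once the doubled shader clock is accounted for, the GK104 ends up with twice the processing and texturing throughput per core cycle of the GF114, along with 4 triangles and 32 pixels per cycle.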
To make up for this, NVIDIA has doubled the bandwidth of its 512 KB L2 cache (now 512 bytes per cycle) and worked hard on its GDDR5 memory controller. While the memory clock couldn't be increased much on the Fermi generation, with Kepler the GeForce GTX 680 has memory clocked at 1.5 GHz (6 Gbps). Note that the GK104 has 32 ROPs, corresponding to its pixel rate, in contrast to the Fermi GPUs which were slowed down by the lower pixel throughput of their SMs. Kepler can also send pixels in FP10 or RGB9E5 to the ROPs at full speed; these pixel formats enable the compression of HDR data in 32 bits. The ROPs keep the same blending capabilities however.
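To put these numbers in perspective, here is a small sketch that converts the 256-bit bus at 6 Gbps per pin and the 512 bytes per cycle of the L2 into GB/s. The L2 figure assumes the cache runs at the 1006 MHz base GPU clock, which the text does not state explicitly; the function names are illustrative.

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak DRAM bandwidth in GB/s: one bit per pin per transfer."""
    return bus_width_bits * data_rate_gbps / 8

def l2_bandwidth_gbs(bytes_per_cycle, clock_mhz):
    """Peak L2 bandwidth in GB/s for a given clock (assumed, see above)."""
    return bytes_per_cycle * clock_mhz / 1000

dram = memory_bandwidth_gbs(bus_width_bits=256, data_rate_gbps=6)
l2 = l2_bandwidth_gbs(bytes_per_cycle=512, clock_mhz=1006)

print(f"GDDR5: {dram:.0f} GB/s, L2: {l2:.0f} GB/s")
```

Under these assumptions the memory bus peaks at 192 GB/s, with the L2 cache at a little over 500 GB/s.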
The GK104, A2 version.
With the new 28nm fabrication process, the GPU clock is up significantly to 1 GHz (or rather 1006 MHz) and can go a good deal higher given the fact that a turbo boost has been introduced. With 3.5 billion transistors, the GK104 fits onto an area of just 294 mm², which is smaller than Tahiti (4.3 billion transistors and 365 mm²) and the GF114 (1.95 billion transistors and 367 mm²), which is manufactured at 40 nanometres.
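The transistor counts and die areas quoted above can be turned into a density figure for comparison. This is a simple sketch using only the numbers in the text; the `DIES` table and `density` helper are illustrative names.

```python
# (millions of transistors, die area in mm²), as quoted in the text.
DIES = {
    "GK104 (28nm)": (3500, 294),
    "Tahiti (28nm)": (4300, 365),
    "GF114 (40nm)": (1950, 367),
}

def density(millions_of_transistors, area_mm2):
    """Millions of transistors per mm²."""
    return millions_of_transistors / area_mm2

densities = {name: density(*spec) for name, spec in DIES.items()}
for name, value in densities.items():
    print(f"{name:14} {value:5.1f} M transistors/mm²")
```

On these figures the two 28nm GPUs land very close to each other at a little under 12 million transistors per mm², more than double the density of the 40nm GF114.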
NVIDIA hasn't included Direct3D 11.1 support but has included support for PCI Express 3.0 and has entirely revised its display engine. It now supports HDMI 1.4a 3 GHz for 1080p 3D at 60 Hz and 4k resolution and, more importantly, up to 4 video outputs at the same time! The advantage AMD had with Eyefinity has therefore been drastically reduced, especially as the GeForce GTX 680 can drive two DVI outs and an HDMI out directly without having to use a DisplayPort out with a native screen or an active adaptor.
Multi-screen the NVIDIA way: 3 + 1.
Moreover, the GK104 includes NVENC, a fixed H.264 encoder that uses less power to process encoding than the GPU processing units. This engine is similar to the Video Codec Engine that AMD supplies with the Radeon HD 7000s but NVIDIA has announced far superior performance: up to 240 fps at 1080p. It will be interesting to check this in practice along with the quality NVENC gives.
GPU Boost: non-deterministic turbo
GPU Boost
The GeForce GTX 680 is the first graphics card with a turbo mode, allowing the card clock to be increased as long as power consumption remains within a defined envelope.
The NVIDIA approach is however different to what we've seen with turbo on a CPU or AMD's PowerTune. CPUs and the Radeons include a unit that can estimate their consumption using a reading of the usage rates of their different blocks and a table of corresponding currents. This table is fixed for each model and determined conservatively on the basis of a 'bad' sample, say one that has high leakage. So in practice, energy consumption is overestimated to a greater or lesser degree but all the samples of the same model display identical performance.
The GK104 uses a different system. NVIDIA has used on the PCB the small shunt circuits we saw on the GeForce GTX 500s to give a reading right at the 12V power supply sources. This data is then reported to the driver which then reacts based on this real consumption reading of the card.
In contrast to the GeForce GTX 500, with the exception of the GTX 590, this energy consumption monitoring system is on all the time. It is also a bit more reactive here, which means it can adapt the GPU clock every 100ms, both upwards and downwards, by steps of 13 MHz at the same time as adapting voltage. If energy consumption exceeds a certain threshold, the clock will be reduced progressively but where consumption is low, the GPU clock can be increased.
The approach is however non-deterministic. Depending on the quality of the GPU sample, its temperature and the efficiency of its power stage, the energy consumption will vary and these factors will impact on the GPU clock.
And here we come to the thing that disturbs us: NVIDIA has refused to quantify this variation and claims not to know what it is. Moreover, NVIDIA is refusing to give the true clock limit for GPU Boost and is simply saying that all GeForce GTX 680s will at least be able to manage a GPU Boost clock of 1058 MHz in some cases. At the same time it's saying that its engineers have often observed higher clocks in the lab. In other words, NVIDIA's engineers don't really work to fix the specs but just sit back and observe the magic of clocks going up all on their own!
According to our observations on our sample, the maximum GPU Boost clock is 1110 MHz and the base clock 1006 MHz, with eight 13 MHz jumps in between. The final 13 MHz jump up to 1110 MHz is however harder to attain when the GPU has heated up. On most games tested our card managed up to 1097 MHz.
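The clock levels observed on our sample can be enumerated with a few lines of code. This is a minimal sketch: the `boost_clocks` helper is an illustrative name, and the eight steps of 13 MHz are what we observed on one card, not a published specification.

```python
def boost_clocks(base_mhz, steps=8, step_mhz=13):
    """All clock levels from the base clock up to the top boost level."""
    return [base_mhz + i * step_mhz for i in range(steps + 1)]

levels = boost_clocks(1006)
print(levels)
```

The list runs from 1006 to 1110 MHz and passes through 1058 MHz, the boost clock NVIDIA quotes, on the fourth step.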
Note that GPU Boost uses a lower value than the TDP as the power target. While the GeForce GTX 680 TDP is 195W, GPU Boost uses a target of 170W for clock increases. However the clock isn't reduced until the TDP has been reached.
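The behaviour described above can be approximated with a very simple control loop: one decision per 100 ms interval, one 13 MHz step at a time. The sketch below is our reading of the mechanism, not NVIDIA's actual algorithm. The `next_clock` function, the floor at the base clock and the power readings in the example are all assumptions made for illustration.

```python
def next_clock(clock_mhz, power_w, *, min_mhz=1006, max_mhz=1110,
               step_mhz=13, target_w=170, tdp_w=195):
    """Clock for the next 100 ms interval given the measured board power."""
    if power_w >= tdp_w:
        # TDP reached: step down (the floor at min_mhz is an assumption).
        return max(min_mhz, clock_mhz - step_mhz)
    if power_w < target_w:
        # Below the power target: step up, but never past the top level.
        return min(max_mhz, clock_mhz + step_mhz)
    # Between target and TDP: hold the current clock.
    return clock_mhz

# Hypothetical power readings in watts, one per 100 ms interval.
readings = [150, 155, 160, 172, 180, 196, 190]
clock = 1006
history = []
for power in readings:
    clock = next_clock(clock, power)
    history.append(clock)

print(history)
```

With these made-up readings the clock climbs while the card draws less than 170W, holds between 170W and 195W, and only drops a step once 195W is exceeded.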
NVIDIA has worked with the author of Rivatuner so as to be able to illustrate the functioning of GPU Boost and allow modifications to the power target as well as overclocking. EVGA's Precision X (others will follow) is thus already fully functional.
The GPU Boost power target is expressed as energy consumption of 100%, with TDP corresponding to 115%. This can be varied between 71 and 132%, corresponding to a range of 121 to 224W for the GTX 680. Increasing the target allows you to maximise GPU Boost usage and is also required for overclocking.
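The relationship between the percentage shown in the tuning tool and the power in watts is a simple scaling of the 170W target. A small sketch, with `power_target_watts` as an illustrative helper name:

```python
def power_target_watts(percent, target_w=170, low=71, high=132):
    """Convert a GPU Boost power target percentage into watts."""
    if not low <= percent <= high:
        raise ValueError(f"power target must be between {low}% and {high}%")
    return target_w * percent / 100

for pct in (71, 100, 115, 132):
    print(f"{pct:3d}% -> {power_target_watts(pct):.1f} W")
```

The two ends of the slider work out at roughly 121W and 224W, and 115% lands within a watt of the 195W TDP.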
Note that you can't turn GPU Boost off nor change how it works. If you increase the GPU base clock from 1006 to 1106 MHz, GPU Boost will still offer the same eight 13 MHz steps.
Specifications, the reference GeForce GTX 680
In comparison to the GeForce GTX 580, the GeForce GTX 680 has twice the processing power and more than twice the texturing power. However it has the same memory bandwidth. On paper at least, the Radeon HD 7970 has the advantage.
For this test, NVIDIA supplied us with a reference GeForce GTX 680:
The reference GeForce GTX 680
The GeForce GTX 680 is 25.5cm long and is thus more compact than the Radeon HD 7970 or the GeForce GTX 580. The radial fan cooling system is similar to that used by NVIDIA on its previous high end graphics cards.
With a TDP of 195W however, it was possible to simplify the cooling block. The vapour chamber has been replaced by a copper base from which three flattened heatpipes lead through the aluminium radiator. A metallic plate supports the cooling system, makes sure the PCB is secure and is in contact with the memory modules and the sensitive power stage components. There's also a plastic casing. Note that while the pretty standard design is nice enough, the manufacturing quality is rather poor and the plastic creaks when you're handling it. It's more comparable to the sort of finish seen on the GeForce GTX 560 Ti than the Radeon HD 7970 or the GeForce GTX 580.
There are 5 phases on the PCB to power the GK104, with two for the Hynix R0C GDDR5 certified at 1.5 GHz. Only 4 of these GPU phases are however needed for the GeForce GTX 680 and the fifth is unpopulated. The GeForce GTX 680 has two 6-pin power supply connectors, corresponding to a maximum energy consumption of 225W according to PCI Express specifications. Note that NVIDIA has opted for an original dual connector to gain some space but the PCB has been designed for any type of connector, including 8+6 pin.
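The 225W figure comes from adding up the limits the PCI Express specifications set for each power source: 75W through the slot, 75W per 6-pin connector and 150W per 8-pin connector. A quick sketch, with illustrative names:

```python
# Maximum power per source under the PCI Express specifications, in watts.
SLOT_W = 75
CONNECTOR_W = {"6-pin": 75, "8-pin": 150}

def board_power_limit(connectors):
    """Slot power plus the limit of each auxiliary connector."""
    return SLOT_W + sum(CONNECTOR_W[c] for c in connectors)

reference = board_power_limit(["6-pin", "6-pin"])
with_8_pin = board_power_limit(["8-pin", "6-pin"])
print(reference, with_8_pin)
```

The reference 6+6 pin layout gives 225W, while an 8+6 pin design on the same PCB would raise the ceiling to 300W.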
The connectivity has been revised and the card has two DVI Dual Link connectors, one HDMI 1.4a 3 GHz connector and a DisplayPort 1.2 connector.
Noise levels and GPU temperature
Noise
To observe the noise levels produced by the various solutions, we put the cards in a Cooler Master RC-690 II Advanced casing and measured noise at idle and in load. We used an SSD and all the fans in the casing, as well as the CPU fan, were turned off for the reading. The sonometer was placed 60 cm from the closed casing and ambient noise was measured at +/- 21 dBA. Note that for all the noise and heat readings, we used the real reference design of the Radeon HD 7950, rather than the press card supplied by AMD.
The GeForce GTX 680 is quite quiet at idle and less noisy in load than the GeForce GTX 580 or the Radeon HD 7970. Note however that the fan speed varies slightly but constantly, giving variable noise levels from 43.5 to 45.2 dBA.
Temperatures
Still in the same casing, we took a reading of the GPU temperature with the internal sensor:
The GeForce GTX 680 cooling system is pretty effective.
Readings and infrared thermography
Infrared thermography
For this test we used the new protocol described here.
First of all, here's a summary of all the readings:
At idle the GeForce GTX 680 enjoys lower energy consumption than the GeForce GTX 580.
Overall the internal temperatures with the GTX 680 are lower in load than the GTX 580. Note that in our load test, in 3DMark 11, the clock of the GK104 varies between 1006 and 1097 MHz.
Here finally is what the thermal imaging shows:
These photos confirm that the GeForce GTX 680 is well cooled, even if its power stage heats up a bit more than that on the Radeon HD 7970.
Energy consumption and performance/watt
Energy consumption
We used the test protocol that allows us to measure the energy consumption of the graphics card alone. We took these readings at idle on the Windows 7 desktop as well as with the screen in standby so as to observe the impact of ZeroCore Power. In load we opted for the readings in Anno 2070, at 1080p with all options pushed to maximum, as well as those in Battlefield 3, at 1080p in High mode:
While the GeForce GTX 680 drastically reduces energy consumption at idle in comparison to the GeForce GTX 580, with a slightly lower reading than the GeForce GTX 560 Ti, it has nothing like ZeroCore Power to shut the GPU down almost entirely in screen standby.
In load, energy consumption is generally lower than the Radeon HD 7970 but this isn't always the case. In Battlefield 3 for example, the GTX 680 consumes slightly more.
We have shown the energy consumption readings graphically, with fps per 100W to make the data more legible:
Thanks to the 28nm fabrication process and the revised GK104 architecture, energy efficiency has indeed improved. It is slightly better than that of the Radeon HD 7970 in Anno 2070 and 15% better in Battlefield 3, but the Radeon HD 7870 is still the most efficient. Note however that each game represents a particular case.
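The fps per 100W metric is simply the frame rate divided by the measured power, scaled to 100W. The sketch below shows the calculation and how two cards can be compared; the frame rates and wattages are placeholder values for illustration, not our measurements, and the function names are ours.

```python
def fps_per_100w(fps, watts):
    """Frames per second delivered for every 100W drawn by the card."""
    return 100 * fps / watts

def efficiency_gain(fps_a, watts_a, fps_b, watts_b):
    """Relative efficiency of card A over card B (0.15 means 15% better)."""
    return fps_per_100w(fps_a, watts_a) / fps_per_100w(fps_b, watts_b) - 1

# Placeholder values only: card A is faster but also draws a little more.
card_a = {"fps": 60, "watts": 150}
card_b = {"fps": 50, "watts": 145}

print(fps_per_100w(**card_a))
print(f"{efficiency_gain(60, 150, 50, 145):.1%}")
```

This shows how a card that consumes slightly more in a given game can still come out ahead on efficiency, provided its frame rate advantage is larger than its extra power draw.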
Theoretical performance: pixels
Texturing performance
We measured performance during accesses to textures of different formats with bilinear filtering: standard 32-bit (4x INT8), 64-bit "HDR" (4x FP16), 128-bit (4x FP32) and 32-bit RGB9E5, an HDR format introduced with DirectX 10 which enables the storing of 32-bit HDR textures with a few compromises.
The GeForce GTXs can filter FP16 textures at full speed in contrast to the Radeons which up until now made up for this with such superior filtering power that even though they had to filter FP16 textures at half speed, they were able to post similar speeds to the GeForces. This is no longer the case with the GeForce GTX 680, which has a considerable advantage on this point.
Nevertheless, while it benefits from GPU Boost in our test to run at 1110 MHz and shows a theoretical speed of 142 Gtexels/s, it struggles to reach this in practice, running at 25% less.
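The theoretical figure follows directly from the number of texturing units and the clock: 8 SMXs with 16 units each, running at 1110 MHz. A short sketch of the arithmetic, with illustrative function and variable names:

```python
def texel_rate_gtexels(smx_count, tmus_per_smx, clock_mhz):
    """Peak bilinear texel rate in Gtexels/s (one texel per unit per cycle)."""
    return smx_count * tmus_per_smx * clock_mhz / 1000

peak = texel_rate_gtexels(smx_count=8, tmus_per_smx=16, clock_mhz=1110)
measured = peak * (1 - 0.25)  # the card runs about 25% below its peak

print(f"peak {peak:.2f} Gtexels/s, measured about {measured:.0f} Gtexels/s")
```

This gives a peak of just over 142 Gtexels/s and puts the rate achieved in practice at roughly 107 Gtexels/s.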
Note that we had to increase the energy consumption limit of the Radeon HD 6900s as well as the Radeon HD 7700s and 7800s to the maximum, otherwise the clocks are reduced in this test. By default the Radeons therefore seem incapable of fully benefitting from their texturing power! Note that this is no longer the case for the Radeon HD 7900s. We highlighted the proportion of the performance that can only be obtained by modifying the PowerTune limit.
Fillrate
We measured the fillrate without and then with blending, using different data formats:
In terms of fillrate, the GeForce GTX 680 and the GK104 GPU are finally able to process FP10/11 and RGB9E5 formats at full speed, although blending of these formats is still at half speed. While both the GeForces and the Radeons can process the single channel FP32 format at full speed without blending, only the Radeons maintain this speed with blending. Moreover, they're significantly faster with FP32 quad channel (HDR 128 bits).
Although the Radeon HD 7800s have the same number of ROPs as the Radeon HD 7900s, their lower memory bandwidth means they can't use them to maximize throughput with blending or with FP16 and FP32 without blending.
Theoretical performance: geometry
Triangle throughput
Given the architectural differences between the various GPUs in terms of geometry processing, we obviously wanted to take a closer look at the subject. First of all we looked at triangle throughput in two different situations: when all triangles are drawn and when all the triangles are removed with back face culling (because they aren't facing the camera):
The GeForce GTX 680 doesn't do any better than the Radeon HD 7970 when drawing triangles, though its performance has probably been deliberately reduced: NVIDIA was already cutting back the GTX 500s to differentiate the Quadros from the GeForces.
When the triangles can be removed from the rendering, the GeForce GTX 680 takes full advantage of its capacity to process 4 triangles per cycle to take the lead.
Next we carried out a similar test using tessellation:
With the GeForce GTX 680, NVIDIA has reaffirmed its superiority when it comes to processing a lot of small triangles generated by a high level of tessellation. The Radeon HD 7900s are on a par with the Radeon HD 7800s, which have the same number of fixed units dedicated to the task.
The architecture of the Radeons means that they can be overloaded by the quantity of data generated, which then drastically reduces their speed. Doubling the size of the buffer dedicated to the GPU tessellation unit in the Radeon HD 6800s meant they gave significantly higher performance than the Radeon HD 5000s. AMD has continued down this line with the Radeon HD 7000s.
For reasons unknown, the GeForce GTX 570 gives better performance here than the GeForce GTX 580, which may also suffer from overload, though it is possible that this is linked to a driver profile.
Drivers, the test
Developments of GeForce drivers
NVIDIA has taken the opportunity of the GeForce GTX 680 launch to introduce branch 300 of its drivers. In addition to numerous slight performance optimisations, these drivers also introduce a few interesting innovations.
First of all, they have brought in adaptive vertical synchronisation, which turns v-sync off when performance drops. Traditionally, on a 60 Hz screen with v-sync on, where the GPU can't maintain 60 fps, it finds itself limited to 30 fps, then 20 fps, 15 fps and so on. These major dips in performance are of course undesirable and mean that many gamers prefer to do without vertical synchronisation and put up with the tearing artifacts you get when v-sync is off. The new NVIDIA approach, which we've been expecting for a long time from one or other GPU manufacturer, finally resolves this problem. NVIDIA moreover allows you to specify whether v-sync should automatically be switched off under the refresh frequency or at half of this value.
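The 60, 30, 20, 15 fps staircase and the effect of the adaptive mode can be modelled in a few lines. This is a simplified sketch: it assumes classic double-buffered v-sync, where each frame waits for the next refresh, and our own interpretation of the half refresh rate option (v-sync holds 30 fps and is switched off below that). The function names are illustrative.

```python
import math

def vsync_fps(render_fps, refresh_hz=60):
    """Displayed frame rate with classic v-sync (double buffering assumed)."""
    if render_fps >= refresh_hz:
        return refresh_hz
    # Each frame occupies a whole number of refresh intervals.
    return refresh_hz / math.ceil(refresh_hz / render_fps)

def adaptive_vsync_fps(render_fps, refresh_hz=60, half_rate=False):
    """Adaptive v-sync: synchronised at the threshold, unsynchronised below."""
    threshold = refresh_hz / 2 if half_rate else refresh_hz
    return threshold if render_fps >= threshold else render_fps

for fps in (70, 55, 25, 18):
    print(fps, vsync_fps(fps), adaptive_vsync_fps(fps))
```

A GPU capable of 55 fps is held at 30 fps with classic v-sync but shows its full 55 fps, with tearing, in adaptive mode.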
Still on v-sync, it's now possible to turn it off in 3D Vision mode.
Finally, NVIDIA has included a new antialiasing option that allows you to force an FXAA type of antialiasing mode, as AMD has done with the similar MLAA. Note that NVIDIA has also just introduced TXAA, a new type of antialiasing which isn't however used in the drivers and which has nothing to do with Kepler. This is a development of FXAA being proposed to developers with a view to its inclusion in an engine that already supports MSAA automatically. It optimises the MSAA (2x) and FXAA mix, which some games already do in an approximate way by applying both filters rather crudely one after the other without optimising their complementarity. While this algorithm works on all GPUs, NVIDIA hasn't yet decided if its usage contract will authorise use on Radeons or if games developers will have to block this option.
The test
For this test we used the protocol introduced for our report on the Radeon HD 7970 which includes new games: Anno 2070, Batman Arkham City, Battlefield 3, F1 2011 and Total War Shogun 2. We added Alan Wake.
We have decided no longer to use the level of MSAA (4x and 8x) as the main criterion for segmenting our results. Many games with deferred rendering offer other forms of antialiasing, the most common being FXAA, developed by NVIDIA. It therefore no longer makes sense to organise an index around a certain level of antialiasing, which in the past allowed us to judge a card according to its effectiveness with MSAA, which can vary according to implementation.
At 1920x1080, we carried out the tests with two different quality levels: extreme and very high, which automatically includes a minimum of antialiasing (either MSAA 4x or FXAA/MLAA/AAA). We also carried out the tests with this second quality level at 2560x1600 as well as with surround resolution, at 5760x1080.
We no longer show decimals in game performance results so as to make the graph more legible. We nevertheless note these values and use them when calculating the index. If you're observant you'll notice that the size of the bars also reflects this.
The Radeons were tested with the beta 8.95.5-120224a drivers. Although NVIDIA recommends keeping the 295.x drivers for the GeForce GTX 500 tests, all the GeForces have been tested with the 300.99 beta drivers, which bring performance gains in most games and over 5% in some cases on all cards.
Test configuration
Intel Core i7 980X (HT and Turbo off)
Asus Rampage III Extreme
6 GB DDR3 1333 Corsair
Windows 7 64 bits
GeForce beta 300.99 drivers
Catalyst beta 8.95.5-120224a
Benchmark: Alan Wake
Alan Wake is a pretty well executed title ported from console and based on DirectX 9.
We used the game's High quality levels and added a maximum quality level with 8x MSAA. We carried out a well defined movement and measured performance with Fraps.
The Radeon HD 7000s do particularly well in this game in which they easily outdo the GeForces. The GTX 680 suffers particularly with MSAA 8x, falling behind the Radeon HD 7870.
The Radeons maintain their domination at very high resolutions. Note that as things stand NVIDIA doesn't yet have an SLI profile.
Benchmark: Anno 2070
Anno 2070 uses a development of the Anno 1404 engine which includes DirectX 11 support.
We used the very high quality mode on offer in the game and then, at 1920x1080, we pushed anisotropic filtering and post processing to the max to make them very resource hungry. We carried out a movement on a map and measured performance with Fraps.
While the GeForce GTX 500s were down in this game, the GeForce GTX 680 showed significant progress but didn't manage to outdo the Radeon HD 7970 when maximum quality was applied.
At very high resolutions we recorded the GeForce GTX 680 and the Radeon HD 7970 at the same level.
Benchmark: Batman Arkham City
Batman Arkham City was developed with a recent version of Unreal Engine 3 which supports DirectX 11. Although this mode suffered a major bug in the original version of the game, a patch (1.1) has corrected this. We use the game benchmark.
All the options were pushed to a maximum, including tessellation which was pushed to extreme on part of the scenes tested. We measured performance in Extreme mode (which includes the additional DirectX 11 effects) with FXAA High (high resolution), MSAA 4x and MSAA 8x.
The Radeons suffer with MSAA 8x, a mode that probably saturates the L2 cache and memory controllers. AMD told us however that a problem had been detected and would be corrected in future drivers. We've had no news with respect to a corrected CrossFire profile however: performance is currently poor, with reduced fluidity.
At high resolutions with FXAA, the Radeon HD 7970 makes up some ground.
Benchmark: Battlefield 3
Battlefield 3 runs on Frostbite 2, probably the most advanced graphics engine currently on the market. A deferred rendering engine, it supports tessellation and calculates lighting via a compute shader.
We tested High and Ultra modes and measured performance with Fraps, on a well-defined route. Note that a patch designed to improve performance on the Radeon HD 7000s came out on 14 February. Naturally we installed it and noted a gain of between 1 and 2%.
The GeForce GTX 680 is particularly efficient in Battlefield 3, with an advantage of almost 20% over the Radeon HD 7970 in high quality mode. In Ultra, which includes MSAA 4x, it has just a 12% lead.
At very high resolutions, the GTX 680's lead is reduced. Note that rendering in surround is very jumpy with multi-GPU solutions. In spite of giving 50 fps in surround, the Radeon HD 7870s give something more akin to a 25 fps sensation and the game is unplayable.
Although only in DirectX 9 mode, the rendering is pretty nice, based on version 3.5 of Unreal Engine.
All the graphics options were pushed to a max (high) and we measured performance with Fraps, with MSAA 4x and then 8x.
The GeForce GTX 680 suffers under the demands of MSAA 8x.
This is also the case at very high resolutions where the Radeon HD 7970 has the advantage.
Benchmark: Civilization V
Pretty successful visually, Civilization V uses DirectX 11 to improve quality and optimise performance in the rendering of terrains thanks to tessellation and to implement a special compression of textures thanks to the compute shaders, a compression which allows it to keep the scenes of all the leaders in the memory. This second usage of DirectX 11 doesn't concern us here however as we used the benchmark included on a game map. We zoom in slightly so as to reduce the CPU limitation which has a strong impact in this game.
All settings were pushed to a max and we measured performance with shadows and reflections. The latest patch was installed.
While the Radeon HD 7000s have corrected the performance issues they were having in this game, the GeForce GTX 680 does even better, benefitting among other things from the new series 300 drivers which give a gain of over 5% over the GTX 500s here.
Although the game itself supports surround gaming, our test scene doesn't.
Benchmark: Crysis 2
Crysis 2 uses a development of the Crysis Warhead engine optimised for efficiency but adds DirectX 11 support via a patch and this can be quite demanding. This is the case, for example, with tessellation, implemented abusively in collaboration with NVIDIA with the aim of causing Radeon performance to plummet. We have already exposed this issue here.
We measured performance with Fraps on version 1.9 of the game.
The GeForce GTX 680 lead is cut in Ultra mode. Note that the Radeons use a different CrossFire profile according to whether the drivers detect the use of Ultra or Extreme modes. The Ultra mode profile apparently leads to a slightly lower CPU limitation which explains why the Radeon HD 7870s in CFX are faster here than in Extreme mode.
At very high resolutions the GeForce GTX 680 and the Radeon HD 7870 were on a par. Note that in surround, the image isn't fluid with multi-GPU solutions which give a sensation of having half the frame rate.
Benchmark: F1 2011
The latest Codemasters title, F1 2011 uses a slight development of the F1 2010 and DiRT 3 engine, which retains DirectX 11 support.
We pushed all the graphics options to a max and we used the game's own test tool on the Spa-Francorchamps circuit with a single F1.
The GeForce GTX 680 does particularly well in this game, at least when MSAA 8x is on. Note the high CPU limitation in CrossFire X in this game.
At very high resolutions the GTX 680 retains the advantage but a smaller one.
Benchmark: Metro 2033
Still one of the most demanding titles, Metro 2033 forces all recent graphics cards to their knees. It supports GPU PhysX but only for the generation of particles during impacts, a rather discreet effect that we therefore didn't activate during the tests. In DirectX 11 mode, performance is identical to DirectX 10 mode but with two additional options: tessellation for characters and a very advanced, very demanding depth of field feature.
We tested it in DirectX 11, at maximum quality (including DoF and MSAA 4x), very high quality as well as with tessellation on.
No mono-GPU card allows you to play Metro 2033 comfortably at maximum quality. Moreover the GeForce GTX 680 doesn't do all that well in this mode as it is very demanding when it comes to memory bandwidth.
At very high resolutions the current top two mono-GPUs were on a par. Once again there are micro-jumps with multi-GPU solutions.
Benchmark: Total War Shogun 2
Total War Shogun 2 has a DirectX 11 patch, developed in collaboration with AMD. Among other things, it gives tessellation support and a higher quality depth of field effect.
We tested it in DirectX 11 mode, at max quality, with MSAA 4x and MLAA.
For once, the GeForce GTX 680 does best with MSAA 4x, with the Radeon HD 7000s suffering a great deal when this filter is on in most deferred rendering engines.
At very high resolutions with MLAA, the Radeon HD 7970 moves in front of the GeForce GTX 680.
Performance recap
Although individual game results are obviously worth looking at when you want to gauge performance in a specific game, we have also calculated a performance index based on all tests with the same weight for each game. We set an index of 100 to the GeForce GTX 580:
On average, the GeForce GTX 680 outperforms the Radeon HD 7970 by 5.6% across all our tests which offer a mix of different rendering techniques and types of antialiasing. There's certainly been a very comfortable (almost 30%) gain on the GeForce GTX 580, but this is down on what NVIDIA has got us used to with high-end cards.
It's interesting to note that the GeForce GTX 680, which represents a duo of overclocked GeForce GTX 560 Tis but with limited memory bandwidth, only slightly leads them when in SLI. In comparison to just one of these cards, the advantage is close to 80%.
At very high resolutions, the GeForce GTX 680s perform relatively poorly and are on a par with the Radeon HD 7970. The gain over the GeForce GTX 580 is however up to 35% here at 2560x1600.
While the Radeon HD 7870 CrossFire solution is the most efficient at 5760x1080, it's important to note that in most games, in contrast to what the performance figures indicate, the fluidity isn't up to scratch.
We noted in all these tests that the Radeons tend to suffer more than the GeForces when the deferred rendering engines use MSAA. It's a complex task to use this type of antialiasing in a deferred rendering engine. There's an example in our report: Understanding 3D rendering step by step with 3DMark 11. It's possible that they suffer from a technical limitation here or that AMD hasn't put as much of an effort into optimisation, preferring to highlight the antialiasing carried out during post processing, such as with FXAA or MLAA which is simpler to support.
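The equal-weight performance index used in this recap can be sketched as follows. This is one plausible reading rather than our exact spreadsheet: each game is normalised to its fastest card so that every game carries the same weight, the normalised scores are averaged, and the result is rescaled so that the reference card sits at 100. The fps figures and game names are hypothetical.

```python
# Hypothetical fps figures, invented for illustration (not the review's data).
# All cards here are single-GPU, so the fastest card per game is the
# normalisation target.
fps = {
    "Game A": {"GTX 580": 40.0, "GTX 680": 50.0, "HD 7970": 50.0},
    "Game B": {"GTX 580": 60.0, "GTX 680": 90.0, "HD 7970": 72.0},
}

def performance_index(results, reference="GTX 580"):
    """Equal-weight index across games, rescaled so `reference` equals 100."""
    cards = list(next(iter(results.values())).keys())
    # Normalise each game to its fastest card so every game weighs the same.
    scores = {
        card: sum(game[card] / max(game.values()) for game in results.values())
        / len(results)
        for card in cards
    }
    return {card: 100.0 * s / scores[reference] for card, s in scores.items()}

index = performance_index(fps)
for card, value in index.items():
    print(card, round(value, 1))
# GTX 580 100.0
# GTX 680 136.4
# HD 7970 122.7
```

Normalising per game before averaging prevents a title that runs at very high frame rates from dominating the result.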
GPU Boost performance and overclocking
We wanted to observe the gains given in practice by GPU Boost, which we managed to neutralise.
Moreover we managed to overclock the GeForce GTX 680 by increasing the GPU clock by 100 MHz (it therefore varies between 1106 MHz and 1210 MHz) and the memory clock by 300 MHz, which was therefore increased to 1800 MHz. The energy consumption limit was then pushed to its maximum, +32% or 224W.
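As a quick check on these figures, the stock values they imply can be back-calculated from the offsets quoted above. The sketch below only rearranges the numbers in this paragraph. The derived stock clocks and power target are inferences from that arithmetic, not values stated here.

```python
# Figures quoted in the paragraph above.
gpu_offset_mhz = 100
oc_gpu_range_mhz = (1106, 1210)   # overclocked GPU clock range
mem_offset_mhz = 300
oc_mem_mhz = 1800
power_limit_w = 224
power_increase = 0.32             # +32%

# Back-calculated stock values (inferred, not stated in the text).
stock_gpu_range_mhz = tuple(c - gpu_offset_mhz for c in oc_gpu_range_mhz)
stock_mem_mhz = oc_mem_mhz - mem_offset_mhz
stock_power_w = power_limit_w / (1 + power_increase)

# Relative size of each overclock.
gpu_oc_pct = 100 * gpu_offset_mhz / stock_gpu_range_mhz[0]
mem_oc_pct = 100 * mem_offset_mhz / stock_mem_mhz

print(stock_gpu_range_mhz)       # (1006, 1110)
print(stock_mem_mhz)             # 1500
print(round(stock_power_w))      # 170
print(round(gpu_oc_pct, 1))      # 9.9
print(round(mem_oc_pct))         # 20
```

The memory overclock is roughly twice as large in relative terms as the GPU overclock.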
Note that GPU Boost greatly complicates GPU overclocking, making it difficult to test if the maximum clock imposed in certain cases is really stable. With our sample, +125 MHz caused crashes in certain cases but seemed stable in others where GPU Boost wasn't pushing the GPU as high. Moreover, it imposes a voltage/clock as predefined by NVIDIA and this limits room for manoeuvre. We'll have to wait until a more evolved application allows us to force a higher voltage. In the meantime, don't expect to be able to overclock the GPU too high.
We have also included the results of a Radeon HD 7970 at base clocks and overclocked to 1075/1650 MHz, clocks that can be attained very easily on this model.
We obtained these results at 1920x1080 at extreme and very high quality settings:
It's interesting to see that without GPU Boost, the GeForce GTX 680 would only have been on a par with the Radeon HD 7970.
Overclocking the GeForce GTX 680, and more particularly its memory, makes the most difference in situations where the GTX 680 was down on its competitor. The average gain is 13%, with a peak of 18% in Alan Wake. The Radeon HD 7970 gains 15% with this relatively conservative overclocking, allowing it to make up some of the ground on the GeForce GTX 680.
Note that this is an average of the per-game gains rather than our performance index, which weights each game with respect to the highest performance mono-GPU card in that game and gives a result that differs by a percentage point.
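The difference between an average of gains and a gain measured on a performance index can be shown with a small sketch. All figures are hypothetical, and the index here is a simplified stand-in that normalises each game to an assumed fastest single-GPU result.

```python
# Hypothetical fps before and after overclocking (invented figures).
stock = {"Game A": 40.0, "Game B": 80.0}
overclocked = {"Game A": 48.0, "Game B": 84.0}
# Assumed fastest single-GPU result per game, used to normalise the index.
best = {"Game A": 50.0, "Game B": 80.0}

# Method 1: arithmetic mean of the per-game percentage gains.
gains = [overclocked[g] / stock[g] - 1 for g in stock]
mean_gain_pct = 100 * sum(gains) / len(gains)

# Method 2: gain measured on an equal-weight index normalised per game.
def index(results):
    return sum(results[g] / best[g] for g in results) / len(results)

index_gain_pct = 100 * (index(overclocked) / index(stock) - 1)

print(round(mean_gain_pct, 1))   # 12.5
print(round(index_gain_pct, 1))  # 11.7
```

The two methods diverge because the index gives more weight to games where the card is already close to the fastest result, while the plain average treats every percentage gain equally.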
Conclusion
With this first Kepler generation GPU, the GK104, NVIDIA's main objective was to revisit as far as possible the overly weak energy yield on the Fermi generation, which was probably becoming increasingly difficult to manage. This objective has successfully been attained, and represents a hugely important development allowing NVIDIA to market more efficient products.
In the course of developing this GPU, which was originally designed for the slightly lower performance segment, it became clear that it might be able to compete with the AMD high end. NVIDIA made sure that this was the case by preparing a GeForce GTX 680 with high clocks and reworked drivers as well as the in-house turbo technology, GPU Boost, which gives a little additional performance and just enough to outperform the Radeon HD 7970 in our set of games.
While the GeForce GTX 680 retakes the crown as the highest performance mono-GPU graphics card currently on the market, it only has a slight advantage and its performance is very variable from one situation to another. The card suffers particularly with 8x antialiasing, which puts a lot of demands on memory bandwidth.
On top of this, NVIDIA has categorically refused to go into any detail in terms of the specifics of how GPU Boost will work on the cards that we'll find in stores. It has to be said that the technology, unlike turbo on CPUs, is not deterministic and that any two samples of the same card will therefore perform differently. Our tests have shown that GPU Boost gave a performance gain of between 4 and 5% on a sample that we imagine was carefully chosen by NVIDIA and tested under ideal conditions. Therefore we can only conclude that this 4 or 5% spread is the range of performance that GeForce GTX 680s will give once in your system and that the performance of these cards could therefore be very close to that on the Radeon HD 7970.
In terms of overclocking, the GK104 has less extra potential than the Radeon HD 7900s as GPU Boost already uses much of this potential. Memory overclocking can however go much further and can give a real fillip to the GeForce GTX 680s which are somewhat lacking in memory bandwidth, providing a nice gain in performance particularly in situations where these cards don't do so well. In order to outperform these boosted GTX 680s, the Radeon HD 7970 will have to be massively overclocked, with increased GPU voltage and higher noise levels.
With the same pricing of around €500, it isn't easy to separate these two graphics cards, especially as NVIDIA's addition of support for four screens removes AMD's Eyefinity advantage from the balance. When all's said and done, the Radeon HD 7970 retains an advantage in terms of standby energy consumption and this may well make the difference for some usages. Its architecture also looks towards the future when it comes to GPU computing and full DirectX 11.1 support. Nevertheless, our preference tends to be for the GeForce GTX 680, which enjoys a slightly better energy yield, slightly lower noise levels, gives access to the 3D Vision ecosystem, the most common for 3D stereo, has drivers offering innovative features such as adaptive v-sync and in general provides optimum support for new games more rapidly. In general but not always, as the Alan Wake results for GeForces show.
It also has to be said that neither the GTX 680 nor the HD 7970 offers particularly good bang for your buck as AMD and NVIDIA have settled for doing the strict minimum in terms of pricing efforts, helped as they are by the low availability of 28nm solutions, which is currently stopping competition from kicking in fully. How much of an advantage the two solutions give is also debatable. For playing at 1920x1080 for example, the Radeon HD 7870 will very often do fine if you're willing to accept a few compromises in terms of your graphics settings in the most demanding games. On the other hand, a GTX 680/HD 7970 mono-GPU solution won't give enough power for gaming in surround at high settings, in which case you're forced to go for a multi-GPU system... though you will of course need to make sure that your multi-GPU doesn't suffer from any micro-stuttering!
Copyright © 1997-2013 BeHardware. All rights reserved.