Review: Nvidia GeForce GTX 660 Ti, Asus DirectCU II TOP, EVGA SuperClocked and AMD Radeon HD 7950 v2 - BeHardware
Written by Damien Triolet
Published on September 7, 2012
Nvidia is continuing to roll out its Kepler architecture down the range with the arrival of the GeForce GTX 660 Ti. Very close to the GeForce GTX 670 in terms of spec, it could be well worth a look for gamers: we have tested the reference version as well as the customised Asus and EVGA models. AMD however intends to put a dampener on Nvidia’s party with updated specs for the Radeon HD 7950…
GK104 for everyone
In spite of what might have been expected, the GeForce GTX 660 Ti isn’t based on a new simplified version of the current high-end GPU. It uses the same GPU as the GeForce GTX 690, 680 and 670: the GK104. You can find all the details on this GPU in the review of the GeForce GTX 680. Ironically, with the GeForce GTX 660 Ti, this GPU is in fact probably now where it was intended to be in terms of its position in the range as it was not really designed as a traditional high-end GPU. With better performance than planned, less fierce competition than had been expected and the fact that the big GPU had been delayed or cancelled, Nvidia was however able to introduce the GK104 successfully for the high-end or even very high-end segment.
Little by little it has now been rolled out in the ‘Performance’ segment, which is the segment between the mid and high end and in which you generally find solutions that appeal to gamers looking for maximum performance at a reasonable price.
In designing the GeForce GTX 660 Ti, Nvidia has used a partially cut-down version of the GK104 that is very similar to the version used on the GeForce GTX 670: the number of processing units and the clocks are identical but the memory bus is at 192 rather than 256 bits, which allows Nvidia to drop its price to €300 (rather than €380) without affecting performance too much.
In spite of the 192-bit bus, Nvidia has retained 2 GB of video memory for the main variant of the GeForce GTX 660 Ti, taking advantage of the fact that its GPUs support asymmetric memory configurations. How can this be? Two 64-bit memory controllers each address 512 MB while the third addresses 1 GB of memory. In other words, the GPU can access 1.5 GB at full speed over 192 bits and keeps 512 MB in reserve with slower 64-bit access. A 3 GB version with a symmetric memory configuration is also planned and will be offered by some partners.
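The bandwidth implication of this asymmetric layout can be sketched with a little arithmetic. The snippet below is only an illustration: it assumes the 1502 MHz GDDR5 clock of the reference cards quoted elsewhere in this review and the usual GDDR5 rate of four transfers per clock per pin. The function name is our own.

```python
# Illustrative arithmetic only: theoretical peak GDDR5 bandwidth per bus width.
# Assumptions: 1502 MHz memory clock (reference spec quoted in this review)
# and 4 data transfers per clock per pin, the usual figure for GDDR5.

GDDR5_TRANSFERS_PER_CLOCK = 4

def peak_bandwidth_gbs(bus_width_bits: int, memory_clock_mhz: float = 1502.0) -> float:
    """Return the theoretical peak bandwidth in GB/s (1 GB = 10^9 bytes)."""
    transfers_per_second = memory_clock_mhz * 1e6 * GDDR5_TRANSFERS_PER_CLOCK
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9

full_speed = peak_bandwidth_gbs(192)   # first 1.5 GB, spread over three controllers
reserve = peak_bandwidth_gbs(64)       # last 512 MB, single controller
gtx670 = peak_bandwidth_gbs(256)       # GeForce GTX 670 for comparison

print(f"192-bit: {full_speed:.1f} GB/s")
print(f" 64-bit: {reserve:.1f} GB/s")
print(f"256-bit: {gtx670:.1f} GB/s")
```

Under these assumptions the 192-bit region peaks at roughly 144 GB/s, while the last 512 MB is limited to a third of that, around 48 GB/s.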
The updated Radeon HD 7950
AMD apparently rushed to update the reference spec of the Radeon HD 7950, no doubt to respond to the GeForce GTX 660 Ti. The changes are similar to those introduced for the Radeon HD 7970 GHz Edition: the clocks have been revised upwards.
Here AMD has used PowerTune with Boost, a more precise update of its energy consumption technology which can now vary the voltage, which allows them to validate a higher GPU clock. While at the start the Radeon HD 7950 was clocked at 800 MHz, the new version is clocked at 925 MHz, with the higher voltage kicking in as of 850 MHz.
AMD says that all Radeon HD 7950 GPUs are compatible with this upgrade which can be implemented simply by updating the bios. Indeed we used a new bios supplied by AMD to test the solution here. This bios has however only been designed for the initial press samples of the Radeon HD 7950, the design of which is different to that of most cards available in stores. As this press model has a sturdier cooling system and power stage, it can handle this version’s higher TDP no problem: the TDP is up from 200 to 225W.
While in practice all the other Radeon HD 7950s should support the upgrade without any problems (with a question mark over power stages which aren't cooled by a small radiator), it is up to each manufacturer to validate its own model, or not, and offer a new bios, either as an update for all cards or only for those manufactured from now on. Note that given how all AMD’s partners seem to have ignored the update of the specs of the Radeon HD 7700, we can’t be sure that they will all provide new bios versions for current HD 7950s, especially as it could complicate things for their overclocked models, for which an update is harder to put in place.
PowerTune vs GPU Boost
AMD and Nvidia have both introduced energy consumption monitoring systems for their graphics cards, which allows them to set more aggressive clocks while still being able to guarantee the reliability of their products. Without such systems, they would have to settle for lower clocks or risk reliability issues as heavy rendering tasks could then take their graphics cards and GPUs beyond the energy consumption levels they had been designed for.
Once implemented, this energy consumption monitoring can also be used for other features, such as turbo or power-saving modes.
AMD was first to develop such a technology, with PowerTune, introduced on the Radeon HD 6900s and included on all the Radeon HD 7000s. Like the turbo technologies used for CPUs, PowerTune estimates the energy consumption of the chip using many internal sensors that track the usage of the different blocks that make up the GPU. It may seem strange at first to estimate energy consumption instead of taking a reading, but this means that AMD can keep behaviour deterministic in terms of performance. A relatively complex formula then converts these usage levels into power consumed, taking account of the parameters that correspond to the least favourable case: a GPU with significant current leakage in a very hot environment.
AMD has recently made two refinements to the technology. The first consists in estimating the GPU temperature by tracking the estimated consumption over time. This estimated temperature can therefore replace the constant which represents the worst case and gives more flexibility in standard situations. To simplify things, the idea is to estimate the energy consumption more precisely so as to avoid being too conservative. The real temperature is still measured and used both as a higher level of protection and to regulate the fan.
This development to PowerTune came on stream as of the Catalyst 12.7 betas and will be rolled out across the Radeon HD 7900 range. In practice this won't make any difference in games, outside of major overclocking, but it will allow these cards to retain a higher clock in stress tests. Note that in the future AMD could exploit this new capability to authorise the GPU to exceed its TDP for a few seconds, in the same way that Intel does with its latest CPUs, but for 3D rendering in real time the load is relatively constant over time and the feature will therefore be of little interest.
The second innovation is the introduction of a turbo feature called Boost, a mode it had become difficult to avoid. In concrete terms Boost represents the capacity of PowerTune to modify the GPU voltage, in addition to its clock. This innovation is reserved for the Radeon HD 7970 GHz Edition and the HD 7950 v2 as the bios must contain a table of clock/voltage pairs, but more importantly because there’s a more complex GPU validation process. The Radeon HD 7970 GHz has thus been validated up to 1000 MHz with a fixed standard voltage but also up to 1050 MHz with a voltage that climbs progressively (850 MHz and 925 MHz in the case of the Radeon HD 7950 v2). PowerTune currently supports up to 256 steps (P-states) with a granularity of 4 MHz.
In practice, as the TDP of the Radeon HD 7970 GHz Edition remains oversized in comparison to energy consumption in video games (with some rare exceptions), Boost can be seen as a way of safely validating the GPU at a higher clock, which will be applied constantly across almost all the games, unlike a turbo that is likely to bring very variable gains.
The situation is different however for the Radeon HD 7950 v2, not only because its thermal envelope doesn’t have as much of a margin but above all because AMD has planned to use up samples of the Tahiti GPU that have very high current leakage on this model. In other words, AMD needs to remain very conservative with respect to the parameters it uses for estimating energy consumption. To compensate for this a bit, the TDP has been increased from 200 to 225W, but this is still insufficient to allow Boost to kick in consistently as it does with the Radeon HD 7970 GHz Edition. The Radeon HD 7950 v2 thus settles for a GPU clock of 850 MHz most of the time, this being the maximum clock allowed without any increase to the GPU voltage.
Note that as Boost increases the voltage, energy consumption rises much faster than the clock, which doesn’t make this a good solution in terms of improving energy efficiency. This is also the case with Nvidia’s GPU Boost, which doesn’t aim to improve performance per Watt but to make the most of each Watt available to offer slightly higher performance.
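A first-order way to see why a voltage bump is costly is the classic CMOS dynamic power approximation, in which power scales with the clock and with the square of the voltage. The sketch below applies it to the Radeon HD 7950 v2 clocks. The voltages are hypothetical placeholders (AMD's actual clock/voltage table is not given here) and leakage is ignored, so this shows the trend only.

```python
# First-order CMOS dynamic power model: P is proportional to f * V^2.
# This ignores leakage, which also grows with voltage and temperature.
# The voltages below are HYPOTHETICAL, chosen only to illustrate the trend.

def relative_dynamic_power(clock_mhz: float, voltage: float,
                           ref_clock_mhz: float, ref_voltage: float) -> float:
    """Dynamic power relative to a reference operating point."""
    return (clock_mhz / ref_clock_mhz) * (voltage / ref_voltage) ** 2

# 850 MHz is the highest HD 7950 v2 clock at standard voltage, 925 MHz the Boost clock.
ref_clock, boost_clock = 850.0, 925.0
ref_voltage, boost_voltage = 1.00, 1.10   # hypothetical values

clock_gain = boost_clock / ref_clock - 1
power_gain = relative_dynamic_power(boost_clock, boost_voltage,
                                    ref_clock, ref_voltage) - 1

print(f"clock: +{clock_gain:.1%}, estimated dynamic power: +{power_gain:.1%}")
```

With these placeholder voltages, a clock gain of under 9% costs roughly 30% more dynamic power, which is why performance per Watt drops as soon as the voltage has to rise.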
However, this is the only thing the two technologies have in common. PowerTune is entirely deterministic: under identical conditions, all the Radeon HD 7900s behave in the same way. This isn’t the case for the GeForces.
Thanks notably to the fact that energy consumption is well under control, Nvidia has managed to introduce a turbo feature for its GPUs called GPU Boost. GPU Boost has also been designed to maximise the GPU clock to fully benefit from the available thermal envelope. We aren’t fully convinced by Nvidia’s approach as GPU Boost is non-deterministic: in contrast to CPUs, it is based on real energy consumption which varies between each GPU sample according to manufacturing quality and the current leakage affecting it.
Why go for such an approach? Nvidia was probably caught napping when AMD, benefiting from the experience of its CPU team, introduced PowerTune with the Radeon HD 6900s and hasn’t yet been able to introduce a similar technology. It has to be introduced at the heart of the architecture and we imagine that Kepler was already too far along in its development for this to be done. Nvidia therefore responded with external monitoring with the GeForce GTX 500s. It’s this same system which is still used on the GTX 600s and we’ll have to wait for the next generation for a more evolved technology to be implemented.
Moreover, the current Nvidia technology has the disadvantage of being relatively slow (100 ms vs a few ms) but the advantage of allowing each sample to benefit from the whole TDP, whereas for CPUs and the Radeons energy consumption is overestimated and not all samples are therefore able to use all their available TDP.
What’s more, Nvidia doesn’t validate all samples of a given derivative of this GPU (the GK104-400 for the GTX 680, the GK104-325 for the GTX 670 and the GK104-300 for the GTX 660 Ti) at the same maximum turbo clock. Officially, Nvidia settles for giving a guaranteed maximum clock but allows the GPU to exceed this if it qualifies to do so. In other words, GPU Boost also amounts to automatic GPU overclocking. The problem is that the press rarely receives average samples and as a result the performance levels we give are somewhat higher than what you may find with samples from stores.
Nvidia justifies itself by explaining that it aims to give maximum performance to each sample and says that while variation in the maximum GPU Boost clock can be significant, the variation in average GPU Boost clock observed is lower, the reason being that the energy consumption limit stops the GPU from increasing to a very high level in the more demanding games and also that the temperature limits it.
What Nvidia fails to say is that we also slightly overestimate performance levels as our testing is carried out under ideal conditions: brief tests on a workbench. Although it might make our work more fun, unfortunately we can’t play for an hour to heat up the GPU before taking each performance reading! To recap, we observed the difference in performance between two GeForce GTX 680s. Without looking for worst case performance we observed a theoretical difference of 2% and 1.5% in practice. This isn’t enormous but is a nuisance where the difference with the competition is so tight. With the margin for manoeuvre given to GPU Boost exploding with the GeForce GTX 670 and GTX 660 Ti (15%), it really does become problematic as far as we can see.
What's the solution? Ideally Nvidia would allow testers to limit cards at the level of the least favourable case with the GPU Boost clock limited to the officially guaranteed value. As Nvidia doesn’t offer such a solution to limit GPU Boost, we therefore chose to simulate such a sample ourselves by juggling with overclocking parameters. This enables us to give you a very precise measure (in spite of the ‘DIY’ aspect of the solution) of the guaranteed performance for a basic sample as well as the level of performance you can get with a more favourable sample. What’s the situation in stores with respect to the sort of sample you can expect? Unfortunately we don’t know as Nvidia and its partners have categorically refused to go into the matter.
Specifications, the reference GeForce GTX 660 Ti
The GeForce GTX 660 Ti is very similar to the GeForce GTX 670, with just 25% less memory bandwidth and a fillrate down by 14%. Note that while the number of ROPs has dropped from 32 to 24 (down 25%), the GeForce GTX 670 can in fact only output 28 pixels per cycle and not 32, as there’s a small bottleneck upstream.
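The percentages above follow directly from the figures quoted, as this short check shows. It assumes identical clocks on both cards, as stated earlier in the review; the helper name is ours.

```python
# Relative deficits of the GTX 660 Ti against the GTX 670 at identical clocks.
# Figures come from the paragraph above: 192 vs 256-bit bus, 24 ROPs vs an
# effective 28 pixels per cycle on the GTX 670.

def deficit(value: float, reference: float) -> float:
    """Fractional shortfall of value relative to reference."""
    return 1 - value / reference

bandwidth_deficit = deficit(192, 256)      # memory bus width in bits
fillrate_deficit = deficit(24, 28)         # pixels per cycle actually achievable
nominal_rop_deficit = deficit(24, 32)      # on-paper ROP count

print(f"bandwidth: -{bandwidth_deficit:.0%}")
print(f"fillrate:  -{fillrate_deficit:.0%}")
print(f"ROP count: -{nominal_rop_deficit:.0%}")
```

The fillrate gap is smaller than the ROP count suggests because the GTX 670 cannot feed all 32 of its ROPs in the first place.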
As Nvidia doesn’t validate all its GPUs at the same clock, their maximum processing power will vary within a range of 16% and their processing power in practice will vary as well depending on the GPU load and its temperature.
Meanwhile, the Radeon HD 7950 v2 enjoys a processing power gain over the original Radeon HD 7950 of between 6 and 16% – the gain varies according to GPU load and will be identical on all the cards.
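The 6 to 16% range can be recovered from the clocks given earlier: 800 MHz for the original card, 850 MHz for the v2 at standard voltage and 925 MHz with Boost. A minimal check, with constant names of our own choosing:

```python
# Clock gain of the Radeon HD 7950 v2 over the original 800 MHz card.
ORIGINAL_MHZ = 800
V2_STANDARD_VOLTAGE_MHZ = 850   # clock held when Boost cannot engage
V2_BOOST_MHZ = 925              # clock reached with the higher Boost voltage

def gain(new_mhz: float, old_mhz: float = ORIGINAL_MHZ) -> float:
    """Fractional clock gain of new_mhz over old_mhz."""
    return new_mhz / old_mhz - 1

low = gain(V2_STANDARD_VOLTAGE_MHZ)
high = gain(V2_BOOST_MHZ)
print(f"gain between {low:.1%} and {high:.1%}")
```

Where a given game lands in that range depends on how often PowerTune lets the GPU leave 850 MHz.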
The reference GeForce GTX 660 Ti
The reference GeForce GTX 660 Ti is almost identical to the GTX 670. It is relatively compact at 24 cm in length and the PCB is even shorter at 17.2 cm! This gives these GeForces a very original design with a plastic extension behind the PCB to make the card long enough to have a radial, or blower, fan.
With a TDP of 170W for the GTX 670 and 150W for the GTX 660 Ti (and GPU Boost limit targets of 140 and 134W), Nvidia has been able to go for a very basic cooling block: an aluminium radiator equipped with a big copper insert at its base. There’s a second radiator for the sensitive power stage components.
A standard shell covers the card. Overall the finish isn’t that high in quality, well below what we expect for a high end card on sale in this price range. Some samples also seem to suffer from a mechanical noise coming from the fan or its holder, a problem we also found on our GeForce GTX 670 sample.
Nvidia has designed its PCB with four phases to supply current to the GK104 as well as two for the Hynix R0C GDDR5 certified at 1.5 GHz. The GeForce GTX 660 Ti has two 6-pin power supply connectors, corresponding to a maximum energy consumption of 225W according to PCI Express specifications. In practice energy consumption is a good deal lower than this.
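The 225W ceiling is simply the sum of the PCI Express power limits: 75W from the x16 slot plus 75W per 6-pin connector (an 8-pin connector is rated at 150W). A small sketch of that sum, with names of our own:

```python
# Maximum board power allowed by the PCI Express power limits:
# 75 W from the x16 slot, 75 W per 6-pin connector, 150 W per 8-pin connector.
SLOT_W = 75
CONNECTOR_W = {"6-pin": 75, "8-pin": 150}

def max_board_power(connectors):
    """Sum the slot budget and the budget of each auxiliary connector."""
    return SLOT_W + sum(CONNECTOR_W[c] for c in connectors)

gtx660ti_limit = max_board_power(["6-pin", "6-pin"])
print(gtx660ti_limit)  # 225
```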
The PCB differs slightly from that used for the GeForce GTX 670 in terms of the memory organisation - half the channels can only support a single module instead of two. The maximum configuration on the PCB of the GeForce GTX 660 Ti is thus 3 GB interfaced at 256 bits, as against 4 GB interfaced at 256 bits on the GTX 670. Thus it’s impossible to position 3 GB at 192 bits on this PCB to come up with such a variant of the GTX 660 Ti, which will probably use the PCB used for the GTX 670. Although this doesn’t affect the user at all, we can't see why Nvidia has organised the PCB in this way…
The connectivity is identical to that on the GeForce GTX 680 with two DVI Dual Link connectors, an HDMI 1.4a 3 GHz connector and a DisplayPort 1.2 connector.
The official GPU Boost clock on the GeForce GTX 670 and 660 Ti is 980 MHz for a base clock of 915 MHz. Our samples however had maximum clocks of 1084 and 1071 MHz. According to the information that we have gathered up until now, this maximum clock can go as high as 1136 MHz on the best samples. In other words, the maximum clock can be in the same order as that of the GeForce GTX 680, although the official base spec and GPU Boost are lower than this.
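To put these clocks in perspective, the headroom of each sample over the guaranteed GPU Boost clock can be computed as below. The clocks are those quoted above; the labels and function name are ours.

```python
# Headroom of the observed maximum GPU Boost clock over the guaranteed one.
GUARANTEED_BOOST_MHZ = 980   # official GPU Boost clock, GTX 670 and GTX 660 Ti

observed_max_mhz = {
    "GTX 670 review sample": 1084,
    "GTX 660 Ti review sample": 1071,
    "best samples reported": 1136,
}

def headroom(max_mhz: float, guaranteed_mhz: float = GUARANTEED_BOOST_MHZ) -> float:
    """Fractional margin of a sample's maximum clock over the guaranteed clock."""
    return max_mhz / guaranteed_mhz - 1

for name, mhz in observed_max_mhz.items():
    print(f"{name}: {mhz} MHz, +{headroom(mhz):.1%}")
```

The last entry works out at about 16%, which matches the range of variation in maximum processing power mentioned in the specifications.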
GTX 660 Ti Asus DirectCU II TOP and EVGA SC
Asus and EVGA were able to supply us with their first customised models in time for this test:
Asus GeForce GTX 660 Ti DirectCU II
Asus offers three versions of its customised GeForce GTX 660 Ti. They all have the same design as that used for the brand’s excellent GTX 670 DirectCU II:
Asus GeForce GTX 660 Ti DirectCU II (DC2): €340
Asus GeForce GTX 660 Ti DirectCU II OC + Borderlands 2 (DC2O): €350
Asus GeForce GTX 660 Ti DirectCU II TOP (DC2T): €360
The OC version has its GPU clocked up from 915/980 MHz to 967/1058 MHz (base/guaranteed boost) and the TOP is at 1058/1136 MHz. The memory however remains at 1502 MHz, which is a shame as the GK104 GPU that these cards run on benefits greatly from an increased memory bandwidth.
Asus sent us a test sample of the TOP model. As is often the case, press samples are very carefully selected, either for their overclocking potential or their low energy consumption and therefore low noise levels. Our TOP card would seem to have been selected for its cooperation vis-à-vis GPU Boost… While the guaranteed GPU Boost clock is 1136 MHz, our model got up to 1279 MHz! This is a gain of more than 12% in power over the guaranteed spec… or at least it would be if the card were able to take advantage of it.
For the GTX 670 DirectCU II TOP, Asus had revised the GPU Boost energy consumption limit upwards. By default the TOP model could thus manage up to +/- 175W compared to 140W for the reference card. This change was necessary to allow GPU Boost to benefit from a GPU able to go as high as this. Unfortunately, Asus hasn’t gone the same way on the GeForce GTX 660 Ti and has clipped the wings of the GPU on our sample... at least for the moment. A new bios could well be introduced to correct the situation however.
Asus has used its standard DirectCU II cooling system here, in its dual slot version. There are three 8mm nickel plated copper heat pipes set into the aluminium base and in direct contact with the GPU. They run up to a wide radiator over which two 75mm low profile fans are mounted and a relatively thick, and therefore rigid, metal casing closes everything off.
Asus hasn’t adapted the format of this cooler to the PCB on which the GPU is more or less centrally positioned, though the base of the radiator has been set off to the left of the cooler. The cooler isn’t therefore positioned ideally in relation to the PCB. It also sticks out over the back of the card by 35mm and the bracket is used to block the hole at the front of the card. The total length of the card is thus 27 cm. A strengthening bar has been placed on the top of the card so as to guarantee its rigidity.
Although Asus uses the reference connectivity (2 DVI, 1 HDMI, 1 DisplayPort), the power stage has been entirely revisited. While there are four phases to power the GPU on the reference PCB, this DirectCU II model has six. Moreover Asus has put the power circuit for the memory on the other side of the PCB, which should somewhat improve the quality of the different signals. One small difference compared to the Asus GTX 670 is that here a single phase is used for the memory, compared to two.
As with the reference card, two 6-pin power supply connectors are needed. Another small development compared to the GTX 670 DirectCU II is that these connectors have been turned over so as to avoid having their retention clips under a heat pipe, which made removing the power supply cables rather complicated.
The card comes with a CD with the drivers, a small mounting guide, a DVI to VGA adaptor and a double molex to 6-pin PCI Express power supply cable convertor.
EVGA GeForce GTX 660 Ti SuperClocked
EVGA has brought out four versions of its GeForce GTX 660 Ti based on the reference design but equipped with a slightly customised cooling system. They are based on the same design as the EVGA GTX 670:
EVGA GeForce GTX 660 Ti 2 GB: €310
EVGA GeForce GTX 660 Ti 3 GB: €330
EVGA GeForce GTX 660 Ti 2 GB SuperClocked (SC): €330
EVGA GeForce GTX 660 Ti 3 GB SuperClocked (SC): €350
The GPU on the SuperClocked versions is clocked up from 915/980 MHz (base/guaranteed boost) to 980/1058 MHz. The memory however is still at 1502 MHz, which is a shame as the GK104 GPU that these cards run on benefits greatly from an increased memory bandwidth. FTW models are also available but they have a slightly different design.
EVGA supplied us with a sample of the SuperClocked 2 GB with a maximum GPU Boost clock of 1150 MHz in practice, or 9% up on the guaranteed spec.
EVGA has used the reference GeForce GTX 660 Ti PCB. It's a very short 17 cm and the cooling system, derived from the reference one, extends it via a plastic mount taking the total card length to 24 cm.
The cooling system uses the same blower fan as that used on the reference card. It also has the same overall structure and the same small radiator is used on the power stage, which is equipped with four phases for the GPU. The main cooling block is however slightly different to the one used on the reference card: the copper insert is smaller but the radiator is a little bit wider and its fins are thicker and more rigid. Finally the slightly curved casing that covers the card has a customised design.
The card comes with a CD for the drivers, a small mounting guide, a DVI to VGA adaptor and two double molex to 6-pin PCI Express power supply cable convertors. For the EVGA fans, there’s also a complete brand kit: poster, badge and stickers!
Noise levels and GPU temperature
Noise
To observe the noise levels produced by the various solutions, we put the cards in a Cooler Master RC-690 II Advanced casing and measured noise at idle and in load. We used an SSD and all the fans in the casing, as well as the CPU fan, were turned off for the reading. The sonometer was placed 60 cm from the closed casing and ambient noise was measured at +/- 20 dBA. Note that for all the noise and temperature readings, we used the real Radeon HD 7950 reference design, which is different to that used for the press card supplied by AMD. We weren’t able to take similar readings for the Radeon HD 7950 v2.
Remember that our GeForce GTX 670 sample was one of those with an excessively annoying mechanical noise coming from the cooling system. While this did affect the readings, the levels measured don’t tell the full story in terms of how disturbing the noise was.
The noise levels on the GeForce GTX 660 Ti, based on the same design, are somewhat lower. However the card isn’t all that quiet at idle. The EVGA solution does slightly better than the GTX 660 Ti but it’s still a long way behind the Asus, which is totally silent at idle and very quiet in load, quieter than most other graphics cards at idle! We should say however that the DirectCU II doesn’t send any air out of the casing, which makes the job easier here. The final bios slightly increases noise levels but they remain at an excellent level.
Temperatures
Still in the same casing, we took a reading of the GPU temperature given on the internal sensor:
The reference GeForce GTX 660 Ti and the EVGA model are well cooled but the Asus solution is still a notch above.
Readings and infrared thermography
Infrared thermography
For this test we used the new protocol described here.
First of all, here's a summary of all the readings:
There’s almost no difference between the cards at idle, except that the Asus is a good deal quieter.
In load, the Asus card’s advantage is accentuated in terms of noise levels, at the cost of an increase in internal temperatures however.
Finally, here’s what all this gives in terms of thermal imaging:
While the Asus GeForce GTX 660 Ti DirectCU II TOP GPU is very well cooled, its power stage heats up as much as, if not more than, those of the other two cards. This isn’t really a problem but some caution will be needed if the energy consumption limit is increased so as to fully benefit from the card’s potential.
The 10% increase in the energy consumption limit of the Asus card has an impact on temperatures, though it remains limited.
Energy consumption and performance/watt
Energy consumption
We used the test protocol that allows us to measure the energy consumption of the graphics card alone. We took these readings at idle on the Windows 7 desktop as well as with the screen in standby so as to observe the impact of ZeroCore Power. In load, we took our readings in Anno 2070, at 1080p with all the settings pushed to their max, and in Battlefield 3, at 1080p in High mode:
The energy consumption on the reference GeForce GTX 660 Ti is identical to that on the GeForce GTX 670. The factory overclocked Asus and EVGA cards both maximize the available thermal envelope.
While the TDP of the Radeon HD 7950 v2 is up 12.5% on the original version, its energy consumption is up 21% in Anno 2070 and 32% in Battlefield 3. This is because of the improved precision of PowerTune which is now better able to estimate power consumption and therefore use the available thermal envelope more fully.
The final GTX 660 Ti DirectCU II TOP bios increases energy consumption in load by 10%.
We have put these energy consumption readings together with the performance measures, giving fps per 100W to make the data more legible:
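The metric itself is a straightforward ratio. The sketch below shows the calculation with placeholder numbers: the 60 fps and 150W readings are hypothetical and are not taken from our graphs.

```python
def fps_per_100w(fps: float, board_power_w: float) -> float:
    """Frames per second delivered for every 100 W drawn by the card alone."""
    if board_power_w <= 0:
        raise ValueError("board power must be positive")
    return fps / board_power_w * 100

# HYPOTHETICAL readings, used only to show the calculation.
example = fps_per_100w(fps=60.0, board_power_w=150.0)
print(f"{example:.1f} fps per 100 W")
```

Normalising to 100W rather than to a single Watt simply keeps the values in a range that is easier to read on a graph.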
[ Anno 2070 1080p Max ] [ Battlefield 3 1080p High ]
These graphs highlight the reduction in energy yield that automatically comes in when the TDP is maximized using a more precise energy consumption algorithm and/or a higher voltage. This yield remains a good deal better than what we saw on the previous generation, but it falls significantly with the Radeon HD 7950 v2. The Radeon HD 7870, the GeForce GTX 660 Ti and, above all, the GeForce GTX 670 are the most efficient. Note that energy efficiency drops slightly with the new Asus bios, which allows the turbo more room for manoeuvre.
Note however that each game represents a particular case and that the yield varies from one card sample to the next, on the Radeons because their energy consumption varies and the GeForces because their maximum clock and therefore their performance levels vary. Here, the GeForce GTX 680 went up to 1110 MHz, the GeForce GTX 670 up to 1084 MHz and the GeForce GTX 660 Ti up to 1071 MHz.
Test protocol
For this test, we used the previous protocol, adding Max Payne 3. All the games were tested with their latest patches, most of them being maintained via Steam/Origin.
We have decided no longer to use the level of MSAA (4x and 8x) as the main criterion for segmenting our results. Many games with deferred rendering offer other forms of antialiasing, the most common being FXAA, developed by NVIDIA. There’s therefore no point in drawing up an index based on a certain antialiasing level, which in the past allowed us to judge MSAA efficiency, which can vary according to the implementation. At 1920x1080, we therefore carried out the tests with two different quality levels: extreme and very high, which automatically includes a minimum of antialiasing (either MSAA 4x or FXAA/MLAA/AAA).
Also we no longer show decimals in game performance results so as to make the graph more readable. We nevertheless note these values and use them when calculating the index. If you’re observant you’ll notice that the size of the bars also reflects this.
All the Radeons were tested with the recently released Catalyst 12.8 drivers and all the GeForces were tested with the beta 305.37 drivers.
We managed to test the GeForce GTX 600s at their minimum guaranteed GPU Boost specs. To do this we played with the overclocking settings to reduce the base clock by slightly adjusting the energy consumption limit so that the clock in practice would correspond to that of a card with a maximum turbo clock equal to that of the official GPU Boost clock. Note that this isn’t the same as turning GPU Boost off!
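The idea can be sketched as a simple clock offset. This is our simplified reading of the procedure, not the exact settings used: it assumes the whole clock range is shifted down by the difference between the sample's maximum GPU Boost clock and the guaranteed one, and the function name is hypothetical.

```python
# Sketch of the idea: shift the clock range down so that the sample's maximum
# GPU Boost clock lands on the officially guaranteed value.
# This is an ASSUMED simplification, not the exact settings used in testing.

def clock_offset_mhz(sample_max_boost_mhz: int, guaranteed_boost_mhz: int) -> int:
    """Negative offset to apply so the sample tops out at the guaranteed clock."""
    return min(0, guaranteed_boost_mhz - sample_max_boost_mhz)

# GTX 660 Ti review sample: 1071 MHz maximum against a guaranteed 980 MHz.
offset = clock_offset_mhz(sample_max_boost_mhz=1071, guaranteed_boost_mhz=980)
print(offset)  # -91
```

GPU Boost stays active after such an offset, so the card still varies its clock with load and temperature, just within a range capped at the guaranteed value.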
To recap, we took the opportunity of the report on the GeForce GTX 690 to introduce the X79 platform and a Core i7 3960X into our test system so as to benefit from PCI Express 3.0. Note that the activation of PCI Express 3.0 isn’t automatic on the GeForce GTX 600s and requires a registry modification, which we of course made and which gives an average gain of +/- 2%.
Test configuration
Intel Core i7 3960X (HT off, Turbo 1/2/3/4/6 cores: 4 GHz)
Asus P9X79 WS
8 GB DDR3 2133 Corsair
Windows 7 64-bit
GeForce beta 305.37 drivers
Benchmark: Alan Wake
Alan Wake is a pretty well executed title ported from console and based on DirectX 9.
We used the game’s High quality levels and added a maximum quality level with 8x MSAA and 16x anisotropic filtering. We carried out a well defined movement and measured performance with Fraps. The game is updated via Steam.
Compared to the GeForce GTX 670, the GeForce GTX 660 Ti suffers in this game that is particularly demanding in terms of memory bandwidth. The Radeons do particularly well.
Benchmark: Anno 2070
Anno 2070 uses a development of the Anno 1404 engine which includes DirectX 11 support.
We used the very high quality mode on offer in the game and then, at 1920x1080, we pushed anisotropic filtering and post processing to a max to make them very resource hungry. We carried out a movement on a map and measured performance with Fraps.
Here, the performance of the GeForce GTX 660 Ti is very close to that of the GeForce GTX 670.
Benchmark: Batman Arkham City
Batman Arkham City was developed with a recent version of Unreal Engine 3 which supports DirectX 11. Although this mode suffered a major bug in the original version of the game, a patch (1.1) has corrected this. We used the game benchmark.
All the options were pushed to a maximum, including tessellation which was pushed to extreme on part of the scenes tested. We measured performance in Extreme mode (which includes the additional DirectX 11 effects) with MSAA 4x and MSAA 8x. The game is updated via Steam.
Memory bandwidth requirements hold the GeForce GTX 660 Ti back in this game.
Benchmark: Battlefield 3
Battlefield 3 runs on Frostbite 2, probably the most advanced graphics engine currently on the market. A deferred rendering engine, it supports tessellation and calculates lighting via a compute shader.
We tested High and Normal modes and measured performance with Fraps, on a well-defined route. The game is updated via Origin.
The GeForce GTX 600s are particularly efficient at 1080p in Battlefield 3, allowing the GTX 680 to outdo the Radeon HD 7970 GHz Edition. The GeForce GTX 660 Ti does pretty well here, outperforming the Radeon HD 7950 v2 in high quality mode. As will often be the case, it falls back a bit with MSAA 4x, which is activated in Ultra mode.
Although only in DirectX 9 mode, the rendering is pretty nice, based on version 3.5 of Unreal Engine.
All the graphics options were pushed to a max (high) and we measured performance with Fraps, with MSAA 4x and then 8x.
With MSAA 8x, the GeForce GTX 600s struggle to measure up to the Radeons, particularly the GTX 680 and 660 Ti which seem to lack memory bandwidth.
Benchmark: Civilization V
Pretty successful visually, Civilization V uses DirectX 11 partly to improve quality and optimise performance in the rendering of terrains thanks to tessellation and also to implement a special compression of textures thanks to the compute shaders, a compression which allows it to keep the scenes of all the leaders in memory. This second usage of DirectX 11 doesn’t concern us here however as we used the benchmark included on a game map. We zoomed in slightly so as to reduce the CPU limitation which has a strong impact in this game.
All settings were pushed to a max and we measured performance with shadows and reflections. The game is updated via Steam.
The GeForce GTX 600s benefit here from new 300 series drivers, which bring a significant gain. As is often the case, the GeForce GTX 660 Ti suffers when the level of MSAA increases.
Benchmark: Crysis 2
Crysis 2 uses a development of the Crysis Warhead engine optimised for efficiency, with DirectX 11 support added via a patch, and this can be quite demanding. This is the case, for example, with tessellation, which was implemented abusively in collaboration with Nvidia with the aim of causing Radeon performance to plummet. We have already talked about this issue.
We measured performance with Fraps on version 1.9 of the game.
The GeForce GTX 660 Ti is on a par with the Radeon HD 7870 in Ultra mode here but between the two Radeon HD 7950s in Extreme mode.
Benchmark: DiRT Showdown
Codemasters’ latest game, DiRT Showdown benefits from a slight development of the in-house DirectX 11 engine. In partnership with AMD, the developers have introduced some advanced lighting which takes numerous sources of direct and indirect light into account to simulate overall lighting. These additional options were introduced with the first patch of the game, which we used on our system. The game is updated via Steam.
To measure performance, we pushed all the graphics options to maximum and used Fraps on the game’s internal tool.
Although the GeForce GTX 680 equals the Radeon HD 7970 at 1080p without advanced lighting, once this is turned on its performance levels take a dive as Nvidia didn’t have access to this patch sufficiently early to be able to offer specific optimisations for it. We’ll probably have to wait a little longer for this to be put into place.
Benchmark: Max Payne 3
Max Payne 3 has nice rendering overall though it does vary in places, notably with 'console quality' textures. It uses a DirectX 11 engine with deferred rendering which supports several advanced effects such as HDAO or tessellation, which is rather heavy once pushed to a max.
It supports FXAA and MSAA, which is very heavy here given the type of rendering used. MSAA is still required for full antialiasing as FXAA isn't sufficient.
We pushed all options to a max and used Fraps on a well defined route.
The GeForce GTX 600s are particularly at ease in this game, as Nvidia worked with the developers ahead of release. The maximum level of tessellation is very heavy, which slows the Radeons down a bit. They make up ground when MSAA 4x is activated however. The GeForce GTX 660 Ti is thus on a par with the Radeon HD 7970 without MSAA but slips back to the level of the Radeon HD 7950 with it.
Benchmark: Metro 2033
Still one of the most demanding titles, Metro 2033 brings all recent graphics cards to their knees. It supports GPU PhysX but only for the generation of particles during impacts, a rather discreet effect that we therefore didn’t activate during the tests. In DirectX 11 mode, performance is identical to DirectX 10 mode but with two additional options: tessellation for characters and a very advanced, very demanding depth of field feature.
We tested it in DirectX 11, at maximum quality (including DoF and MSAA 4x), very high quality as well as with tessellation on.
No single-GPU card allows you to play Metro 2033 at 1080p comfortably at maximum quality. The GeForce GTX 600s suffer from a lack of memory bandwidth in this mode, which limits their performance.
Benchmark: The Witcher 2 EE
The Witcher 2 graphics engine has been worked on gradually over time to give us the current version in the recent Enhanced Edition. Although it’s based on DirectX 9, it's relatively demanding once all the graphics options are pushed to a maximum, one of these being particularly demanding: UberSampling. In reality it’s a 4x supersampling type of antialiasing with a few optimisations.
We tested the game at maximum quality with and without UberSampling. Performance was measured with Fraps.
The Radeon HD 7000s dominate quite easily in this game and the Radeon HD 7870 outperforms the GeForce GTX 660 Ti. Note however that the fluidity isn’t always perfect with the Radeons.
Benchmark: Total War Shogun 2
Total War Shogun 2 has a DirectX 11 patch, developed in collaboration with AMD. Among other things, it gives tessellation support and a higher quality depth of field effect.
We tested it in DirectX 11 mode, with a maximum quality, MSAA 4x and MLAA. This game is updated via Steam.
Unusually, Nvidia dominates here with MSAA 4x on.
Performance recap
Although individual game results are obviously worth looking at when you want to gauge performance in a specific game, we have also calculated a performance index based on all tests, with the same weight for each game. The original Radeon HD 7950 is assigned an index of 100.
In the end, the GeForce GTX 660 Ti at its guaranteed clocks is on a par with the Radeon HD 7870. A good sample that can clock much higher is required to close the gap on the original Radeon HD 7950. The Radeon HD 7950 v2 extends that advantage.
The addition of Max Payne 3 to the protocol improves the GeForce GTX 680's standing a bit, taking it from a position of deficit (-0.3%) in comparison to the Radeon HD 7970 to a lead of 0.4%. This is minimal for cards which on average give similar performance but with marked differences in certain games.
Finally, here are the gains our GeForce GTX 600 samples achieve at their own maximum GPU Boost clocks compared with the officially guaranteed clocks:
GTX 680 (1110 MHz max): +2.1%
GTX 670 (1084 MHz max): +5.3%
GTX 660 Ti (1071 MHz max): +4.4%
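As an illustration of the performance index described in the recap, here is a minimal Python sketch. The exact averaging method isn't spelled out, so this assumes that each card's results are first expressed relative to the reference card test by test, averaged within each game, and then averaged across games so that every game carries the same weight. The function name and the frame rates are hypothetical and are not measurements from this review.

```python
from statistics import mean


def performance_index(results, reference):
    """Return an index per card, with the reference card at 100.

    results maps card -> game -> list of fps values (one per test).
    Assumption: arithmetic means, equal weight for each game.
    """
    index = {}
    for card, games in results.items():
        per_game = []
        for game, fps_list in games.items():
            ref_list = results[reference][game]
            # Relative performance for each test of this game
            ratios = [fps / ref for fps, ref in zip(fps_list, ref_list)]
            per_game.append(mean(ratios))
        # Every game counts once, however many tests it contains
        index[card] = 100 * mean(per_game)
    return index


# Hypothetical frame rates, NOT taken from the review
sample = {
    "reference card": {"game A": [50.0, 40.0], "game B": [60.0]},
    "other card": {"game A": [55.0, 42.0], "game B": [57.0]},
}

index = performance_index(sample, "reference card")
for card, value in index.items():
    print(f"{card}: {value:.1f}")
```

Averaging within each game first stops a title tested in several modes from counting more than a title tested in only one.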
GTX 660 Ti Asus DirectCU II TOP and EVGA SuperClocked: performance
We looked at the performance of the Asus and EVGA GeForce GTX 660 Tis on part of our test protocol. As in the previous benchmarks, the cards were tested as they came, namely with the maximum GPU Boost clock specific to our samples, as well as with the GPU limited to its officially guaranteed minimum clock.
We can see here that the gains given by the overclocked cards are relatively modest. If we compare the GTX 660 Tis with one another, without limiting GPU Boost, the EVGA SC and the Asus TOP bring gains of 2.9% and 5.1% respectively over the reference card. The new Asus BIOS increases this advantage to 8.3% and allows the TOP to get close to the reference GTX 670.
On paper however, the EVGA card gives a gain of 8% and the Asus a gain of 16%! If we look at the maximum GPU clocks of our samples, the potential gain over the reference card is still 8% for the EVGA but increases to 19% for the Asus.
Why then is the gain in performance so low in practice? There are two main explanations. The first is that the GeForce GTX 660 Ti GPU is partly held back by its memory bandwidth. The second, and more important, is the thermal envelope limit: the 134 W limit within which GPU Boost can increase the GPU clock isn’t high enough to allow these cards to benefit from their maximum clocks in the majority of games.
Nvidia’s partners do have the option of modifying this energy consumption limit, which is what Asus has done with the final BIOS for the GTX 660 Ti DirectCU II TOP. This is very welcome as the cooling system on this model is able to handle the demands placed on it without making too much noise. It is also possible to modify this limit manually using overclocking tools such as Precision X and GPU Tweak.
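To put the gap between the on-paper and measured gains into figures, the short Python sketch below divides one by the other, using only the percentages quoted above. The function name is ours, and the ratio is just a rough indicator rather than a metric used in the tests.

```python
def realised_share(measured_gain_pct, paper_gain_pct):
    """Fraction of the on-paper clock gain that shows up as performance."""
    return measured_gain_pct / paper_gain_pct


# (measured gain %, on-paper gain %) as quoted in the text
cards = {
    "EVGA SuperClocked": (2.9, 8.0),
    "Asus DirectCU II TOP (original BIOS)": (5.1, 16.0),
    "Asus DirectCU II TOP (new BIOS)": (8.3, 16.0),
}

for name, (measured, paper) in cards.items():
    share = realised_share(measured, paper)
    print(f"{name}: {share:.0%} of the on-paper gain")
```

With these figures, only around a third of the on-paper gain is realised at the original consumption limit, rising to roughly half for the Asus with its new BIOS.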
192-bit memory bus: the impact
We tried to observe more closely the impact on performance of the 192-bit memory bus by comparing the GeForce GTX 660 Ti and GTX 670 at the same maximum GPU Boost clock: 980 MHz. Performance on the GeForce GTX 660 Ti is affected here both by the reduction in memory bandwidth and by the reduction in the number of ROPs (down from 32 to 24). Having fewer ROPs mostly affects the card at high levels of MSAA, because at lower levels pixel throughput is already partly limited upstream, in the communication between the blocks of processing units and the ROPs. MSAA is also demanding in terms of memory bandwidth, which means we can’t measure the impact of bandwidth separately from that of the number of ROPs.
The resulting loss in performance varies between 3.4% and 22.4%, with the higher end of this range being seen as expected when MSAA is used.
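For reference, the theoretical deficit of the GeForce GTX 660 Ti against the GTX 670 can be worked out in a few lines of Python. This is a sketch only: the 6.0 Gbps effective memory data rate is an assumption that isn't stated in this section, and since both cards use the same memory clock, the 25% deficit holds whatever value is used.

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bytes per transfer x transfers per second."""
    return bus_width_bits / 8 * data_rate_gbps


# Assumption: effective GDDR5 data rate per pin, identical on both cards
DATA_RATE_GBPS = 6.0

bw_660ti = memory_bandwidth_gbs(192, DATA_RATE_GBPS)
bw_670 = memory_bandwidth_gbs(256, DATA_RATE_GBPS)

bandwidth_deficit = 1 - bw_660ti / bw_670  # from the bus width
rop_deficit = 1 - 24 / 32                  # from the ROP count quoted above

print(f"GTX 660 Ti: {bw_660ti:.0f} GB/s, GTX 670: {bw_670:.0f} GB/s")
print(f"Bandwidth deficit: {bandwidth_deficit:.0%}, ROP deficit: {rop_deficit:.0%}")
```

Both deficits come to 25%, so the 22.4% loss measured in the worst case is close to this theoretical ceiling.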
Radeon HD 7950 v2: the gains
Here’s a summary of the performance gains brought by the Radeon HD 7950 v2, whose clock and raw processing power have increased by between 6.3% and 15.6%, depending on the estimated energy consumption under the current load.
The gains vary between 3.9% and 8.4% with an average of 6.3%, which is relatively modest given the increase in maximum clock. As with the factory-overclocked GTX 660 Tis, this can be explained by the fact that the memory clock hasn’t changed and by the limitation due to the available thermal envelope.
Conclusion
Very quickly when working on this report, it became clear that we wouldn’t be able to give you a clear conclusion with universal purchasing advice. For gaming at 1080p, our recommendation is a case-by-case choice between the Radeon HD 7870, the GeForce GTX 660 Ti, the GeForce GTX 670 and the Radeon HD 7970 GHz Edition. Let’s start with the last of these, which we advise you to go for if you’re looking for maximum performance without resorting to a multi-GPU solution. Make sure you avoid the reference model however as it's too noisy.
Among the three others, our default preference is still with the Radeon HD 7870. It offers solid performance, doesn't draw too much power and, above all, Sapphire is selling it at a price that defies all competition: €270. This offers the best price/performance ratio for gamers and we'll check soon to see how good its cooling system is.
The GeForce GTX 660 Ti and GTX 670 are however intermediary solutions that you shouldn’t ignore. The first will give you an additional 10% performance if its GPU Boost is cooperative and you’re not a fan of MSAA type antialiasing, as accelerating MSAA isn’t its strong point. Again depending on how cooperative its GPU Boost is, the GeForce GTX 670 will give an extra 15% to 20% in performance but this time without too many issues with MSAA.
Apart from the level of performance, a personal preference for one brand or another, one functionality or another will probably help you in your choice of solution. We, for example, have a small preference for Nvidia’s drivers… but we also like how AMD graphics cards turn themselves off during screen standby.
Decidedly, nothing in this world is simple… except, that is, when it comes to the Radeon HD 7950 v2! Its arrival might have complicated our purchasing advice even more, but in fact the average performance gain of 6% for increased energy consumption of 20% to 30% just isn't worth it. AMD would be well advised to suggest that its partners offer the new BIOS that turns a card into a v2 only as an option.
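The trade-off can be expressed as a change in performance per watt. The Python sketch below uses only the rounded figures quoted above (a 6% gain for 20% to 30% more energy). The function name is ours and the result is an approximation, not a measurement.

```python
def efficiency_change_pct(perf_gain_pct, power_increase_pct):
    """Change in performance per watt, in percent."""
    perf = 1 + perf_gain_pct / 100
    power = 1 + power_increase_pct / 100
    return (perf / power - 1) * 100


best_case = efficiency_change_pct(6, 20)   # energy consumption up by 20%
worst_case = efficiency_change_pct(6, 30)  # energy consumption up by 30%

print(f"Performance per watt: {best_case:.1f}% to {worst_case:.1f}%")
```

On these figures the v2 loses somewhere between 12% and 18% in energy efficiency compared with the original card.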
Let’s finish with a word on the GeForce GTX 660 Ti SuperClocked and DirectCU II TOP customised by EVGA and Asus respectively. The first is preferable to the reference model and retains the advantage of extracting hot air from the case, which can be important for compact cases. In absolute terms however, it remains too noisy for the number of watts to be dissipated, especially when compared to the Asus DirectCU II, which is very strong on this point.
Both at idle and under load, the Asus is extremely quiet. Of course the TOP doesn't come cheap and struggles to take full advantage of its higher clock because of the overly strict energy consumption limit (which could be extended via a new BIOS). Asus however offers two other variants, also priced high but certainly worth it for those for whom the noise level of the cooling system is an important consideration. The standard DirectCU II comes in at €340 and the DirectCU II OC at €350. What’s more, the DirectCU II OC comes with Borderlands 2, which makes it a pretty good deal in the end, if you’re interested in this game of course.
As we thought, Asus has used a new BIOS to modify the energy consumption limit for the GTX 660 Ti DirectCU II TOP so as to allow it to take advantage of the higher maximum clock. All cards in stores should come with this BIOS.
Copyright © 1997-2013 BeHardware. All rights reserved.