Review: Nvidia GeForce GTX 660, Asus DirectCU II TOP and SLI - BeHardware

Written by Damien Triolet

Published on September 14, 2012

URL: http://www.behardware.com/art/lire/876/


Page 1

Introduction

After rolling out the GK104 GPU in the GeForce GTX 600 high-end, Nvidia has introduced the GK106, a second GPU designed to make Kepler architecture accessible to gamers. Will it finally push the Radeon HD 7800s to one side? This is what we’re going to find out in this comprehensive review of the GeForce GTX 660 in which we will look at SLI performance, the Asus DirectCU II TOP version and whether it’s worth upgrading from an older generation card.


Kepler for everyone
Nvidia has been able to put the GK104 in numerous products to cover the whole of the high-end segment and has even managed to dip into the ‘Performance’ section, the preferred gamer choice coming in at between €200 and €300. The GeForce GTX 660 Ti, which isn’t far off €300, doesn’t however really make Nvidia competitive with the Radeon HD 7800s, which have done very well since their launch six months ago.

Given that this is one of the biggest market segments, Nvidia had to address the situation with something more than just end of line products from the previous generation.


To popularise the Kepler architecture among gamers, a faster derivative of the GK107 in the form of the GeForce GTX 650 (though apparently not fast enough for Nvidia to want the specialised press to take too close a look…) wasn’t enough. An intermediary GPU was required between the GK104 and the GK107. This is where the GK106 comes in, introduced first of all with the GeForce GTX 660, announced at €230, before appearing shortly afterwards at €150 on the GeForce GTX 650 Ti.


Page 2
The GK106: 5 SMX and 192-bit

The GK106
The GK106 uses the same variant of the Kepler architecture as the GK104, differing simply in its configuration.

It thus retains the same organisation of processing units in SMXs, each of which has 192 processing units, 16 texturing units and a 64 KB L1 cache. SMXs are a development of the previous generation SMs, optimised for greater efficiency, especially in terms of energy consumption. You can find all the details on this in the review of the GeForce GTX 680. Each SMX has a throughput of up to 192 FMA instructions per cycle (384 flops), four pixels per cycle and a triangle every two cycles.

The memory interface is also of the same type with blocks containing a 64-bit memory controller, optimised for high-frequency GDDR5, a 128 KB L2 cache and 8 ROPs charged with writing pixels to memory after they have been rendered.


While the GK104 has eight SMXs and four memory controllers, the GK106 has just five SMXs and three memory controllers. It thus has a total of 960 processing units, 80 texturing units, a 384 KB L2 cache, a 192-bit memory bus and 24 ROPs. While the GK104 has a throughput of 32 pixels and four triangles per cycle, the GK106 manages up to 20 pixels and 2.5 triangles.

Note that the throughput of pixels is limited by the number of SMXs (5x4) although the ROPs would be able to write 24 per cycle. These additional ROPs can however be useful for multisampling type antialiasing processing, which can add a significant load.
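The totals above follow directly from the per-block figures, and a short Python sketch makes the arithmetic explicit. The class and property names are purely illustrative; only the per-SMX and per-controller figures come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeplerConfig:
    """Illustrative model of a GK104-type GPU built from SMX and memory blocks."""
    smx: int              # number of SMX blocks
    mem_controllers: int  # number of 64-bit memory controller blocks

    @property
    def processing_units(self) -> int:
        return self.smx * 192          # 192 processing units per SMX

    @property
    def texturing_units(self) -> int:
        return self.smx * 16           # 16 texturing units per SMX

    @property
    def l2_cache_kb(self) -> int:
        return self.mem_controllers * 128  # 128 KB of L2 per memory block

    @property
    def bus_width_bits(self) -> int:
        return self.mem_controllers * 64   # 64-bit controller per block

    @property
    def rops(self) -> int:
        return self.mem_controllers * 8    # 8 ROPs per memory block

    @property
    def pixels_per_cycle(self) -> int:
        # Each SMX outputs 4 pixels per cycle; the ROPs cap what can be written.
        return min(self.smx * 4, self.rops)

    @property
    def triangles_per_cycle(self) -> float:
        return self.smx * 0.5          # one triangle every two cycles per SMX


GK104 = KeplerConfig(smx=8, mem_controllers=4)
GK106 = KeplerConfig(smx=5, mem_controllers=3)
```

On the GK106 the SMX side is the limiting factor (20 pixels against 24 ROPs), whereas on the GK104 the two sides are balanced at 32.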


Like the GK104, the GK106 is manufactured on TSMC's 28nm process. The fact that there are fewer execution blocks means only 2.5 billion transistors are required as opposed to 3.5 billion on the GK104, which puts the GK106 on a similar level of complexity to the Pitcairn GPU, used in the Radeon HD 7800s, which has 2.8 billion transistors. There has also been a corresponding reduction in surface area (214 mm² for the GK106, 212 mm² for Pitcairn and 294 mm² for the GK104), which brings down manufacturing costs.

Note that AMD seems to have a slightly higher transistor density, probably because its architecture is based on more SRAM for its different caches and registers.
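The density remark can be checked with the transistor counts and die sizes quoted above. This is a quick sketch; the dictionary and function names are ours.

```python
# (transistors, die area in mm²) as quoted in the text
CHIPS = {
    "GK106": (2.5e9, 214.0),
    "Pitcairn": (2.8e9, 212.0),
    "GK104": (3.5e9, 294.0),
}


def density_millions_per_mm2(chip: str) -> float:
    """Transistor density in millions of transistors per mm²."""
    transistors, area_mm2 = CHIPS[chip]
    return transistors / area_mm2 / 1e6


for name in CHIPS:
    print(f"{name}: {density_millions_per_mm2(name):.1f} M transistors/mm²")
```

With these rounded figures Pitcairn comes out at a little over 13 million transistors per mm², against a little under 12 million for both Nvidia GPUs.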


Page 3
Specifications, the reference GeForce GTX 660

Specifications

The memory bandwidth on the GeForce GTX 660 is identical to that on the GTX 660 Ti and very similar to that on the Radeon HD 7800s which however have a bigger memory bus. The GTX 660 makes up the difference thanks to its faster GDDR5.
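As a rough illustration of how a narrower bus can be offset by faster memory, here is the usual GDDR5 bandwidth calculation. The 192-bit bus and 1502 MHz memory clock are the GTX 660 figures quoted in this article; the 256-bit bus at 1200 MHz used for the Radeon HD 7800s is an assumption taken from AMD's reference specifications, not from this article.

```python
def gddr5_bandwidth_gb_s(bus_width_bits: int, memory_clock_mhz: float) -> float:
    """Peak memory bandwidth in GB/s for a GDDR5 interface.

    GDDR5 transfers four bits per pin per cycle of the clock usually quoted
    on spec sheets, so 1502 MHz corresponds to 6008 MT/s.
    """
    transfers_per_clock = 4
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * memory_clock_mhz * transfers_per_clock / 1000


gtx_660 = gddr5_bandwidth_gb_s(192, 1502)         # figures from this article
radeon_hd_7800 = gddr5_bandwidth_gb_s(256, 1200)  # assumed reference specs

print(f"GeForce GTX 660: {gtx_660:.1f} GB/s")
print(f"Radeon HD 7800:  {radeon_hd_7800:.1f} GB/s")
```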

The texturing and processing power is however 25% down on the GTX 660 Ti. The GTX 660’s processing power falls midway in comparison to the Radeon HD 7800s, dominates when it comes to texturing and geometric processing but is behind on fillrate.

Just like the other GTX 600s, the GTX 660 supports GPU Boost to increase the GPU clock when energy consumption is under a certain limit. Remember that in contrast to CPUs and the Radeons, this turbo is variable according to energy consumption as well as according to the unique clock limit for each sample. We have detailed this issue here.

As with several other of its graphics cards, Nvidia uses an asymmetric memory configuration here: 2 GB interfaced at 192 bits. In practice this means that the GeForce GTX 660 has a fast 1.5 GB memory space, at 192 bits, and a slower, 64-bit, 512 MB reserve. The drivers have to be designed so as not to use this reserve except as a last resort or for data that isn't regularly accessed.
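To see where the 1.5 GB and 512 MB figures come from, here is a minimal sketch that splits the memory into regions according to how many 64-bit controllers can serve them in parallel. The per-controller capacities (512 MB, 512 MB and 1024 MB) are an assumption that is consistent with the figures above; the article does not give the exact layout, and the function name is ours.

```python
def interleave_regions(capacities_mb, controller_bits=64):
    """Return a list of (size_mb, effective_bus_width_bits) regions.

    Memory is interleaved across every controller that still has free
    capacity; once the smaller controllers are full, only the larger
    ones remain and the effective bus width drops.
    """
    regions = []
    filled = 0                      # MB already used on each active controller
    remaining = sorted(capacities_mb)
    while remaining:
        depth = remaining[0] - filled
        if depth > 0:
            active = len(remaining)
            regions.append((depth * active, controller_bits * active))
            filled = remaining[0]
        remaining.pop(0)            # this controller is now full
    return regions


# Assumed layout for a 2 GB card on a 192-bit bus
print(interleave_regions([512, 512, 1024]))
```

With a symmetric layout such as three controllers with 512 MB each, the function returns a single 192-bit region, which is why a 1.5 GB or 3 GB card would not need a slow reserve.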


The reference GeForce GTX 660
Nvidia supplied us with a reference GeForce GTX 660:




The reference GeForce GTX 660 may look identical to the GTX 670 and GTX 660 Ti. It is a relatively compact 24 cm long and the PCB is just 17.2 cm! This gives these GeForces a rather original looking design with a plastic extension at the back of the PCB to make the card long enough for a radial or blower fan to be fitted.

Nevertheless the GTX 660 is different to these other models in several ways, notably because its energy consumption has been revised downwards: its TDP is 140W and the GPU Boost energy consumption target is 115W. To recap, this means that GPU Boost can increase the clock beyond the base clock when the energy consumption is within the 115W limit but won’t reduce the clock to below the base clock as long as the GPU doesn't exceed 140W. In practice, this only occurs during stress tests such as Furmark and OCCT.
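The relationship between the two limits can be summed up in a toy model. This is only a sketch of the behaviour described above, not Nvidia's actual algorithm: the 13 MHz step is an assumption inferred from the clocks observed on our sample, and the function name is hypothetical.

```python
BASE_CLOCK_MHZ = 980   # GTX 660 base clock
BOOST_TARGET_W = 115   # GPU Boost energy consumption target
TDP_W = 140            # TDP
STEP_MHZ = 13          # assumed size of one clock step


def next_clock(current_mhz: int, power_w: float, sample_max_mhz: int) -> int:
    """Pick the next clock step from the measured board power."""
    if power_w > TDP_W:
        # Above the TDP the clock may even drop below the base clock.
        return current_mhz - STEP_MHZ
    if power_w > BOOST_TARGET_W:
        # Above the boost target: back off, but never below the base clock.
        return max(BASE_CLOCK_MHZ, current_mhz - STEP_MHZ)
    # Within the target: climb towards this sample's own maximum clock.
    return min(sample_max_mhz, current_mhz + STEP_MHZ)


print(next_clock(1045, 120, sample_max_mhz=1097))
```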

With this reduced energy consumption, Nvidia can manage with a single very basic cooling block: an aluminium radiator equipped with a big copper insert at its base. It is slightly different to the one that equips the GeForce GTX 670 and 660 Ti: while it is similar in size, the copper insert is thinner and the fins less numerous.

A standard shell covers the card. Overall the finish isn’t that high in quality, well below what we expect for a reference card. Some samples also seem to suffer from a mechanical noise coming from the fan or its holder, a problem we also found on our GeForce GTX 670 and GTX 660 Ti reference samples.

Nvidia has designed its PCB with four phases to supply current to the GK106 as well as one for the Hynix R0C GDDR5 certified at 1.5 GHz. The GeForce GTX 660 has just one 6-pin power supply connector, which corresponds to maximum energy consumption of 150W according to the PCI Express specifications, in accord with the target TDP.

The PCB differs from the PCB on the GTX 670 and GTX 660 Ti, with the power supply stage repositioned at the back of the card and not between the GPU and the video outs, which facilitates cooling.

The connectivity is identical to that on the GTX 680, GTX 670 and GTX 660 Ti: there are two DVI Dual Link connectors, one HDMI 1.4a 3 GHz connector and a DisplayPort 1.2 connector. Just like the GK104, the GK106 can drive up to four screens at the same time.

Our GTX 660 sample clocked up to 1097 MHz thanks to GPU Boost, which is 65 MHz more than the guaranteed clock. In practice, its energy consumption stopped it from fully benefiting from this potential.


Page 4
Asus GTX 660 DirectCU II TOP

Asus supplied us with their first customised model in time for this test:

Asus GeForce GTX 660 DirectCU II
Asus offers three versions of its customised GeForce GTX 660. They use a similar design to that used for the Asus GTX 670 and GTX 660 Ti DirectCU II:

Asus GeForce GTX 660 DirectCU II (DC2): €250
Asus GeForce GTX 660 DirectCU II OC (DC2O): €260
Asus GeForce GTX 660 DirectCU II TOP (DC2T): €280

The OC version has its GPU clocked up from 980/1033 MHz to 1019/1084 MHz (base/guaranteed boost) and the TOP is at 1071/1136 MHz. The memory has been clocked up from 1502 MHz to 1527 MHz on the TOP.
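For reference, the factory overclocks work out as follows. This is a simple sketch using only the clocks listed above; the helper name is ours.

```python
REFERENCE_MHZ = {"base": 980, "boost": 1033, "memory": 1502}


def factory_overclock_pct(reference_mhz: float, card_mhz: float) -> float:
    """Overclock of a card relative to the reference clock, in percent."""
    return (card_mhz / reference_mhz - 1) * 100


cards = {
    "DirectCU II OC": {"base": 1019, "boost": 1084},
    "DirectCU II TOP": {"base": 1071, "boost": 1136, "memory": 1527},
}

for card, clocks in cards.items():
    for domain, mhz in clocks.items():
        gain = factory_overclock_pct(REFERENCE_MHZ[domain], mhz)
        print(f"{card} {domain}: +{gain:.1f}%")
```

The OC model is thus clocked around 4 to 5% higher on the GPU and the TOP around 9 to 10% higher, while the memory overclock on the TOP is under 2%.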

Asus supplied us with a test sample of the TOP. As is often the case, press samples are very carefully selected, either for their overclocking potential or their low energy consumption and therefore low noise levels. In general, the GTX 600s seem to be selected for their cooperation vis-à-vis GPU Boost, however this doesn’t seem to have been the case here. While the guaranteed GPU Boost clock is 1136 MHz, our sample was limited to 1175 MHz in practice, which is a long way down on what we saw with the Asus GTX 670 and GTX 660 Ti that could automatically go up to around 1300 MHz.

Note that to best benefit from the maximum clock, Asus has increased the GPU Boost energy consumption limit from 115W to 130W and probably also the TDP from 140 to 155W.




Asus has used its standard double slot version DirectCU II cooling system, identical to the one used for the GeForce GTX 670 and 660 Ti. There are three 8mm nickel plated copper heat pipes set into the aluminium base and in direct contact with the GPU. They run up to a wide radiator over which two 75mm low profile fans are mounted. The only difference is an aesthetic one, with Asus replacing the traditional metal shell with a plastic one to reduce costs.

The PCB has been fully customised and is 1cm longer than the one used for the Asus GTX 670 and 660 Ti, making it 28 cm in total. While Asus has gone for standard connectivity (2 DVIs, 1 HDMI, 1 DisplayPort), the power stage has been completely revised. While the reference PCB has four analogue phases for the GPU, this DirectCU II model has six digital ones. As on the reference card, there’s just a single phase for the memory.

It requires just a single 6-pin power connector. It is slightly difficult to access and when pulling out the cable you have to push down on the radiator to get it out of the PCB. Note that given the higher TDP on the GTX 660 DirectCU II TOP, it would have been nice to see Asus providing two power connectors on this PCB. As things stand, Asus is pushing up close to the authorised energy consumption limits and when overclocking this may pose a problem with regard to the 12V supply that comes via the PCI Express bus on entry level motherboards.

The card comes with a CD for drivers, a small mounting guide, a DVI to VGA adaptor and a double molex to 6-pin PCI Express power supply cable convertor.


Page 5
Noise, heat and thermal imaging

Noise
To observe the noise levels produced by the various solutions, we put the cards in a Cooler Master RC-690 II Advanced casing and measured noise at idle and in load. We used an SSD and all the fans in the casing, as well as the CPU fan, were turned off for the reading. The sonometer was placed 60 cm from the closed casing and ambient noise was measured at 20 dBA, which is the lowest level the sonometer is certified and calibrated to measure. Note that for all the noise and temperature readings, we used the real Radeon HD 7950 reference design, which is different to that used for the press card supplied by AMD. We weren’t able to take these readings for the Radeon HD 7950 v2.


Remember that, unfortunately, the common reference cooling system used for the GeForce GTX 670 and 660 Ti makes a very annoying vibrating or mechanical noise though this doesn't show up in terms of sound level readings. Our reference GTX 670 and to a lesser degree our reference GTX 660 Ti also suffer from this issue.

This mechanical noise is also present on the GTX 660 and we wonder if it might not be linked to the choice of radial fan, which is of dubious quality. In this test, the mechanical noise didn't show up in our readings. As the card was pretty silent apart from this mechanical noise, you notice it particularly at idle, especially when it isn't in a closed casing.

Once again, the Asus DirectCU II solution scored excellent results here. It is silent at idle and remains quiet in load, quieter in fact than certain other graphics cards at idle! We should say however that the DirectCU II model doesn’t expel air from the casing, which simplifies its task here.


Temperatures
Still in the same casing, we took a reading of the GPU temperature with the internal sensor:


The Asus solution is very well cooled, which, along with the very low noise levels, bears witness to the efficiency of the DirectCU II's cooling system.

Here’s what the thermal imaging shows:


[Thermal images, at idle and in load: reference GeForce GTX 660 Ti, reference GeForce GTX 660, Asus GTX 660 DirectCU II TOP]

Putting the GTX 660 power stage at the end of the card looks to be beneficial in terms of the temperature of its components, even if they don't have a dedicated heatsink.

Note that the Asus DirectCU II power stage is relatively hot at idle, which shows abnormally high energy consumption in this mode.


Page 6
Energy consumption and performance/watt

Energy consumption
We used the test protocol that allows us to measure the energy consumption of the graphics card alone. We took these readings at idle on the Windows 7 desktop as well as with the screen in standby so as to observe the impact of ZeroCore Power. In load we opted for the readings in Anno 2070, at 1080p with all options pushed to maximum, as well as those in Battlefield 3, at 1080p in High:


The reference GeForce GTX 660 draws a lot less power at idle than the cards up the range; the GK106 is apparently more economical than the GK104. In load, the card is in general very close to the GPU Boost consumption target, namely 115W, which indicates that the margin is relatively slim and that the card won’t be able to run at its maximum clock most of the time, in contrast to a GeForce GTX 680 for example.

For its GTX 660 DirectCU II TOP, Asus has gone with an energy consumption target of 130W, which gives the GPU a bit more of a margin. In spite of the card's higher clock, this extra margin isn’t needed in Battlefield 3, where energy consumption remains under 115W, one of the reasons being that the GPU of our Asus sample settles for a maximum of 1.162V as against 1.175V for the reference card.

This additional margin is however used fully in Anno 2070, which is a more demanding application. Note that this Asus model then tended to touch on the specified energy consumption limit for the 12V of the PCI Express port: 5.4A for a limit of 5.5A. In a demanding game with overclocking or in a stress test such as Furmark, we got close to 6.5A, which some entry level motherboards may find an issue. It’s a shame that Asus hasn’t included two 6-pin power supply connectors on its PCB and exploited this possibility with the TOP model.
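Converted into watts, the current readings above look like this. It is a small sketch using the 5.5A limit for the 12V rail of the PCI Express slot quoted above; the function and key names are ours.

```python
SLOT_12V_LIMIT_A = 5.5  # 12V current limit for the PCI Express slot


def slot_12v_report(current_a: float) -> dict:
    """Summarise a 12V slot current reading against the 5.5A limit."""
    return {
        "watts": current_a * 12.0,
        "headroom_a": SLOT_12V_LIMIT_A - current_a,
        "within_spec": current_a <= SLOT_12V_LIMIT_A,
    }


for label, amps in [("Anno 2070", 5.4), ("Overclocked / Furmark", 6.5)]:
    report = slot_12v_report(amps)
    status = "within spec" if report["within_spec"] else "over the limit"
    print(f"{label}: {report['watts']:.1f} W on the slot 12V rail, {status}")
```

The 5.5A limit corresponds to 66 W on the 12V rail, so a reading of 6.5A means the slot is being asked for around 78 W at 12V alone.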

Strangely, the at-idle energy consumption of the Asus GTX 660 TOP is a good deal higher than that of the reference card. This however wasn’t confirmed on the samples supplied to other professional press journalists and could for example be linked to a bios which isn’t allowing the number of active phases to be reduced at idle. We have contacted Asus about these odd results but haven’t yet received an explanation.

We have put these energy consumption readings together with the performance measures, giving fps per 100W to make the data more legible:


[Charts: fps per 100W in Anno 2070 (1080p, Max) and Battlefield 3 (1080p, High)]
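The fps per 100W figure is simply the frame rate divided by the measured consumption of the card and scaled to 100W. Here is a minimal sketch; the values in the example are hypothetical and are not readings from our tests.

```python
def fps_per_100w(fps: float, card_power_w: float) -> float:
    """Frames per second delivered for every 100W drawn by the card."""
    if card_power_w <= 0:
        raise ValueError("card_power_w must be positive")
    return fps / card_power_w * 100


# Hypothetical example: 50 fps for a card drawing 115W
print(f"{fps_per_100w(50, 115):.1f} fps per 100W")
```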

The GK106, at least in its GTX 660 version, is less efficient than the GK104 in its GTX 670 and 660 Ti versions, which is no doubt due to its higher clock.

Note however that each game represents a particular case and that the yield varies from one card sample to the next, on the Radeons because their energy consumption varies and the GeForces because their maximum clock and therefore their performance levels vary. Here, the GeForce GTX 680 went up to 1110 MHz, the GeForce GTX 670 up to 1084 MHz, the GeForce GTX 660 Ti up to 1071 MHz and the GeForce GTX 660 up to 1097 MHz.


Page 7
Theoretical performance: pixels

Note that for all the theoretical performance readings, the GeForce GTX 600s were running at their respective maximum GPU clocks, namely 1110 MHz for the GTX 680, 1084 MHz for the GTX 670, 1071 MHz for the GTX 660 Ti and 1097 MHz for the GTX 660.

Texturing performance
We measured performance during access to textures of different formats in bilinear filtering: for standard 32-bit (4xINT8), 64-bit “HDR” (4x FP16) and 128-bit (4x FP32) and 32-bit RGB9E5, an HDR format introduced with DirectX 10 which enables the storing of 32-bit HDR textures with a few compromises.
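To illustrate the compromises involved in RGB9E5, here is a sketch of how one 32-bit texel is decoded. It follows the shared-exponent layout as we understand it from the DirectX and OpenGL documentation: three 9-bit mantissas (red in the low bits) and a 5-bit exponent with a bias of 15 shared by all three channels. The function name is ours.

```python
def decode_rgb9e5(packed: int) -> tuple:
    """Decode a 32-bit RGB9E5 shared-exponent texel into three floats."""
    red = packed & 0x1FF              # bits 0-8
    green = (packed >> 9) & 0x1FF     # bits 9-17
    blue = (packed >> 18) & 0x1FF     # bits 18-26
    exponent = (packed >> 27) & 0x1F  # bits 27-31, shared by all channels

    # Bias of 15, plus 9 because the mantissas have no implicit leading one.
    scale = 2.0 ** (exponent - 15 - 9)
    return (red * scale, green * scale, blue * scale)


# Exponent 24 gives a scale of 1.0, so the mantissas are read back directly.
texel = (24 << 27) | (3 << 18) | (2 << 9) | 1
print(decode_rgb9e5(texel))
```

As the three channels share a single exponent and there is no sign bit, a texel containing one very bright channel loses precision on the other two and negative values cannot be stored.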


The GeForce GTXs can filter FP16 textures at full speed in contrast to the Radeons which up until now made up for this with such superior filtering power that even though they had to filter FP16 textures at half speed, they were able to post similar speeds to the GeForces. This is no longer the case with the GeForce GTX 600s which have a considerable advantage here.

However, in this test, the GeForce GTX 600s struggle to achieve their maximum throughput when their GPU clock is at a maximum.

The Radeon HD 7700s and 7800s also struggle to achieve their theoretical maximum, this time because PowerTune stops them by reducing the GPU clock, judging that energy consumption is too high when their texturing units are saturated. This isn’t so for the Radeon HD 7900s.


Fillrate
We measured the fillrate without and then with blending, and this with different data formats:


[Charts: fillrate without blending (Standard) and with blending]



As regards fillrate, the GeForce GTX 600s and the GK104/GK106 GPUs are now able to transfer FP10/11 and RGB9E5 formats at full speed to the ROPs, although the blending of these formats is still carried out at half speed. While both the GeForces and the Radeons can process the single channel FP32 format at full speed without blending, only the Radeons maintain this speed with blending. They are also markedly faster with quadruple channel FP32s (HDR 128 bits). The GeForces however seem to use their available memory bandwidth better with FP16s with blending.

Although the Radeon 7800s have the same number of ROPs as the Radeon HD 7900s, their lower memory bandwidth doesn’t allow them to maximise use with blending as well as with FP16s and FP32s without blending.


Page 8
Theoretical performance: geometry


Triangle throughput
Given the architectural differences between the various GPUs in terms of geometry processing, we obviously wanted to take a closer look at the subject. First of all we looked at triangle throughput in two different situations: when all triangles are drawn and when all the triangles are removed with back face culling (because they aren’t facing the camera):


The GeForce GTX 600s don’t do any better than the Radeon HD 7900s/7800s when triangles have to be rendered, perhaps because they are blocked up in one place or another or because performance has been reduced artificially to differentiate the Quadros from the GeForces.

When the triangles can be removed from the rendering, the GeForce GTX 600s take full advantage of their capacity to process 4, 3.5 or 2.5 triangles per cycle to push home their advantage.
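The theoretical culling rates follow from the number of triangles per cycle and the maximum clock of each sample. In this sketch the clocks are those quoted for our samples; assigning 3.5 triangles per cycle to both the GTX 670 and the GTX 660 Ti is our reading of the figures above.

```python
# (triangles per cycle, maximum GPU clock in MHz) for our samples
CARDS = {
    "GeForce GTX 680": (4.0, 1110),
    "GeForce GTX 670": (3.5, 1084),
    "GeForce GTX 660 Ti": (3.5, 1071),
    "GeForce GTX 660": (2.5, 1097),
}


def peak_mtriangles_per_s(triangles_per_cycle: float, clock_mhz: float) -> float:
    """Theoretical peak triangle rate in millions of triangles per second."""
    return triangles_per_cycle * clock_mhz


for card, (rate, clock) in CARDS.items():
    print(f"{card}: {peak_mtriangles_per_s(rate, clock):.1f} Mtriangles/s")
```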

Next we carried out a similar test using tessellation:


With the GeForce GTX 600, Nvidia reaffirms its superiority when it comes to processing a lot of small triangles generated by a high level of tessellation. The Radeon HD 7900s are on a par with the Radeon HD 7800s, which have the same number of fixed units dedicated to the task.

The architecture of the Radeons means that they can be overloaded by the quantity of data generated, which then drastically reduces their speed. Doubling the size of the buffer dedicated to the GPU tessellation unit in the Radeon HD 6800s meant they gave significantly higher performance than the Radeon HD 5000s. AMD has continued down this line with the Radeon HD 7000s.

For reasons unknown, the GeForce GTX 580 performs relatively poorly and perhaps also suffers from congestion, though it is possible that this is linked to a geometric profile in the drivers or a limitation designed to favour the Quadros.


Page 9
Test protocol

Test protocol
For this test, we revised our protocol slightly, removing Bulletstorm and, after two and a half years of service, Metro 2033. We have on the other hand added the excellent Sleeping Dogs. All these games were tested with their latest patches, most of them being maintained via Steam/Origin.

We have decided no longer to use the level of MSAA (4x and 8x) as the main criterion for segmenting our results. Many games with deferred rendering offer other forms of antialiasing, the most common being FXAA, developed by NVIDIA. There’s therefore no point in drawing up an index based on a certain antialiasing level, which in the past allowed us to judge MSAA efficiency. At 1920x1080, we therefore carried out the tests with two different quality levels: extreme and very high, which automatically includes a minimum of antialiasing (either MSAA 4x or FXAA/MLAA/AAA). Compared to what we had in our tests of higher end cards, we have revised the quality level down from very high to high in the most demanding games so that the level corresponds to a mode that allows us to do our gaming at a sufficiently comfortable level on a card such as the GTX 660.

We no longer show decimals in game performance results so as to make the graph more legible. We nevertheless note these values and use them when calculating the index. If you’re observant you’ll notice that the size of the bars also reflects this.

All the Radeons were tested with the recently released Catalyst 12.8 drivers and all the GeForces were tested with the beta 306.23 drivers.

We managed to test the GeForce GTX 600s at their minimum guaranteed GPU Boost specs. To do this we played with the overclocking settings to reduce the base clock by slightly adjusting the energy consumption limit so that the clock in practice would correspond to that of a card with a maximum turbo clock equal to that of the official GPU Boost clock. Note that this isn’t the same as turning GPU Boost off!

To recap, we took the opportunity of the report on the GeForce GTX 690 to introduce the X79 platform and a Core i7 3960X into our test system so as to benefit from PCI Express 3.0. Note that the activation of PCI Express 3.0 isn’t automatic on the GeForce GTX 600s and requires a registry modification, which we of course made and which gives an average gain of around 2%.


Test configuration
Intel Core i7 3960X (HT off, Turbo 1/2/3/4/6 cores: 4 GHz)
Asus P9X79 WS
8 GB DDR3 2133 Corsair
Windows 7 64 bits
GeForce beta 306.23 drivers
Catalyst 12.8





Page 10
Benchmark: Alan Wake

Alan Wake

Alan Wake is a pretty well executed title ported from console and based on DirectX 9. It has the particularity of imposing the use of MSAA, necessary for the correct rendering of grass.

We used the game’s Medium quality levels and added a maximum quality level with 8x MSAA and 16x anisotropic filtering. We carried out a well defined movement and measured performance with Fraps. The game is maintained via Steam.


Very demanding in terms of memory bandwidth with MSAA 8x in maximum quality mode, Alan Wake sees the GTX 660 and 660 Ti giving the same level of performance (identical memory bandwidth specs). The GeForce GTX 660 however finishes just behind the Radeon HD 7850 in both modes here.


Page 11
Benchmark: Anno 2070

Anno 2070

Anno 2070 uses a development of the Anno 1404 engine which includes DirectX 11 support.

We used the very high quality mode on offer in the game and then, at 1920x1080, we pushed anisotropic filtering and post processing to a max to make them very resource hungry. We carried out a movement on a map and measured performance with Fraps.


This time it’s processing power that counts and the GeForce GTX 660 is some way behind the GTX 660 Ti. It’s on a par with the Radeon HD 7850. At maximum quality, the SLI solution gives a higher yield, which allows the GTX 660s to equal the Radeon HD 7870s.


Page 12
Benchmark: Batman Arkham City

Batman Arkham City

Batman Arkham City was developed with a recent version of Unreal Engine 3 which supports DirectX 11. Although this mode suffered a major bug in the original version of the game, a patch (1.1) has corrected this. We used the game’s benchmark.

We measured performance in Extreme mode (which includes the additional DirectX 11 effects) with MSAA 4x and MSAA 8x. The game is updated via Steam.


The GeForce GTX 600s are more efficient here with MSAA 4x, while the Radeon HD 7000s benefit from their higher memory bandwidth and compression rates with MSAA 8x. The GeForce GTX 660 Ti is apparently limited by its memory bandwidth here and doesn’t perform any better than the GTX 660.

It looks very much as if AMD has abandoned the CrossFire X profile for this game…


Page 13
Benchmark: Battlefield 3

Battlefield 3

Battlefield 3 runs on Frostbite 2, probably the most advanced graphics engine currently on the market. A deferred rendering engine, it supports tessellation and calculates lighting via a compute shader.

We tested High and Normal modes and measured performance with Fraps, on a well-defined route. The game is updated via Origin.


The GeForce GTX 600s are particularly efficient at 1080p in Battlefield 3. The GTX 660 is on a par with the Radeon HD 7870 GHz, whether using a single card or multi-GPU solution.


Page 14
Benchmark: Civilization V

Civilization V

Pretty successful visually, Civilization V uses DirectX 11 to improve quality and optimise performance in the rendering of terrains thanks to tessellation and to implement a special compression of textures thanks to the compute shaders, a compression which allows it to keep the scenes of all the leaders in the memory. This second usage of DirectX 11 doesn’t concern us here however as we used the benchmark included on a game map. We zoomed in slightly so as to reduce the CPU limitation which has a strong impact in this game.

All settings were pushed to a max and we measured performance with shadows and reflections. The game is updated via Steam.


The GeForce GTX 600s benefit here from new series 300 drivers that bring a significant gain, which allows the GTX 660 to close up on the Radeon HD 7870.

Note that while the GeForces are generally better positioned than the Radeons in this game, this isn’t the case with the multi-GPU solutions, where CrossFire X is more efficient.


Page 15
Benchmark: Crysis 2

Crysis 2

Crysis 2 uses a development of the Crysis Warhead engine optimised for efficiency but adds DirectX 11 support via a patch and this can be quite demanding. This is the case, for example, with tessellation, which was implemented abusively in collaboration with NVIDIA with the aim of causing Radeon performance to plummet. We have already exposed this issue here.

We measured performance with Fraps on version 1.9 of the game.


The GeForce GTX 660 is hot on the heels of the Radeon HD 7870 in Extreme mode here but loses ground in Ultra mode.


Page 16
Benchmark: DiRT Showdown

DiRT Showdown

Codemasters’ latest game, DiRT Showdown benefits from a slight development of the in-house DirectX 11 engine. In partnership with AMD, the developers have introduced some advanced lighting which takes numerous sources of direct and indirect light into account to simulate overall lighting. These additional options were introduced with the first patch of the game which we used on our system. The game is updated via Steam.

To measure performance, we pushed all the graphics options to maximum and used Fraps on the game’s internal tool.


Although the GeForce GTX 600s equal their direct competitors, the Radeon HD 7000s, at 1080p without advanced lighting, their performance levels take a dive once this is turned on, as Nvidia didn’t have access to this patch sufficiently early to be able to offer specific optimisations for it. The SLI profile is also particularly efficient here.


Page 17
Benchmark: Max Payne 3

Max Payne 3

Max Payne 3 has nice rendering overall though it does vary in places, notably with 'console quality' textures. It uses a DirectX 11 engine with deferred rendering which supports several advanced effects such as HDAO or tessellation, which is rather heavy once pushed to a max.

It supports FXAA and MSAA, which is very heavy here given the type of rendering used. MSAA is still required for full aliasing removal as FXAA isn't sufficient.

We pushed all the options to max and used Fraps on a well-defined route. The game is maintained via Steam.


The GeForce GTX 600s are particularly at ease in this game: Nvidia worked with the developers upstream of release, and the maximum tessellation level is very demanding, which holds the Radeons back somewhat. The Radeons make up ground when MSAA 4x is activated however.

The GeForce GTX 660 is thus on a par with the Radeon HD 7870 without MSAA 4x but is slightly behind with this filter.


Page 18
Benchmark: Sleeping Dogs

Sleeping Dogs

Sleeping Dogs offers a Hong Kong setting which can be very demanding for our graphics cards when the options on its DirectX 11 engine are pushed to a max.

We used the game’s benchmark. The game is maintained via Steam and we used version 1.5 for this test. The HD texture pack was of course installed. We used the high and extreme quality levels, which stand out for the level of SSAA (supersampling antialiasing) used along with FXAA: 2x and 4x respectively. Geometric aliasing is very significant in this game, which makes the use of advanced antialiasing particularly important.


While the scores are pretty balanced at high quality, the Radeons have the advantage in extreme mode. The GeForce GTX 660 is thus ahead of the Radeon HD 7850 at high quality but falls behind it in extreme mode.


Page 19
Benchmark: The Witcher 2 Enhanced Edition

The Witcher 2 Enhanced Edition

The Witcher 2 graphics engine has been worked on gradually over time to give us the current version in the recent Enhanced Edition. Although it’s based on DirectX 9, it's relatively demanding once all the graphics options are pushed to a maximum, one of these being particularly demanding: UberSampling. In reality it’s a 4x supersampling type of antialiasing with a few optimisations.

We tested the game at high and ultra quality, with UberSampling on in the second mode. Performance was measured with Fraps.


The Radeon HD 7000s dominate pretty easily in this game at ultra quality but the results were more balanced at high quality, where the GTX 660 struggles to outdo the Radeon HD 7850.


Page 20
Benchmark: Total War Shogun 2

Total War Shogun 2

Total War Shogun 2 has a DirectX 11 patch, developed in collaboration with AMD. Among other things, it gives tessellation support and a higher quality depth of field effect.

We tested it in DirectX 11 mode, with a maximum quality, MSAA 4x and MLAA. This game is updated via Steam.


Unusually, Nvidia dominates here with MSAA 4x on. The GeForce GTX 660 easily outdoes the Radeon HD 7870 here, but with MLAA this positioning is reversed.


Page 21
Performance recap

Performance recap
Although individual game results are obviously worth looking at when you want to gauge performance in a specific game, we have also calculated a performance index based on all tests with the same weight for each game. We have also calculated sub-indexes based on the quality levels so as to highlight any smaller behavioural differences between the Radeons and GeForces. We assigned an index of 100 to the Radeon HD 7870:


High quality index
Overall index
Extreme quality index
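As an illustration of how an equal-weight index of this kind can be built, here is a minimal sketch. The frame rates are made-up placeholders, not results from this review, and the use of an arithmetic mean of per-game ratios is an assumption, as the exact formula isn't given here.

```python
# Equal-weight performance index, normalised so the reference card scores 100.
# The fps figures below are made-up placeholders, NOT results from this review,
# and the arithmetic mean of per-game ratios is an assumption.

def performance_index(fps, reference):
    """Return {card: index}, with each game carrying the same weight."""
    games = fps[reference].keys()
    index = {}
    for card, results in fps.items():
        # Performance relative to the reference card, game by game
        ratios = [results[g] / fps[reference][g] for g in games]
        index[card] = 100 * sum(ratios) / len(ratios)
    return index

example_fps = {
    "Card A (reference)": {"Game 1": 50.0, "Game 2": 40.0},
    "Card B": {"Game 1": 45.0, "Game 2": 38.0},
}

for card, value in performance_index(example_fps, "Card A (reference)").items():
    print(f"{card}: {value:.1f}")
```

Working with ratios rather than raw frame rates stops a game that runs at very high frame rates from outweighing the others.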

The GeForce GTX 660 comes between the Radeon HD 7850 and HD 7870 in our index, a little closer to the first in extreme mode and a little closer to the second in high mode.

This is more or less marked on all the GeForce GTX 600s: they suffer more than the Radeon HD 7000s when graphics quality increases, particularly when this includes multisampling antialiasing.

In comparison to the GeForce GTX 460, from which an increasing number of users will be looking to upgrade, the GeForce GTX 660 gives a nice 68% gain.

GPU Boost’s capacity to go beyond its official clock doesn’t make much difference when it comes to the GeForce GTX 660 as its energy consumption limit is relatively strict, meaning it can’t take advantage of its max clock in most games. The gain is thus limited to 1%, while there’s a gain of 4%, 5% and 2% on our GeForce GTX 660 Ti, 670 and 680 cards respectively when they aren’t limited to the official Nvidia GPU Boost clock as is the case here.

The Radeon HD 7870s in CrossFire X also carry the day over the GeForce GTX 660s in SLI, with a small advantage of 8%, a figure affected by a persistent AMD driver issue in Batman Arkham City. Outside of this, the scaling is pretty good on both sides.


Page 22
DirectCU II TOP performance and overclocking

GTX 660 Asus DirectCU II TOP, overclocking, GPU Boost
As we noted in the performance index, the capacity of GPU Boost to exceed the official clock of 1032 MHz doesn't have much of an impact on the GeForce GTX 660. This is due to an energy consumption limit that is stricter than on the other cards and stops the GPU from staying at max clock in most games. Here are the approximate clocks observed for the reference card in our various games:

Alan Wake: 1045-1058 MHz
Anno 2070: 1006 MHz
Batman Arkham City: 993-1032 MHz
Battlefield 3: 1045-1058 MHz
Civilization V: 1019 MHz
Crysis 2: 1006-1071 MHz
DiRT Showdown: 1058-1084 MHz
Max Payne 3: 1033-1071 MHz
Sleeping Dogs: 1032-1045 MHz
The Witcher 2 Enhanced Edition: 1058-1084 MHz
Total War Shogun 2: 1019-1045 MHz
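To put these ranges into perspective, the following sketch compares the midpoint of each observed range with the official 1032 MHz GPU Boost clock. Using the midpoint is a simplifying assumption: the time actually spent at each clock wasn't recorded here, so the real average may differ.

```python
# Observed clock ranges (MHz) from the list above, compared with the
# official GPU Boost clock of 1032 MHz quoted in the text.
# The midpoint is an assumption, not a time-weighted average.
OFFICIAL_BOOST_MHZ = 1032

observed = {
    "Alan Wake": (1045, 1058),
    "Anno 2070": (1006, 1006),
    "Batman Arkham City": (993, 1032),
    "Battlefield 3": (1045, 1058),
    "Civilization V": (1019, 1019),
    "Crysis 2": (1006, 1071),
    "DiRT Showdown": (1058, 1084),
    "Max Payne 3": (1033, 1071),
    "Sleeping Dogs": (1032, 1045),
    "The Witcher 2 Enhanced Edition": (1058, 1084),
    "Total War Shogun 2": (1019, 1045),
}

def midpoint_vs_official(low, high, official=OFFICIAL_BOOST_MHZ):
    """Midpoint of the observed range, as a % difference from the official clock."""
    midpoint = (low + high) / 2
    return 100 * (midpoint - official) / official

for game, (low, high) in observed.items():
    print(f"{game}: {midpoint_vs_official(low, high):+.1f}%")
```

The differences stay within a few percent in either direction, which fits with the small overall impact of GPU Boost on this card.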

Moving on to overclocking, we managed to clock the reference card up by +100 MHz for the GPU and by +225 MHz for the memory, giving GPU/max boost/memory clocks of 1084/1201/1615 MHz, up from 980/1097/1502 MHz. We managed to clock the Asus DirectCU II TOP GPU up 50 MHz and memory up 200 MHz, taking the clocks from 1071/1175/1527 MHz to 1123/1227/1627 MHz. We of course pushed the energy consumption limit to its maximum (+10%) to observe performance: 127 W for the reference GTX 660 and 143 W for the Asus TOP.


Charts: Anno 2070, Battlefield 3, Crysis 2

The Asus GTX 660 DirectCU II TOP gives an additional 10% over the reference GTX 660, closing the gap on the reference GTX 660 Ti to just 5%, and to nothing at all with a high level of MSAA.

Overclocking the reference GTX 660 gives an extra 10%, while overclocking the DirectCU II TOP, which is already factory overclocked, only gives an extra 5%.
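These gains are in line with the clock increases themselves, as this small sketch shows using the GPU/max boost/memory figures quoted above. Bear in mind that performance doesn't necessarily scale linearly with clocks, so this is only a rough cross-check.

```python
# Clock increases from overclocking, using the GPU / max boost / memory
# figures (MHz) quoted in the text above.

def percent_increase(before: float, after: float) -> float:
    """Relative increase from 'before' to 'after', in percent."""
    return 100 * (after - before) / before

cards = {
    "Reference GTX 660": ((980, 1097, 1502), (1084, 1201, 1615)),
    "Asus DirectCU II TOP": ((1071, 1175, 1527), (1123, 1227, 1627)),
}
labels = ("GPU", "max boost", "memory")

for card, (stock, overclocked) in cards.items():
    gains = ", ".join(
        f"{label} +{percent_increase(b, a):.1f}%"
        for label, b, a in zip(labels, stock, overclocked)
    )
    print(f"{card}: {gains}")
```

The reference card's GPU clock goes up by roughly 10% and that of the Asus card by roughly 5%, close to the performance gains measured.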


Page 23
Upgrade: comparison with older generation solutions

Upgrade: comparison with older generation solutions
Very often, when Nvidia launches a graphics card, it tries to emphasise the upgrade from a particular older model in its marketing communication. This is a way of refocusing debate away from a direct comparison with the competitor Radeons, which has lately been unfavourable for the GeForces, whose price/performance ratio isn't as good. It is also a way of demonstrating why users should make an upgrade at a time when consumers aren’t upgrading as often as before.

For this launch, Nvidia is focusing on upgrades from the 9800 GT. So as to draw attention to the differences between the two cards, when Nvidia was sending out the reference samples for testing, they set a condition: the GeForce 9800 GT had to be tested alongside the GTX 660.

While we don’t like to be told how to organise our reports, it is true that the question of upgrades from older generation cards is often only dealt with indirectly, as the limited time available for testing means that we can’t go back over the performance of all that many cards. With several successive delays in the release date of the GeForce GTX 660, we did however have a few days to spare for once. We therefore agreed to Nvidia’s condition but we naturally didn't focus on the 9800 GT. We managed to roll out a wider panel of cards representing the DirectX 10 and early DirectX 11 generations.

Here is what this gives in a few games, which should allow you to evaluate the gain that upgrading your graphics card could bring in recent titles. Note that for these tests we used the Catalyst 12.11beta and the GeForce drivers 306.23.


Battlefield 3
Bulletstorm
Crysis 2


Page 24
Conclusion

Conclusion
At the end of this report, we come to an oft-repeated conclusion: the GeForces struggle to match the price/performance ratio given by the Radeons. Nvidia is still able to take advantage of its positive brand image among gamers, justifying this with a few additional options in its drivers, which, it is true, can be very useful (e.g. adaptive vertical sync and PhysX acceleration in some games).

Nvidia is targeting the Radeon HD 7850 with the GeForce GTX 660 but is pricing it on a level with the Radeon HD 7870, at least at first. The GTX 660 certainly carries the day against the HD 7850 but is around ten points down on the HD 7870, which makes its positioning delicate, like so many GeForces. There’s no doubt however that the GeForce GTX 660 will attract buyers at €230, if only thanks to Nvidia’s brand image and its drivers. Note also that things look better for the GTX 660 customised cards - we'll come back to this point.


While the GK106 GPU can’t beat AMD's Pitcairn, it does relatively well. It would probably have been more competitive if Nvidia had settled for a symmetric 1.5 GB of memory, which would have reduced the price a bit without affecting performance in practice. Unfortunately, that choice would have made marketing the card problematic.

The arrival of a 28nm Nvidia GPU, six months after the launch of the GeForce GTX 680, in this price segment is of course excellent news and should make an impact. Indeed the recent readjustment of AMD's pricing bears witness to this: let’s hope that the two manufacturers continue on this track!

Direct comparison with some of the successful DirectX 10 generation cards shows the significant impact that the current generation has had. Upgrading to a model such as the GTX 660 or a Radeon HD 7800 will make a big difference and allow you to play recent games at 1080p with high quality settings and smooth frame rates.

It is also interesting to see how the multi-GPU versions of these cards do, as a pair of Radeon HD 7870s or GeForce GTX 660s is priced close to a GTX 680 or Radeon HD 7970. These dual-card systems do better than the big GPUs but at the cost of certain technical drawbacks, which means that we still prefer the bigger GPU solutions.


Let’s finish with a word on the customised models, firstly the Asus GTX 660 DirectCU II TOP. Once again Asus has shown the excellent efficiency of its DirectCU II cooling system. After a few hesitations on the format and structure, Asus seems to have come up with the right formula, which is great. Here again, the tests bear this out, though we should temper our enthusiasm with respect to the excellent heat and noise levels, as Asus is charging a lot for its solution. Moreover, some small design improvements could be made, such as better access to the PCI Express power connector or the addition of a second connector to relieve the 12V supply from the motherboard, especially when using a factory overclocked TOP model that may well be pushed further.

Finally, it is important to note that in contrast to the lower priced versions, the customised GTX 660s are cheaper than their Radeon HD 7870 equivalents and are therefore just as competitive within some manufacturer ranges. It will be interesting to see how the GTX 660 pricing develops, with the AMD price cut on the Radeon HD 7870s now being felt.


Copyright © 1997-2014 BeHardware. All rights reserved.