Review: Nvidia GeForce GTX 670 - BeHardware
Written by Damien Triolet
Published on May 10, 2012
Introduction, GPU Boost
Following on from the exceptional GeForce GTX 690, Nvidia has started rolling out versions of its Kepler architecture further down the range. The GeForce GTX 670 has thus made its appearance, with a rather strange design, reduced energy consumption and performance very close to that of the GeForce GTX 680. Will it manage to break the energy efficiency record?
GPU Boost: a good or a bad feature?
Like the GTX 680, the GeForce GTX 670 is based on the GK104 GPU, with no fewer than 3.5 billion transistors manufactured by TSMC on a 28 nm process. Here Nvidia has prioritised gaming performance over GPU computing, positioning the GTX 670/680 to challenge the Radeon HD 7900s and their Tahiti GPU, which has no fewer than 4.3 billion transistors.
The GK104 and its 3.5 billion transistors.
To take on the Radeon HD 7970, Nvidia had to push its GPU hard on the GeForce GTX 680. The GK104 was originally designed for the segment just below, but with the Radeon HD 7970 delivering lower performance than expected, Nvidia didn't need to wait for the big GPU in the family, the GK110, to target the top spot.
With energy consumption nicely under control on the GTX 600s, Nvidia has been able to introduce the first GPU turbo, GPU Boost, designed to maximise the GPU clock so as to make full use of the available thermal envelope. We aren't fully convinced by Nvidia's approach, as GPU Boost is non-deterministic: unlike CPU turbo modes, it's based on real energy consumption, which varies from one GPU sample to the next according to manufacturing quality and the current leakage affecting it. Moreover, Nvidia doesn't validate all samples of the same version of this GPU (the GK104-400 for the GTX 680 and the GK104-325 for the GTX 670) at the same maximum turbo clock: it simply specifies a guaranteed clock and allows the GPU to go a good deal higher if it has been validated higher. The problem is that the press rarely receives average samples, and as a result the performance levels we give are somewhat higher than what you may find with samples bought in stores.
Nvidia justifies this by explaining that it aims to give each individual card its maximum performance. It says that while the maximum GPU Boost clock can vary significantly, the average GPU Boost clock observed varies less, because the energy consumption limit stops the GPU from clocking very high in the more demanding games, and because temperature also limits it.
What Nvidia fails to mention is that we testers also slightly overestimate performance, as our testing is carried out under ideal conditions: brief tests on a workbench. Although it might make our work more fun, unfortunately we can't play for an hour to heat up the GPU before taking each performance reading! We previously compared the performance of two GeForce GTX 680s. Without looking for a worst case, we observed a theoretical difference of 2% and 1.5% in practice. This isn't enormous but is a nuisance when the gap with the competition is so tight. With the margin for manoeuvre given to GPU Boost rising to 15% on the GeForce GTX 670, we do see the difference as a problem.
What's the solution? Ideally Nvidia would allow testers to limit cards at the least favourable level with the GPU Boost clock limited to the officially guaranteed value. This isn't the current situation and we have therefore opted to simulate such a sample ourselves, by juggling with the overclocking settings. Thus we are able to give you the guaranteed performance levels as well as the performance you can expect to obtain with a more favourable sample.
Specifications, the reference GeForce GTX 670
While the basic processing power of the GeForce GTX 670 is 20% lower than that of the GeForce GTX 680, with a sample such as ours that can reach 1084 MHz with the turbo, the difference drops to a little under 15%. Memory bandwidth is identical on the two cards, which indicates that their performance could be very similar.
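The 20% and 15% figures can be checked by multiplying shader count by clock. This is a minimal sketch that assumes the publicly listed shader counts (1344 for the GTX 670, 1536 for the GTX 680) and a 1006 MHz base clock for the GTX 680; none of these three values is given in this review.

```python
# Relative shader throughput = shader count x clock.
# Assumed public specs (not from this review): 1344 and 1536 shaders,
# and a 1006 MHz base clock for the GTX 680.
GTX670_SHADERS = 1344
GTX680_SHADERS = 1536


def throughput(shaders: int, clock_mhz: float) -> float:
    """Shader cycles per microsecond, proportional to peak ALU throughput."""
    return shaders * clock_mhz


def deficit(a: float, b: float) -> float:
    """How far a falls below b, as a fraction of b."""
    return 1.0 - a / b


# Base clocks: 915 MHz (GTX 670, from the review) vs 1006 MHz (assumed).
base_gap = deficit(throughput(GTX670_SHADERS, 915), throughput(GTX680_SHADERS, 1006))
# Maximum GPU Boost clocks of the two samples tested: 1084 vs 1110 MHz.
max_gap = deficit(throughput(GTX670_SHADERS, 1084), throughput(GTX680_SHADERS, 1110))

print(f"base clocks: {base_gap:.1%} lower")  # base clocks: 20.4% lower
print(f"max boost:   {max_gap:.1%} lower")   # max boost:   14.5% lower
```

The shader count alone accounts for 12.5% of the gap; the rest comes from clocks, which is why a high-clocking GTX 670 sample closes in on the GTX 680.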
For this test, Nvidia supplied us with a reference GeForce GTX 670.
The reference GeForce GTX 670
The GeForce GTX 670 is relatively compact at 24 cm in length, with the PCB measuring just 17.2 cm! This gives the GeForce GTX 670 a very original design with a plastic extension behind the PCB to make the card long enough to have a radial, or blower, fan.
With a TDP of 170W and a GPU Boost target below this (140W), Nvidia has been able to go for a basic cooling block: an aluminium radiator with a big copper insert at its base. There's a second, smaller radiator for the sensitive power stage components.
A standard shell covers the card. Overall the finish isn't of high quality, well below what you'd expect from a high-end card on sale at €400. Within the space of two weeks, Nvidia has come out with the exceptional finish of the GeForce GTX 690 on the one hand and this really low-cost design for the GeForce GTX 670 on the other...
Nvidia has designed its PCB with four phases to supply current to the GK104-325, plus two for the Hynix R0C GDDR5 memory certified at 1.5 GHz. Note that the PCB has been designed for up to 4 GB of memory. The GeForce GTX 670 has two 6-pin power supply connectors, corresponding to a maximum energy consumption of 225W under the PCI Express specifications (75W from the slot plus 75W per 6-pin connector). In practice energy consumption is a good deal lower than this.
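The 225W ceiling is simply the sum of what each power source may deliver under the PCI Express specification: 75W from the x16 slot and 75W per 6-pin connector (an 8-pin connector, not used on this card, allows 150W). A small sketch of that sum:

```python
# Maximum in-spec board power under the PCI Express specification.
PCIE_X16_SLOT_W = 75   # power available from the x16 slot
SIX_PIN_W = 75         # per 6-pin auxiliary connector
EIGHT_PIN_W = 150      # per 8-pin auxiliary connector


def max_board_power_w(six_pin: int = 0, eight_pin: int = 0) -> int:
    """Slot power plus the allowance of each auxiliary connector."""
    return PCIE_X16_SLOT_W + six_pin * SIX_PIN_W + eight_pin * EIGHT_PIN_W


gtx670_ceiling = max_board_power_w(six_pin=2)
print(gtx670_ceiling)        # 225
print(gtx670_ceiling - 170)  # 55 W of headroom above the 170W TDP
```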
The connectivity is identical to that on the GeForce GTX 680, with two DVI Dual Link connectors, a 3 GHz HDMI 1.4a connector and a DisplayPort 1.2.
The official GPU Boost clock for the GeForce GTX 670 is 980 MHz, against a base clock of 915 MHz. Our test sample, which is a rather good one, has a maximum clock of 1084 MHz, and other testers tell us that they have seen samples going as high as 1123 MHz. In other words, the maximum clock can be similar to that of a GeForce GTX 680, although the official GPU Boost specification is lower.
On the GeForce GTX 670, GPU Boost reduces the clock to protect the card if the 170W TDP is exceeded, and raises it above the base clock as long as energy consumption remains under 140W. This 140W target can be raised by 22%, up to 170W, using overclocking tools such as EVGA's Precision X.
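To make this behaviour concrete, here is a toy power-target governor. It is not Nvidia's algorithm: the 13 MHz step, the one-step-per-tick control loop and the linear power model (a hypothetical `watts_per_mhz` value per game) are assumptions chosen for illustration. Only the 915 MHz base clock, the 140W target and the +22% headroom come from the text.

```python
# Toy model of a power-target boost governor, for illustration only.
# Assumptions (not Nvidia's algorithm): 13 MHz steps, one step per control
# tick, and board power proportional to clock via a hypothetical per-game
# watts_per_mhz value.
BASE_MHZ = 915        # GTX 670 base clock (from the review)
POWER_TARGET_W = 140  # default GPU Boost target (from the review)
STEP_MHZ = 13         # assumed clock step


def load_power_w(clock_mhz: int, watts_per_mhz: float) -> float:
    """Hypothetical board power at a given clock."""
    return clock_mhz * watts_per_mhz


def run_governor(max_boost_mhz: int, loads: list[float],
                 target_w: float = POWER_TARGET_W) -> list[int]:
    """Return the clock chosen at each tick for a sequence of loads.

    Steps down when over the power target, steps up when the next step
    stays under both the target and the sample's validated maximum.
    The clock never drops below the base clock in this sketch."""
    clock = BASE_MHZ
    history = []
    for watts_per_mhz in loads:
        if load_power_w(clock, watts_per_mhz) > target_w and clock > BASE_MHZ:
            clock -= STEP_MHZ  # over target: back off one step
        elif (clock + STEP_MHZ <= max_boost_mhz
              and load_power_w(clock + STEP_MHZ, watts_per_mhz) <= target_w):
            clock += STEP_MHZ  # headroom left: boost one step
        history.append(clock)
    return history


# 20 ticks of a light hypothetical load, then 20 ticks of a heavier one.
history = run_governor(1084, [0.11] * 20 + [0.135] * 20)
print(history[19], history[-1])  # 1084 1032
# A sample validated only at the official 980 MHz GPU Boost clock.
print(run_governor(980, [0.11] * 20)[-1])  # 980
# Raising the target by 22% lets the heavy load reach the maximum again.
print(run_governor(1084, [0.135] * 30, target_w=POWER_TARGET_W * 1.22)[-1])  # 1084
```

Even this crude model shows why the result is non-deterministic from the buyer's point of view: under the same load, the final clock depends on the sample's validated maximum and on how much power the particular game draws.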
Noise levels and GPU temperature
Noise
To observe the noise levels produced by the various solutions, we put the cards in a Cooler Master RC-690 II Advanced case and measured noise at idle and in load. We used an SSD, and all the fans in the case, as well as the CPU fan, were turned off for the reading. The sound level meter was placed 60 cm from the closed case and ambient noise was measured at around 21 dBA. Note that for all the noise and heat readings, we used the real reference design of the Radeon HD 7950, rather than the press card supplied by AMD.
The blower on our GeForce GTX 670 made a rather annoying mechanical noise, which naturally affects these noise readings, especially at idle. Nvidia told us that it has also observed this problem and that it's linked to a manufacturing issue on some of the cards in the first batches. We'll update our results as soon as we're able to test a second sample.
Temperatures
Still in the same case, we took a reading of the GPU temperature with the internal sensor:
The GeForce GTX 670 cooling system is pretty effective.
Readings and infrared thermography
For this test, we used the protocol described here.
Firstly, here's a summary of all the readings:
There's not much of a difference at idle between the GTX 680 and 670.
In load, however, internal temperatures are lower overall on the GTX 670, and the GTX 580 runs even hotter than the GTX 680. Note that in our load test, in 3DMark 11, the GK104's clock varies between 1006 and 1097 MHz on the GTX 680 and between 954 and 1032 MHz on the GTX 670.
Finally, here's what the thermal imaging shows:
These images show that the GeForce GTX 670 is well cooled, even though the power stage heats up somewhat.
Energy consumption and performance/watt
Energy consumption
We used the test protocol that allows us to measure the energy consumption of the graphics card alone. We took these readings at idle on the Windows 7 desktop as well as with the screen in standby so as to observe the impact of ZeroCore Power. In load, we took our readings in Anno 2070, at 1080p with all the settings pushed to their max, and in Battlefield 3, at 1080p in High quality mode:
The GeForce GTX 670 consumes a good deal less than the GeForce GTX 680, in spite of what you might think from the relatively close specifications of the two cards. In demanding tests its energy consumption is similar to that of the Radeon HD 7870 but it consumes more than the HD 7870 in less demanding tests as GPU Boost maximises use of the available thermal envelope.
AMD retains the advantage with the screen in standby, thanks to its ZeroCore Power technology.
We have shown the energy consumption readings graphically, with fps per 100W to make the data more legible:
Graphs: Anno 2070 (1080p, Max) and Battlefield 3 (1080p, High)
Thanks to the 28nm fabrication process and the revised GK104 architecture, the GTX 670 delivers significantly improved energy efficiency. The GeForce GTX 680 is slightly more efficient than the Radeon HD 7970 in Anno 2070 and 15% ahead of it in Battlefield 3, but the Radeon HD 7870 remains more efficient than both.
Nvidia has nevertheless made a good deal of progress: without pushing the GK104 to its limit, the GeForce GTX 670 improves efficiency further, with a 25% advantage over the Radeon HD 7950 in Anno 2070 and 30% in Battlefield 3.
Note however that each game is a particular case and that actual efficiency varies from one card sample to the next: on the Radeons because their energy consumption varies, and on the GeForces because their maximum clock, and therefore their performance, varies. Here, the GeForce GTX 680 clocked as high as 1110 MHz and the GeForce GTX 670 as high as 1084 MHz.
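The fps per 100W metric used in these graphs is simply performance divided by the graphics card's own power draw. A minimal sketch; the input values below are hypothetical and are not our measurements:

```python
def fps_per_100w(fps: float, card_watts: float) -> float:
    """Frames per second delivered per 100 W of graphics card power."""
    if card_watts <= 0:
        raise ValueError("card power must be positive")
    return 100.0 * fps / card_watts


def efficiency_advantage_pct(a: float, b: float) -> float:
    """How much more efficient card a is than card b, in percent."""
    return (a / b - 1.0) * 100.0


# Hypothetical figures for illustration only, not measurements from this review.
card_a = fps_per_100w(fps=50.0, card_watts=125.0)  # 40 fps per 100 W
card_b = fps_per_100w(fps=48.0, card_watts=150.0)  # 32 fps per 100 W
print(f"{efficiency_advantage_pct(card_a, card_b):.0f}%")  # 25%
```

Because a GeForce's fps depends on how high its sample clocks, and a Radeon's watts depend on its sample's consumption, the same calculation on another pair of cards can shift the result.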
Theoretical performance: pixels
Note that for all the theoretical performance readings, the GeForce GTX 680 and 670 were running at their respective maximum GPU clocks, namely 1110 MHz for the first and 1084 MHz for the second.
Texturing performance
We measured performance when accessing textures of different formats with bilinear filtering: standard 32-bit (4x INT8), 64-bit "HDR" (4x FP16), 128-bit (4x FP32) and 32-bit RGB9E5, an HDR format introduced with DirectX 10 that allows HDR textures to be stored in 32 bits with a few compromises.
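To illustrate the compromises of RGB9E5: the three colour channels each keep a 9-bit mantissa but share a single 5-bit exponent (bias 15), so a dim channel next to a bright one loses precision. The decoder below follows the bit layout of DirectX's `DXGI_FORMAT_R9G9B9E5_SHAREDEXP` and OpenGL's `GL_RGB9_E5`; `pack_rgb9e5` is a hypothetical convenience helper for building test values.

```python
# RGB9E5: three 9-bit mantissas sharing one 5-bit exponent with bias 15.
# Bit layout: R = bits 0-8, G = bits 9-17, B = bits 18-26, E = bits 27-31.
EXP_BIAS = 15
MANTISSA_BITS = 9


def decode_rgb9e5(packed: int) -> tuple[float, float, float]:
    """Unpack one 32-bit RGB9E5 texel into three floats."""
    r = packed & 0x1FF
    g = (packed >> 9) & 0x1FF
    b = (packed >> 18) & 0x1FF
    e = (packed >> 27) & 0x1F
    # Mantissas carry no implicit leading 1, hence the extra -MANTISSA_BITS.
    scale = 2.0 ** (e - EXP_BIAS - MANTISSA_BITS)
    return (r * scale, g * scale, b * scale)


def pack_rgb9e5(r: int, g: int, b: int, e: int) -> int:
    """Hypothetical helper: assemble raw mantissas and exponent into 32 bits."""
    return (e << 27) | (b << 18) | (g << 9) | r


print(decode_rgb9e5(pack_rgb9e5(256, 128, 0, 16)))  # (1.0, 0.5, 0.0)
# Largest encodable value; at this exponent one mantissa step is worth 128,
# which is the precision a dim channel loses next to a very bright one.
print(decode_rgb9e5(pack_rgb9e5(511, 511, 1, 31)))  # (65408.0, 65408.0, 128.0)
```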
The GeForce GTXs can filter FP16 textures at full speed, unlike the Radeons, which until now made up for this with such superior filtering power that, even though they had to filter FP16 textures at half speed, they could post speeds similar to the GeForces. This is no longer the case with the GeForce GTX 600s, which have a considerable lead here.
However, while the GeForce GTX 680 uses GPU Boost in our test to run at 1110 MHz, for a theoretical throughput of 142 Gtexels/s, it struggles to reach this in practice and posts a throughput 25% lower. The same pattern holds for the GeForce GTX 670, though to a lesser extent, with a score 16% below its maximum theoretical throughput.
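The 142 Gtexels/s figure is the number of texture units multiplied by the clock. This sketch assumes the publicly listed counts of 128 texture units on the GTX 680 and 112 on the GTX 670 (not given in this review) and applies the 25% and 16% shortfalls measured above to estimate the practical rates they imply.

```python
# Theoretical bilinear texel rate = texture units x clock.
# Assumed public specs (not in this review): 128 texture units on the
# GTX 680, 112 on the GTX 670.
def texel_rate_gtexels(texture_units: int, clock_mhz: float) -> float:
    """Peak bilinear texels per second, in Gtexels/s."""
    return texture_units * clock_mhz / 1000.0


gtx680_peak = texel_rate_gtexels(128, 1110)  # 142.08
gtx670_peak = texel_rate_gtexels(112, 1084)  # 121.408

# Practical rates implied by the shortfalls measured above (25% and 16%).
gtx680_implied = gtx680_peak * (1 - 0.25)
gtx670_implied = gtx670_peak * (1 - 0.16)
print(f"GTX 680: {gtx680_peak:.1f} peak, ~{gtx680_implied:.0f} in practice")
print(f"GTX 670: {gtx670_peak:.1f} peak, ~{gtx670_implied:.0f} in practice")
```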
Note that we had to raise the PowerTune energy consumption limit on the Radeon HD 6900s and the Radeon HD 7700s / 7800s to the maximum, as otherwise their clocks were reduced in this test. At default settings, these Radeons therefore seem incapable of fully benefiting from their texturing power! Note that this is not the case for the Radeon HD 7900s. We have highlighted the proportion of the performance that can only be obtained by modifying PowerTune.
Fillrate
We measured the fillrate without and then with blending, for different data formats:
In terms of fillrate, the GeForce GTX 600s and their GK104 GPU are finally able to process the FP10/11 and RGB9E5 formats at full speed, though blending of these formats is still at half speed. While both the GeForces and the Radeons can process the single-channel FP32 format at full speed without blending, only the Radeons maintain this speed with blending. Moreover, they're significantly faster with four-channel FP32 (128-bit HDR).
Although the Radeon HD 7800s have the same number of ROPs as the Radeon HD 7900s, their lower memory bandwidth means they can't make full use of them with blending, or with FP16 and FP32 even without blending.
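Both limits discussed above, ROP rate and memory bandwidth, can be sketched in a few lines. This sketch assumes 32 ROPs on the GTX 670 (a public specification, not stated here), uses `rate=0.5` for half-speed formats, and adds an optional, deliberately crude bandwidth cap that ignores framebuffer compression; the 192 GB/s card in the last line is hypothetical.

```python
from typing import Optional


def fillrate_gpix(rops: int, clock_mhz: float, rate: float = 1.0,
                  bandwidth_gbs: Optional[float] = None,
                  bytes_per_pixel: int = 4, blending: bool = False) -> float:
    """Peak fillrate in Gpixels/s.

    rate is 1.0 for formats the ROPs handle at full speed, 0.5 at half speed.
    If bandwidth_gbs is given, the result is also capped by memory traffic:
    every pixel is written, and with blending it is read back first.
    Framebuffer compression is ignored, so this cap is pessimistic."""
    rop_bound = rops * clock_mhz * rate / 1000.0
    if bandwidth_gbs is None:
        return rop_bound
    bytes_moved = bytes_per_pixel * (2 if blending else 1)
    return min(rop_bound, bandwidth_gbs / bytes_moved)


# Assumed: 32 ROPs on the GTX 670, running at our sample's 1084 MHz.
print(fillrate_gpix(32, 1084))            # RGB9E5 without blending: full speed
print(fillrate_gpix(32, 1084, rate=0.5))  # RGB9E5 with blending: half speed
# Hypothetical 192 GB/s card blending a 32-bit format: bandwidth-bound.
print(fillrate_gpix(32, 1084, bandwidth_gbs=192.0, blending=True))  # 24.0
```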
Theoretical performance: geometry
Triangle throughput
Given the architectural differences between the various GPUs in terms of geometry processing, we obviously wanted to take a closer look at the subject. First of all we looked at triangle throughput in two different situations: when all triangles are drawn and when triangles are skipped with back face culling (because they aren't facing the camera):
The GeForce GTX 600s don't do any better than the Radeon HD 7900s when rendering triangles, though their performance here has probably been deliberately limited: Nvidia was already cutting back the GTX 500s in this respect to differentiate the Quadros from the GeForces.
When the triangles can be culled, the GeForce GTX 600s take full advantage of their capacity to process 4 (GTX 680) and 3.5 (GTX 670) triangles per cycle to push home their advantage.
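The peak triangle rates follow from the per-clock figures above multiplied by the maximum GPU Boost clocks of our samples. These are ceilings that apply to culled triangles; as noted above, drawn triangles go through at a much lower rate. A minimal sketch:

```python
# Peak triangle rate = triangles per clock x clock.
# Per-clock rates (4 and 3.5) are those quoted above; clocks are the
# maximum GPU Boost clocks of our two samples.
def triangle_rate_gtris(tris_per_clock: float, clock_mhz: float) -> float:
    """Peak triangles per second, in billions."""
    return tris_per_clock * clock_mhz / 1000.0


print(triangle_rate_gtris(4.0, 1110))  # GTX 680: 4.44
print(triangle_rate_gtris(3.5, 1084))  # GTX 670: 3.794
```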
Next we carried out a similar test using tessellation:
With the GeForce GTX 600, Nvidia has reaffirmed its superiority when it comes to processing a lot of small triangles generated by a high level of tessellation. The Radeon HD 7900s are on a par with the Radeon HD 7800s, which have the same number of fixed units dedicated to the task.
The architecture of the Radeons means that they can be overloaded by the quantity of data generated, which then drastically reduces their speed. Doubling the size of the buffer dedicated to the GPU tessellation unit in the Radeon HD 6800s meant they gave significantly higher performance than the Radeon HD 5000s. AMD has continued down this line with the Radeon HD 7000s.
For unknown reasons, the GeForce GTX 570 gives better performance here than the GeForce GTX 580, which may also suffer from overload, though this could be linked to a geometry profile in the drivers.
Test protocol
For this test, we used the protocol introduced for the report on the GeForce GTX 680, which includes some new games: Alan Wake, Anno 2070, Batman Arkham City, Battlefield 3, F1 2011 and Total War Shogun 2. We also added The Witcher 2 Enhanced Edition.
We have decided to stop using the level of MSAA (4x and 8x) as the main criterion for segmenting our results. Many games with deferred rendering offer other forms of antialiasing, the most common being FXAA, developed by Nvidia. There's therefore no point in drawing up an index based on a given antialiasing level, as we did in the past to judge MSAA efficiency, which can vary according to the implementation. At 1920x1080 we therefore carried out the tests at two quality levels, extreme and very high, each of which automatically includes a minimum of antialiasing (either MSAA 4x or FXAA/MLAA/AAA).
We also no longer show decimals in game performance results, so as to make the graphs more readable. We nevertheless record these values and use them when calculating the index. If you're observant you'll notice that the length of the bars also reflects this.
All the Radeons were tested with the Catalyst 12.4 drivers and all the GeForces were tested with the 301.33 drivers.
We added a Radeon HD 5870 as a reference, as more and more owners of these first DirectX 11 graphics cards will probably be looking to upgrade.
As we explained in the introduction, the GeForce GTX 680 and 670 were tested both as they come, that is to say with the specific maximum turbo clock of our samples (GTX 680: 1110 MHz, GTX 670: 1084 MHz), and at their minimum guaranteed specifications. To do this we played with the overclocking settings, reducing the base clock and slightly adjusting the energy consumption limit so that the clock in practice would correspond to that of a card whose maximum turbo clock equals the official GPU Boost clock. Note that this isn't the same as turning GPU Boost off!
As a reminder, we took the opportunity of the GeForce GTX 690 report to move our test system to the X79 platform with a Core i7 3960X so as to benefit from PCI Express 3.0. Note that PCI Express 3.0 isn't activated automatically on the GeForce GTX 600s and requires a registry modification, which we of course made and which gives an average gain of around 1%.
Test configuration
Intel Core i7 3960X (HT off, Turbo 1/2/3/4/6 cores: 4 GHz)
Asus P9X79 WS
8 GB DDR3 2133 G.Skill
Windows 7 64-bit
GeForce beta 301.11 drivers
Benchmark: Alan Wake
Alan Wake is a pretty well executed title ported from console and based on DirectX 9.
We used the game's High quality level and added a maximum quality level with 8x MSAA and 16x anisotropic filtering. We followed a well-defined path and measured performance with Fraps. The game is updated via Steam.
The Radeon HD 7000s do pretty well in this game in which they easily outperform the GeForces, with the GTX 600s suffering particularly at MSAA 8x.
Benchmark: Anno 2070
Anno 2070 uses a development of the Anno 1404 engine which includes DirectX 11 support.
We used the very high quality mode on offer in the game and then, at 1920x1080, pushed anisotropic filtering and post processing to the max to make them very resource hungry. We carried out a movement on a map and measured performance with Fraps.
Here the GeForce GTX 670 gives either a little more or a little less than the Radeon HD 7950, depending on the level of detail.
Benchmark: Batman Arkham City
Batman Arkham City was developed with a recent version of Unreal Engine 3 which supports DirectX 11. Although this mode suffered a major bug in the original version of the game, a patch (1.1) has corrected this. We used the game benchmark.
All the options were pushed to a maximum, including tessellation which was pushed to extreme on part of the scenes tested. We measured performance in Extreme mode (which includes the additional DirectX 11 effects) with MSAA 4x and MSAA 8x. The game is updated via Steam.
With the Catalyst 12.4s, AMD has finally, after four months, corrected a bug which affected the performance of the Radeons with MSAA in this game, allowing the HD 7970 to overtake the GTX 680 in 8x mode. However the GeForce GTX 670 remains ahead of the Radeon HD 7950.
Benchmark: Battlefield 3
Battlefield 3 runs on Frostbite 2, probably the most advanced graphics engine currently on the market. A deferred rendering engine, it supports tessellation and calculates lighting via a compute shader.
We tested High and Normal modes and measured performance with Fraps, on a well-defined route. The game is updated via Origin.
The GeForce GTX 600s do particularly well in Battlefield 3 where the GTX 670 is on a par with the Radeon HD 7970, or even a bit ahead if you have a good model.
Although only in DirectX 9 mode, the rendering is pretty nice, based on version 3.5 of Unreal Engine.
All the graphics options were pushed to a max (high) and we measured performance with Fraps, with MSAA 4x and then 8x.
With MSAA 8x, the GeForce GTX 680 suffers in comparison to the Radeon HD 7970. The GeForce GTX 670 is also down on the Radeon HD 7950, but the gap is much smaller.
Benchmark: Civilization V
Visually quite successful, Civilization V uses DirectX 11 to improve quality and optimise performance when rendering terrain (thanks to tessellation) and to implement a special texture compression (thanks to compute shaders), which allows it to keep the scenes of all the leaders in memory. This second use of DirectX 11 doesn't concern us here, however, as we used the benchmark included on a game map. We zoomed in slightly to reduce the CPU limitation, which has a strong impact in this game.
All settings were pushed to a max and we measured performance with shadows and reflections. The game is updated via Steam.
Although the performance issue the Radeon HD 7000s have been suffering from in this game has been corrected, the GeForce GTX 600s do even better, benefitting among other things from the new 300 series drivers, which bring a significant gain here.
Benchmark: Crysis 2
Crysis 2 uses a development of the Crysis Warhead engine optimised for efficiency, but a patch adds DirectX 11 support, which can be quite demanding. This is the case, for example, with tessellation, implemented abusively in collaboration with NVIDIA with the aim of causing Radeon performance to plummet. We have already exposed this issue here.
We measured performance with Fraps on version 1.9 of the game.
The GeForce GTX 680 is hot on the heels of the Radeon HD 7970 here, while the GeForce GTX 670 is a little further ahead of the Radeon HD 7950. A good model of the GTX 670, such as the one we tested that can clock up to 1084 MHz, will be able to compete with the Radeon HD 7970.
Benchmark: F1 2011
The latest Codemasters title, F1 2011, uses a slight development of the F1 2010 and DiRT 3 engine, which retains DirectX 11 support.
We pushed all the graphics options to a max and used the game's own test tool on the Spa-Francorchamps circuit with a single F1.
The GeForce GTX 600s do particularly well in this game when MSAA 8x isn't on. With MSAA 8x, the GTX 680 suffers more than the GTX 670 from a lack of memory bandwidth.
Benchmark: Metro 2033
Still one of the most demanding titles, Metro 2033 brings all recent graphics cards to their knees. It supports GPU PhysX but only for generating particles on impacts, a rather discreet effect that we therefore didn't activate during the tests. In DirectX 11 mode, rendering is identical to DirectX 10 mode but with two additional options: tessellation for characters and a very advanced, very demanding depth of field feature.
We tested it in DirectX 11, at maximum quality (including DoF and MSAA 4x), very high quality and with tessellation on.
No single-GPU card allows you to play Metro 2033 comfortably at maximum quality. The GeForce GTX 600s suffer from a lack of memory bandwidth in this mode, which limits their performance.
Benchmark: The Witcher 2 Enhanced Edition
The Witcher 2 graphics engine has been worked on gradually over time to give us the current version in the recent Enhanced Edition. Although it's based on DirectX 9, it's relatively demanding once all the graphics options are pushed to a maximum, one of them particularly so: UberSampling, a 4x supersampling type of antialiasing with a few optimisations.
We tested the game at maximum quality with and without UberSampling. Performance was measured with Fraps.
The GeForce GTX 670 is on a par with the Radeon HD 7950 while the Radeon HD 7970 is slightly in front of the GeForce GTX 680 but only without UberSampling.
Benchmark: Total War Shogun 2
Total War Shogun 2 has a DirectX 11 patch, developed in collaboration with AMD. Among other things, it gives tessellation support and a higher quality depth of field effect.
We tested it in DirectX 11 mode, with a maximum quality, MSAA 4x and MLAA. The game is updated via Steam.
Since the update released at the end of March, the performance of the GeForce GTX 600s has dropped right off, with certain specific optimisations no doubt no longer working. Until Nvidia corrects this, the Radeons have the lead with MSAA on.
Performance recap
Although individual game results are obviously worth looking at when you want to gauge performance in a specific game, we have also calculated a performance index based on all tests, with the same weight for each game. We set an index of 100 for the GeForce GTX 580:
On average, the GeForce GTX 670 is 4% up on the Radeon HD 7950, a lead which climbs to 9% with a particularly good sample model such as ours where the GPU is able to go up to 1084 MHz, as against an official GPU Boost clock of 980 MHz.
The GeForce GTX 670 is only 9% down on the GeForce GTX 680 and this deficit falls to 6% when we take into account the maximum turbo clock of the two samples tested.
Note that the trend between the GeForce GTX 680 and Radeon HD 7970 has been reversed here, in favour of the Radeon, principally for two reasons:
- AMD has at last corrected (after 5 months!) a performance problem in Batman AC with MSAA on.
- Nvidia's cards suffer from a performance issue in the latest version of Total War Shogun 2.
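The index used in this recap gives every test the same weight and sets the GeForce GTX 580 at 100. The review doesn't state which mean it uses, so this sketch assumes an arithmetic mean of per-test ratios (`statistics.geometric_mean` would be a common alternative); the fps values are hypothetical, and unrounded figures should be used, as the review does for its own index.

```python
from statistics import fmean


def performance_index(card_fps: dict[str, float],
                      reference_fps: dict[str, float]) -> float:
    """Index with the reference card at 100 and every test weighted equally.

    Assumption: arithmetic mean of per-test ratios to the reference card."""
    ratios = [card_fps[test] / reference_fps[test] for test in reference_fps]
    return 100.0 * fmean(ratios)


# Hypothetical fps values, for illustration only.
reference = {"game_a": 50.0, "game_b": 40.0}  # stands in for the GTX 580
card = {"game_a": 60.0, "game_b": 40.0}
print(round(performance_index(card, reference), 1))  # 110.0
```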
GPU Boost performance and overclocking
We wanted to observe the gains GPU Boost gives in practice, and managed both to neutralise it and to limit it to its official clock of 980 MHz. Left to itself, GPU Boost took our GeForce GTX 670 sample up to 1084 MHz.
We were also able to overclock the GeForce GTX 670, increasing the GPU clock by 130 MHz (so that it varies between 1045 MHz and 1214 MHz) and the memory clock by 150 MHz, taking it to 1653 MHz. The energy consumption limit was pushed to its maximum, +22% or 170W.
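The overclocking figures can be reproduced with simple arithmetic. This sketch assumes a 256-bit memory bus and GDDR5 moving 4 bits per pin per memory clock cycle (neither is stated in this review), and derives the stock memory clock from the text's own figures (1653 MHz minus 150 MHz).

```python
# Assumptions not stated in the review: a 256-bit memory bus and GDDR5
# moving 4 bits per pin per memory clock cycle.
BUS_WIDTH_BITS = 256
GDDR5_TRANSFERS_PER_CLOCK = 4


def gddr5_bandwidth_gbs(mem_clock_mhz: float, bus_bits: int = BUS_WIDTH_BITS) -> float:
    """Peak memory bandwidth in GB/s."""
    transfers_per_s = mem_clock_mhz * 1e6 * GDDR5_TRANSFERS_PER_CLOCK
    return transfers_per_s * bus_bits / 8 / 1e9


gpu_offset_mhz = 130
gpu_range = (915 + gpu_offset_mhz, 1084 + gpu_offset_mhz)  # base and max clocks

oc_mem_mhz = 1653
stock_mem_mhz = oc_mem_mhz - 150  # about 1503 MHz, as implied by the text
stock_bw = gddr5_bandwidth_gbs(stock_mem_mhz)
oc_bw = gddr5_bandwidth_gbs(oc_mem_mhz)

power_limit_w = 140 * 1.22  # rounded to 170W in the text

print(gpu_range)  # (1045, 1214)
print(f"{stock_bw:.1f} -> {oc_bw:.1f} GB/s (+{oc_bw / stock_bw - 1:.0%})")  # 192.4 -> 211.6 GB/s (+10%)
print(f"power limit: {power_limit_w:.1f} W")  # power limit: 170.8 W
```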
We have also included the results of a GeForce GTX 680 with a maximum GPU Boost clock near the top of the pile: 1110 MHz.
These results were obtained at 1920x1080 at an extreme and very high quality level:
With GPU Boost turned off, the GeForce GTX 670 loses an average of 3% compared with running at its official GPU Boost clock of 980 MHz. In practice GPU Boost took our sample a good deal higher than this clock, giving a total gain of between 6 and 12% depending on the game, with an average of 9%.
On top of this, overclocking the GeForce GTX 670 gave us an additional gain of between 8 and 17%, a very high result made possible by the increase in the energy consumption limit. On average the gain is 11%, which easily puts our GTX 670 ahead of a good, though not overclocked, GeForce GTX 680.
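Successive gains multiply rather than add. A minimal sketch, assuming (as the wording suggests) that the 9% GPU Boost average is measured against GPU Boost turned off and the 11% overclocking average against our stock sample; multiplying two averages only approximates the average of the per-game products.

```python
def compound_gain_pct(*gains_pct: float) -> float:
    """Combine successive percentage gains multiplicatively."""
    total = 1.0
    for gain in gains_pct:
        total *= 1.0 + gain / 100.0
    return (total - 1.0) * 100.0


# Average GPU Boost gain (9%) followed by the average overclocking gain (11%).
print(f"{compound_gain_pct(9, 11):.1f}%")  # 21.0%
```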
Conclusion
This GeForce GTX 670 has very interesting potential and is probably the product in the Kepler family most worth a look. Depending on how high a given sample can clock with GPU Boost, it can deliver performance very close to that of the GeForce GTX 680. It then benefits from a high clock and identical memory bandwidth, which reduces the impact of having some of its execution units disabled.
On sale at €400, against €500 for the GTX 680, it offers a better price/performance ratio. This isn't all, however: its main advantage lies in its energy efficiency, which is also significantly improved and makes this GeForce GTX 670 the new reference in this area, dethroning the Radeon HD 7870.
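The price/performance claim can be quantified from the prices above and the 9% performance deficit at guaranteed clocks given in the recap. A rough sketch; street prices vary, so treat the result as indicative only.

```python
def perf_per_euro(relative_perf: float, price_eur: float) -> float:
    """Relative performance per euro (higher is better)."""
    return relative_perf / price_eur


gtx680 = perf_per_euro(1.00, 500)
# 9% below the GTX 680 at guaranteed clocks, per the performance recap.
gtx670 = perf_per_euro(0.91, 400)
print(f"GTX 670 advantage: {gtx670 / gtx680 - 1:.0%}")  # GTX 670 advantage: 14%
```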
If we spoke of the GTX 670's potential above, it's of course because the reference card isn't perfect. Its pricing could be a little more aggressive and, above all, the reference design falls a long way short of what its price would lead you to expect. Even putting to one side the fan issues we experienced on our sample, the design looks very cheap. In the space of a few weeks, Nvidia has brought out this GTX 670 with a sub-standard finish and the GeForce GTX 690 with an exceptional one. For a high-end card, we would expect more than Nvidia has given us here.
The good news is that the various graphics card partners are likely to respond with better-finished custom versions, some even adding a power stage better suited to overclocking. Given the relatively low energy consumption, with the right amount of effort they won't have any difficulty offering cooling systems that are just as efficient as this GPU!
Copyright © 1997-2013 BeHardware. All rights reserved.