Report: Nvidia GeForce GTX 460 - BeHardware
Written by Damien Triolet
Published on July 12, 2010
With a new GPU and a tweaked architecture designed for the €200-€250 segment, NVIDIA is counting on attracting the mass gamer market. At last, the GF104 is with us, in the GeForce GTX 460 card, or rather the GeForce GTX 460 cards, which we have put through their paces both in technical terms and in practical performance.
The GF104
The classic cycle for introducing a new generation of GPUs consists of the arrival of a high-end version followed by several derivatives based on the same architecture. This is a cycle which AMD executed particularly well with its most recent generation of GPUs, while NVIDIA partly failed with the GeForce 200s. NVIDIA invested heavily in its GPU computing architectures and was definitely caught by surprise by the effectiveness of the latest Radeons. Ever since the Radeon HD 4800s, AMD has had the advantage in terms of price/performance ratios, particularly in the segment where gamers want a high-performance GPU that won’t ruin them.
NVIDIA’s lag on DirectX 11 GPUs, together with limited AMD production, has meant that a relatively high price has been maintained for the Radeon HD 5850, leaving the very important €200-250 segment empty. AMD tried to broaden its range with the Radeon HD 5830, but its lack of value meant it was sidelined until its price was cut successively from €240 to €220, then €200 and finally €170/180, both in reaction to the launch of these GeForce GTX 460s and to get rid of unsold inventory. The flagship segment for gamers is therefore still completely open and NVIDIA is counting on taking possession of the terrain.
Even in a cut-down version, the GF100 is still too expensive and not efficient enough to answer the mass demands of the segment. This meant a new GPU was called for to slot in just under the high end and a notch above the usual mid-range, halving high-end performance levels. This is precisely the GPU that NVIDIA has come up with, by moving on to a new architecture. This new architecture includes modifications of the same order as those put into place between the G8xs/G9xs and the GT200, while also aiming to deliver a very efficient gamer product along the lines of the GeForce 8800 GT. From this process, the GF104 was born, equipped with 1.95 billion transistors. Let’s take a look.
Fermi architecture for gamers
To recap, the GF100 is based primarily on a big structure called the GPC (Graphics Processing Cluster). There are 4 of these GPCs, each including a rasterizing unit and 4 SMs (Streaming Multiprocessors). Six 64-bit memory controllers form a 384-bit bus to feed these GPCs. For the GF104, NVIDIA has gone for half a GF100 with a 256-bit interface. It is, then, based on 2 GPCs of 4 SMs each. Up to here, pretty standard.
Taking a closer look, the SMs on the GF104 seem bigger. And indeed they are. In the GF100, each SM has 32 “cores” and 4 texturing units. Looking in more detail, there are 2 schedulers which supply 5 execution blocks:
- 16-way SIMD0 (the “cores”): 16 FMA FP32
- 16-way SIMD1 (the “cores”): 16 FMA FP32
- 4-way SFU unit: 4 FP32 special functions or 8 interpolations
- 16-way 32-bit Load/Store unit
- 4-way texturing unit
For the GF104, NVIDIA wanted to add some execution units at lower cost and increase the ratio of texturing units to processing units. The SMs have therefore been enlarged to 48 “cores” and 8 texturing units. This is a ratio that directly targets gaming yield. In more detail, the GF104’s SMs have 2 dual instruction schedulers which supply 6 execution blocks:
- 16-way SIMD0 (the “cores”): 16 FMA FP32
- 16-way SIMD1 (the “cores”): 16 FMA FP32
- 16-way SIMD2 (the “cores”): 16 FMA FP32
- 8-way SFU unit: 8 FP32 special functions or 16 interpolations
- 16-way 32-bit Load/Store unit
- 8-way texturing unit
The half-GF100 is thus transformed into a much faster GPU with just a 25% deficit in terms of main processing units overall and an identical number of texturing units and execution units for special functions. Moreover, the texturing units have been improved to filter FP16 textures (as well as FP11, FP10 and RGB9E5) at full speed. Double precision performance, however, is cut a great deal (it can't compute double precision at half speed), which is in any case also true of the consumer versions of the GF100.
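The unit counts behind that comparison can be checked with a few lines of Python. This is only a sketch of the arithmetic: the per-SM figures are the ones listed above, the SM counts are those of the full chips (16 and 8), and the names `GPUS` and `totals` are purely illustrative.

```python
# Per-SM unit counts as listed in the text; "sms" is the SM count of the full chip.
GPUS = {
    "GF100": {"sms": 16, "cores": 32, "tex": 4, "sfu": 4},
    "GF104": {"sms": 8, "cores": 48, "tex": 8, "sfu": 8},
}


def totals(name):
    """Return chip-wide unit counts (per-SM count times number of SMs)."""
    gpu = GPUS[name]
    return {unit: gpu["sms"] * gpu[unit] for unit in ("cores", "tex", "sfu")}


gf100 = totals("GF100")
gf104 = totals("GF104")

# Relative shortfall of the GF104 in main processing units.
core_deficit = 1 - gf104["cores"] / gf100["cores"]

print("GF100:", gf100)
print("GF104:", gf104)
print(f"core deficit: {core_deficit:.0%}")
```

With half the SMs, the GF104 ends up with three quarters of the "cores" and the same totals of texturing and special function units.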
While the GF104 can send 4 instructions per SM and per cycle against only 2 for the GF100, we’re talking about 2 dual instruction schedulers and not 4 schedulers. The difference is a subtle one but marks a paradigm shift at NVIDIA. From the G80 to the GF100, all NVIDIA GPUs have naturally had optimal yield thanks to scalar type processing of the executed programme. This is in opposition to Radeon GPUs which use vector units which are of course less efficient.
Although each SM on the GF100 sends 2 instructions per cycle, they are executed on two different groups of data, warps of 32 threads. This means there’s never any dependence problem and yield is optimal as 2 instructions can almost always be scheduled. This has changed with the GF104, on which each of the two schedulers can send two instructions per warp, so as to supply the additional units. These instructions cannot of course be interdependent.
NVIDIA prefers to talk about a superscalar rather than a 2D vector architecture, given that each scheduler can send any combination of independent instructions. To keep yield high, the driver compiler has been tweaked and will try to organise the code so as to adapt to this particularity. This means the problem is similar to the one you get with AMD GPUs, although on a quite different scale. If all instructions are scalar and dependent, the yield and raw processing power of the Radeons falls to 20%, while it can only fall to 66% at worst on the GF104. And the best case scenario is equal yield to the GF100 with half the number of SMs!
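The 66% and 20% figures follow from counting how many arithmetic blocks stay busy when no second instruction can be paired. The sketch below is a simplification that only looks at the SIMD "core" blocks and ignores the SFU, load/store and texturing units. The 5-wide Radeon vector unit is an assumption inferred from the 20% figure, and the function name is hypothetical.

```python
from fractions import Fraction


def worst_case_yield(busy_units, total_units):
    """Share of arithmetic units kept busy when instructions cannot be paired."""
    return Fraction(busy_units, total_units)


# GF100 SM: 2 schedulers feed 2 SIMD blocks from different warps, so nothing idles.
gf100 = worst_case_yield(2, 2)
# GF104 SM: each of the 2 schedulers drops to 1 instruction, so 2 of 3 SIMD blocks work.
gf104 = worst_case_yield(2, 3)
# Radeon: assumed 5-wide vector unit with a single useful lane.
radeon = worst_case_yield(1, 5)

for name, value in (("GF100", gf100), ("GF104", gf104), ("Radeon", radeon)):
    print(f"{name}: {float(value):.1%}")
```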
For the rest, the GF104’s SMs retain the same number of registers, the same 64 KB L1 cache memory and shared memory (16/48 KB or 48/16 KB) and the Polymorph Engine which handles a share of geometric operations such as vertex fetch, culling and tessellation. You still have the unified L2 cache linked to the memory controllers, but now 512 KB instead of 768 KB.
The architecture as a whole is therefore optimised to give the best results in current games. There is however a major limitation when it comes to the fillrate…
Architecture (cont), specifications
More triangles, fewer pixels
Since the launch of the GeForce GTX 480, we’ve had time to look at some of the details of the architecture. First in terms of geometry where we’ve been able to make one or two observations, though some aspects of the architecture still hold a few secrets in store.
Culling is very fast on both the GF100 and the GF104. This operation consists for example in rejecting any triangles with their backs to the camera (back face culling), as it means they are then invisible. Imagine a sphere made up of multiple sides displayed on the screen. You can in fact only see half the polygons, those which face you (the camera). The other half aren’t visible and can therefore be removed from the rendering in many cases. For more complex objects and characters displayed in a game, the idea is the same and statistically half of those polygons which make them up can be rejected. Being able to do so rapidly is therefore advantageous. While AMD GPUs as well as the GT200 and predecessors can cull one triangle per cycle, this operation is now distributed and parallelized in the GF100 and GF104. It’s carried out in the Polymorph Engine of each SM. We observed that each SM can cull one triangle every 4 cycles (GPU clock). With 16 SMs, the GF100 can therefore cull 4 per cycle and with 8 SMs, the GF104 can cull 2 per cycle.
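To put those culling rates into numbers, here is a minimal sketch. It assumes the observed rate of one culled triangle every 4 GPU clocks per SM scales linearly with the number of SMs. The helper names are illustrative, and the GeForce GTX 460 figures (7 active SMs at 675 MHz) give a theoretical ceiling, not a measurement.

```python
CYCLES_PER_CULL_PER_SM = 4  # observed: one triangle culled every 4 GPU clocks per SM


def culls_per_cycle(sm_count):
    """Theoretical triangles culled per GPU clock across all SMs."""
    return sm_count / CYCLES_PER_CULL_PER_SM


def culls_per_second(sm_count, gpu_clock_mhz):
    """Theoretical triangles culled per second at a given GPU clock."""
    return culls_per_cycle(sm_count) * gpu_clock_mhz * 1e6


print("GF100 (16 SMs):", culls_per_cycle(16), "triangles/clock")
print("GF104 (8 SMs):", culls_per_cycle(8), "triangles/clock")
print("GTX 460 (7 SMs, 675 MHz):", culls_per_second(7, 675) / 1e6, "Mtriangles/s")
```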
In terms of rasterization, when going from triangles to pixels, the GT200 and predecessors could process one triangle every 2 cycles, while the Radeons were much faster with 1 per cycle. Here again, the GF100 parallelized this stage of the rendering process, with each GPC having its own raster engine. In theory, each of these can process one triangle per cycle and generate 8 pixels per cycle. In practice, while they can generate 8 pixels per cycle, triangle throughput is artificially limited on the GeForces so as to give the Quadros an advantage here. The tests we carried out made us think they can in practice process one triangle every two cycles. So while the GF100 was in practice roughly twice as fast as the Radeons, the GF104 equals them.
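The comparison with the Radeons comes down to multiplying the per-GPC rate by the number of GPCs. The following sketch uses the theoretical rate (1 triangle per clock per raster engine) and the rate suggested by our tests (1 every 2 clocks); the function and constant names are hypothetical.

```python
RADEON_TRIANGLES_PER_CLOCK = 1.0  # rate quoted in the text for the Radeons


def raster_rate(gpc_count, triangles_per_clock_per_gpc):
    """Triangles rasterized per clock with one raster engine per GPC."""
    return gpc_count * triangles_per_clock_per_gpc


for name, gpcs in (("GF100", 4), ("GF104", 2)):
    theory = raster_rate(gpcs, 1.0)
    practice = raster_rate(gpcs, 0.5)
    ratio = practice / RADEON_TRIANGLES_PER_CLOCK
    print(f"{name}: {theory} in theory, {practice} in practice ({ratio}x the Radeons)")
```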
We were also able to gain a better understanding of the fillrate limitation on the GF100 and the GF104. It comes from a bottleneck in the datapath between the SMs and the ROPs, which is limited to 64 bits per cycle for each SM. In classic 32-bit rendering, then, each SM can generate 2 pixels per cycle, or 32 overall for the GF100 and 16 for the GF104. It matters little that the GF100 has 48 ROPs and the GF104 32 ROPs: the limit remains 32 and 16 pixels per cycle. Moreover, the rasterizers match this limitation. You might think that, as the ROPs are slower when it comes to FP16 (64 bits per pixel) and FP32 (128 bits per pixel), they would then all come into use. This isn’t the case, however, as once again the datapath between the SMs and ROPs is limited to 64 bits.
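The ceiling imposed by this datapath is easy to tabulate. The sketch below assumes the limit is simply 64 bits per clock per SM divided by the size of a pixel, and prints the ROP count alongside for comparison only; the names are illustrative.

```python
DATAPATH_BITS_PER_SM = 64  # SM-to-ROP datapath width per clock, as described above

CHIPS = {"GF100": (16, 48), "GF104": (8, 32)}  # (SM count, ROP count)
FORMATS = {"4x INT8": 32, "4x FP16": 64, "4x FP32": 128}  # bits per pixel


def datapath_pixels_per_clock(sm_count, bits_per_pixel):
    """Pixels per clock that fit through the SM-to-ROP datapath."""
    return sm_count * DATAPATH_BITS_PER_SM // bits_per_pixel


for chip, (sms, rops) in CHIPS.items():
    for fmt, bpp in FORMATS.items():
        limit = datapath_pixels_per_clock(sms, bpp)
        print(f"{chip} {fmt}: {limit} px/clock through the datapath, {rops} ROPs")
```

Whatever the format, the datapath figure stays below the number of ROPs, which is why some of them sit idle without antialiasing.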
We don’t know if this is a deliberate architecture limitation decided by NVIDIA, a compromise made along the way or the result of a makeshift solution to some problem. Whatever the reason, we do see it as a significant architecture limitation which prevents the card from benefiting from all ROPs and to a lesser extent from all the available memory bandwidth. It’s some consolation that the unused ROPs do kick in when antialiasing is used as antialiasing increases the ROP load without increasing the quantity of data to be transmitted between the SMs and ROPs too much.
Specifications
Just like the GeForce GTX 400s based on the GF100, the GeForce GTX 460s use a partially cut-down GF104. One of the 8 SMs is disabled. Moreover, and this is regrettable, NVIDIA has decided to launch a 192-bit version in addition to the 256-bit version, both under the same name, the GeForce GTX 460. While the 256-bit version has 1 GB of memory and 32 ROPs, the 192-bit version only has 768 MB of memory and 24 ROPs, in addition to memory bandwidth that has been cut by 25%. As far as we’re concerned, these are two different products. By giving them the same name, NVIDIA is needlessly adding to the confusion. Of course you can differentiate between them by the quantity of memory, but what will happen when a 192-bit version with 1.5 GB comes onto the market? It won’t be all that easy for buyers to understand that it will give lower performance than the 1 GB card…
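The differences between the two versions all scale with the number of memory controllers. The sketch below assumes the GF104 uses the same 64-bit controllers as those described for the GF100, which is not stated explicitly, and that both cards run their memory at the same clock. The `describe` helper is hypothetical.

```python
CONTROLLER_WIDTH_BITS = 64  # stated for the GF100; assumed unchanged on the GF104


def describe(bus_bits, rops, memory_mb):
    """Split a card's bus, ROPs and memory across its memory controllers."""
    controllers = bus_bits // CONTROLLER_WIDTH_BITS
    return {
        "controllers": controllers,
        "rops_per_controller": rops // controllers,
        "mb_per_controller": memory_mb // controllers,
    }


gtx460_1gb = describe(256, 32, 1024)
gtx460_768mb = describe(192, 24, 768)

# At an identical memory clock, bandwidth scales with bus width.
bandwidth_ratio = 192 / 256

print("1 GB:  ", gtx460_1gb)
print("768 MB:", gtx460_768mb)
print(f"relative bandwidth of the 192-bit card: {bandwidth_ratio:.0%}")
```

Under these assumptions, the 768 MB card is simply a 1 GB card with one controller, its 8 ROPs and its 256 MB removed.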
Theoretical tests - pixels
Texturing performance
We measured performance during access to textures of different formats in bilinear filtering. Here are the results for standard 32-bit (4x INT8), 64-bit “HDR” (4x FP16) and 128-bit (4x FP32). We have also added performance for 32-bit RGB9E5, a new HDR format introduced by DirectX 10 which allows HDR textures to be stored in 32 bits, give or take a few compromises.
Although NVIDIA is able to reach a yield close to the maximum with the GF100, it isn’t happening with the GF104 which only gives 88% of texturing throughput here. It can however process all formats up to FP16 at full speed, in contrast to the other GPUs tested here.
What this test doesn’t show is that without filtering, the GF100 texturing units can, like the GF104’s, deliver 64 bits of data each.
Fillrate
We measured the fillrate without and with blending, with different data formats:
When it comes to the fillrate, the Radeon HD 5000s have a big advantage over the GeForce GTX 400s, especially for FP10, a format processed at full speed by the Radeons but only half speed by the GeForces. Given the datapath limitation between the SMs and ROPs on the GeForce GTX 400s, it’s a shame that NVIDIA didn’t provide better support for FP10 and FP11 formats and can't pack them into 32 bits.
The GeForces still retain certain advantages however. First of all they can do full-speed FP32 single channel without blending. With blending they conserve maximum efficiency for INT8, in contrast to the Radeons.
Theoretical tests - geometry
Triangle throughput
Given the advances NVIDIA has made in terms of geometry processing, we obviously wanted to take a closer look at the subject. First of all we looked at triangle throughput in three different situations: when all triangles are drawn, when half the triangles are removed with back face culling (because they aren’t facing the camera) and when they're all removed:
The GeForce GTX 480 is very fast here and goes over one triangle per cycle. In terms of rejecting triangles via culling, no other GPU gets anywhere near it. The GeForce GTX 460 is close to one triangle drawn per cycle and is also very fast when it comes to removing triangles via culling.
We are, however, quite some distance from the theoretical maximums of 4 triangles per cycle for the GF100 and 2 triangles per cycle for the GF104. Something is limiting them but we don’t exactly know what. We do know however that this limitation doesn’t exist on the GF100 Quadro derivatives.
We then carried out a similar test but this time using tessellation. This test tool hasn’t yet been finalised and fully optimised to give the best yields. It can however already be used to compare the solutions amongst themselves:
The advantage of the GeForces over the Radeons is there for all to see. The Radeons seem to be limited to 1 triangle every 3 cycles when tessellation is used. AMD told us that this wasn’t always the case and that the tessellation unit was capable of outputting one triangle per cycle. This is something we haven’t yet managed to reproduce as the Radeons are very quickly left behind when too many triangles are generated. Note however that at 270 million triangles per second, you can already envisage some pretty complex scenes with the Radeons!
AMD and NVIDIA have very different approaches. While the Radeons all give identical performance here, the GeForces vary card by card. We have also noted enormous gains with the GeForces when the GPUs have to load several vertices per primitive. The GF100 and the GF104 continue to run at full speed when loading 2 or 3 while other GPUs see their speeds go into freefall here because they can only load one vertex per clock.
Displacement mapping
We tested tessellation with an AMD demo that is part of Microsoft’s DirectX SDK. This demo allows us to compare bump mapping, parallax occlusion mapping (the most advanced bump mapping technique used in gaming) and displacement mapping that uses tessellation.
Basic bump mapping.
Parallax occlusion mapping.
Displacement mapping with adaptive tessellation.
By creating true additional geometry, displacement mapping displays clearly superior quality. Here we activated the adaptive algorithm that avoids generating useless geometry and too many small triangles that will not fill any quads and waste a lot of resources.
We also measured the performance obtained with the different techniques:
It is interesting to note that tessellation doesn’t only improve rendering quality but also performance! Parallax occlusion mapping is in fact very resource-heavy as it uses a complex algorithm that attempts to simulate geometry realistically. Unfortunately it generates a lot of aliasing and this is noticeable on the edges of objects or surfaces that use it.
Note however that in the present case the displacement mapping algorithm is helped by the fact that it is dealing with a flat surface. If it has to smooth geometry contours and apply displacement mapping at the same time the demands are of course much higher.
The GeForce GTX 400s do much better with tessellation load here than the Radeon HD 5000s. With extreme tessellation levels, the GeForce GTX 460 is almost twice as fast as the Radeon HD 5870 in this test. The use of an adaptive algorithm, which regulates the level of tessellation according to how detailed each area needs to be, depending on distance or screen resolution, gives significant gains across the board and is more representative of what developers will put into place. The gap between the GeForces and the Radeons is then reduced, but the GeForce GTX 400s retain a significant advantage.
For this test, we looked at several models from NVIDIA’s partners:
Gainward GeForce GTX 460 Golden Sample 1 GB
Gainward already has an entirely customized solution, both in terms of the PCB and cooling system:
A particularity of the Gainward Golden Sample, in addition to slight GPU overclocking, up from 675 to 700 MHz, is that it provides better video connectivity directly on the card itself: 2 DVIs, a VGA and an HDMI. This means Gainward doesn’t need to include any adaptors in the bundle.
The personalised PCB is relatively short at just 18.5 cm and the 2 PCI Express connectors have been placed above it. The double slot cooling system is based on a relatively small and light heatsink, with a copper base from which extend 2 heatpipes. There’s no heatsink for the memory chips and power stage, which make do with exposure to air under the chassis. While we’re on the subject of the memory chips, they’re Samsung GDDR5 certified at 1 GHz.
MSI GeForce GTX 460 Cyclone 768D5 OC Edition
MSI is currently marketing cards based on the stock model but also cards with the same PCB and a Cyclone series cooler. This Cyclone model exists in an overclocked version, like the one we have tested. The GPU is clocked at 728 MHz instead of 675 MHz. The memory clock remains at 900 MHz (1800 MHz for data).
Note that while the PCB is indeed the reference PCB, it’s manufactured by MSI. MSI says that it has used military class components, a major selling point. Although the components do look to be better quality than on the Gainward, whatever advantage they bring remains to be seen, especially as they seem similar to those used on the other cards based on the reference PCB…
MSI delivers its card with a DVI to VGA adaptor and a DVI to HDMI, which it has gone for rather than a mini-HDMI to HDMI (probably more expensive).
The cooler used by MSI is relatively similar to the stock cooler, except that there is no chassis on the card. A larger model cooler is however used and comes with a 9 cm rather than 8 cm fan. Just like on the Gainward, it has Samsung GDDR5 memory certified at 1 GHz.
The model that we tested has been announced for immediate availability at €209.
Twintech GeForce GTX 460
Twintech is marketing a stock GeForce GTX 460 768 MB card and a 1 GB card and is going to try to align itself with NVIDIA's pricing of €200 and €230 respectively. Twintech has told us however that the 1 GB version will be sold with a customized fan, though based on the same reference PCB.
The radial stock cooler has a copper base from which extend 2 heatpipes. It’s very similar to the one on the MSI Cyclone models but smaller. The chassis allows some air to be directed out of the casing. Only some, however, given the central position of the fan.
The 768 MB version uses the same PCB as the 1 GB version. It is 2 memory chips (which can be situated in different places according to the block disabled by NVIDIA) down given the reduced bus size (from 256 bits to 192 bits). Note that the 256 bit cards have a GF104-325 while the 192-bit cards have a GF104-300.
Like the other GeForce GTX 460s that we’ve seen, they are equipped with Samsung GDDR5 certified at 1 GHz.
Energy consumption, noise pollution
Energy consumption
We did of course use our new test protocol that allows us to measure the energy consumption of the graphics card alone. We took these readings at idle, in 3D Mark 06 and Furmark.
With a TDP announced at 150 watts for the GeForce GTX 460 768 MB and 160 watts for the 1 GB version, NVIDIA has greatly improved energy consumption on its latest GPU. The slight difference between the 2 models isn’t really significant as it is similar to the variation that can exist between two cards of the same model. In load the GeForce GTX 460 has similar energy consumption to the Radeon HD 5850 with a small advantage for the MSI model.
At idle, things are even better, at just 15/16 watts, a new record in this segment.
Noise
We place the cards in an Antec Sonata 3 casing and measure noise levels at idle and in load. We placed the sound level meter 60 cm from the casing.
The stock model is relatively quiet but does make a little more noise than the stock Radeon HD 5850, which is particularly quiet.
The MSI GeForce GTX 460 Cyclone 768D5 OC Edition is very close to the stock model in spite of being overclocked by a little over 50 MHz. The Gainward GeForce GTX 460 Golden Sample does less well here however. While at idle it is only slightly less silent, in load it can be clearly heard, more than the other cards, with sounds that get ever more annoying.
Temperatures
Still in the same casing, we took a temperature reading of the GPU by internal sensors:
As you can see, the GeForce GTX 460s are well cooled. At idle the GPU hovers under the 30 °C mark, while the temperature in the room was between 26 and 27 °C. In load the temperatures remain relatively low with the MSIs having an advantage due to their more effective cooling system.
Here’s what the infrared thermographic imaging shows:
Gainward GeForce GTX 460 Golden Sample 1 GB at idle
MSI GeForce GTX 460 Cyclone 768D5 OC Edition at idle
GeForce GTX 460 1 GB at idle
Gainward GeForce GTX 460 Golden Sample 1 GB in load
MSI GeForce GTX 460 Cyclone 768D5 OC Edition in load
GeForce GTX 460 1 GB in load
You can find more detail in our report on the thermal characteristics of graphics cards which has been updated to include the results for the GeForce GTX 460.
Overclocking, test protocol
Overclocking
NVIDIA has announced relatively high overclocking capacity for the GF104 and says that 800 MHz, up from 675 MHz, isn’t too much of a challenge. We naturally wanted to test this out. Here’s what we managed. We didn’t try to push the GDDR5 beyond 1000 MHz and we tested stability in steps of 25 MHz in Furmark:
Gainward GeForce GTX 460 Golden Sample 1 GB: 850 MHz
MSI GeForce GTX 460 Cyclone 768D5 OC Edition: 825 MHz
GeForce GTX 460 1 GB: 850 MHz
GeForce GTX 460 768 MB: 800 MHz
The 4 cards in our possession all managed 800 MHz for the GPU and therefore 1600 MHz for the processing units. Two even managed 850 MHz (1700 MHz for the processing units), which represents a very nice 26% gain.
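For reference, here is a minimal sketch of how those percentages and processing unit clocks fall out of the results. It measures every gain against the 675 MHz reference clock, although the Gainward and MSI cards ship at 700 and 728 MHz, and it takes the processing units to run at twice the GPU clock, as the figures above imply. All names are illustrative.

```python
REFERENCE_MHZ = 675  # reference GPU clock of the GeForce GTX 460

# Stable GPU clocks reached in our overclocking test, in MHz.
RESULTS_MHZ = {
    "Gainward Golden Sample 1 GB": 850,
    "MSI Cyclone 768D5 OC Edition": 825,
    "GeForce GTX 460 1 GB": 850,
    "GeForce GTX 460 768 MB": 800,
}


def gain(overclocked_mhz, stock_mhz=REFERENCE_MHZ):
    """Relative clock gain over the stock clock (0.10 means +10%)."""
    return overclocked_mhz / stock_mhz - 1


def shader_clock(gpu_mhz):
    """Processing unit clock, taken as twice the GPU clock."""
    return 2 * gpu_mhz


for card, mhz in RESULTS_MHZ.items():
    print(f"{card}: {mhz} MHz GPU, {shader_clock(mhz)} MHz shaders, +{gain(mhz):.0%}")
```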
Seeing that the GF104 overclocks so well, why has NVIDIA clocked it at just 675 MHz? It’s difficult to say and is probably down to a combination of reasons. The GF104 is perhaps sufficiently stable for gaming but not stable enough to be validated at a higher clock. Maybe only some of the production manages to hold such clocks, and those chips have been used for the first batches which, naturally, are the ones tested by the press. NVIDIA may have wanted to avoid competing too much with the GF100, of which there are still significant stocks to get rid of, as GeForce GTX 470 sales haven’t been as strong as expected. If this is the case, it’s not too much of a stretch to imagine a GeForce GTX 475 based on a full GF104 pushed up to 800 MHz. The 150 watt TDP at 675 MHz is also important as many OEMs plan their systems with such a thermal envelope.
Whatever the case may be, the GF104 gives very promising overclocking and we should soon see numerous variants at higher clocks hitting the market.
The test
For this test, we decided to retain our previous protocol. We have therefore used ArmA 2, Need for Speed Shift, World in Conflict Soviet Assault, Anno 1404, Red Faction Guerilla, Crysis Warhead, Far Cry 2, HAWX, Battleforge, S.T.A.L.K.E.R. Call of Pripyat, DiRT 2 and Metro 2033. We have also retained Batman Arkham Asylum to test performance levels when PhysX is used.
The tests were carried out at 1920x1200 and 1680x1050, without FSAA, with 4x MSAA and with 8x MSAA. Note that we made sure to test this mode on the GeForces, which isn’t always easy to work out. In the NVIDIA drivers, 8x antialiasing is in fact MSAA 4x with CSAA 8x which doesn’t give the same quality as MSAA 8x, which is called, for its part 8xQ antialiasing. This is therefore the filter we tested. We opted for high but not extreme settings in the most demanding games.
We added tests at 1680x1050 with and without FSAA in advanced DirectX 11 modes that include all the graphics options linked to this API. We carried out these tests separately as they can’t be run on DirectX 10 cards.
We have also decided to stop showing decimals in game performance results so as to make the graph more readable. We have nevertheless noted these values and used them when calculating the index. If you look closely, you’ll notice that the size of the bars also reflects this.
We also added performance tests in games and the results obtained for a GeForce 8800 GT 512 MB (which is equivalent to a GeForce 9800 GT or a GeForce GTS 250) so as to represent the gains brought by the new graphics solutions in comparison to this very popular card.
Test configuration
Intel Core i7 975 (HT and Turbo deactivated)
6 GB DDR3 1333 Corsair
Windows 7 64 bits
Need for Speed Shift
To test the most recent in the Need for Speed series, we pushed all options to a max and carried out a well-defined movement. Patch 1.1 was installed.
Note that AMD has implemented an optimisation that replaces certain 64-bit HDR rendering areas with others at 32-bit. NVIDIA naturally has jumped on this but it doesn’t spoil quality. In reality AMD is taking advantage of an architecture particularity that can process 32 bit HDR formats at full speed, which NVIDIA can’t do and therefore has no reason to put into place.
In this first test, the GeForce GTX 460 1 GB is slightly ahead of the GeForce GTX 465, which is almost equaled by the 768 MB version.
To test ArmA 2, we carry out a well-defined movement on a saved game. We used the very high graphics setting in the game and pushed all the advanced options to high as well.
ArmA 2 allows you to set the display interface differently to 3D rendering which is then aligned with the display via a filter. We used identical rendering for both.
The antialiasing settings offered for this game aren’t clear and are different between the AMD and NVIDIA cards. 3 modes are offered by AMD: low, normal and high. These correspond to MSAA 2x, 4x and 8x. With NVIDIA things get more complicated:
- low and normal = MSAA 2x
- high and very high = MSAA 4x
- 5 = MSAA 4x + CSAA 8x (called 8x in the NVIDIA drivers)
- 6 = MSAA 8x (called 8xQ in the NVIDIA drivers)
- 7 = MSAA 4x + CSAA 16x (called 16x in the NVIDIA drivers)
- 8 = MSAA 8x + CSAA 16x (called 16xQ in the NVIDIA drivers)
We therefore used high (4x) and 6 (8x).
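The mapping above can be written down as a small lookup table, which makes it easier to pick equivalent settings on both brands. This is only an illustration of the list: the dictionaries and the `nvidia_setting_for` helper are hypothetical and not part of the game or of any driver API.

```python
# In-game ArmA 2 setting -> (MSAA samples, CSAA samples or None) on NVIDIA cards.
NVIDIA_ARMA2_AA = {
    "low": (2, None),
    "normal": (2, None),
    "high": (4, None),
    "very high": (4, None),
    "5": (4, 8),
    "6": (8, None),
    "7": (4, 16),
    "8": (8, 16),
}

# In-game ArmA 2 setting -> MSAA samples on AMD cards.
AMD_ARMA2_AA = {"low": 2, "normal": 4, "high": 8}


def nvidia_setting_for(msaa_samples):
    """Return the first NVIDIA setting that gives pure MSAA (no CSAA) at this level."""
    for setting, (samples, csaa) in NVIDIA_ARMA2_AA.items():
        if samples == msaa_samples and csaa is None:
            return setting
    raise ValueError(f"no pure MSAA {msaa_samples}x setting")


for amd_setting, samples in AMD_ARMA2_AA.items():
    print(f"MSAA {samples}x: AMD '{amd_setting}', NVIDIA '{nvidia_setting_for(samples)}'")
```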
Patch 1.5 was installed.
Here, the GeForce GTX 460 1 GB has a bigger advantage over the GTX 465. The Radeon HD 5850 remains a notch above.
World in Conflict Soviet Assault
Very visually impressive and very demanding, World in Conflict supports DirectX 10. Its Soviet Assault add-on has some very small extra graphics options, though the internal test that we are using, is with the same scene. We use max quality mode which includes DirectX 10, push anisotropic filtering to 16x and activate all other graphics options. The game doesn’t support 8x MSAA.
With 4x antialiasing, the GeForce GTX 460 1 GB is on a par with the Radeon HD 5850. The 768 MB version remains a little down on the 1 GB card.
To test Anno 1404, we carry out a well-defined movement on a relatively heavy map. All options were pushed to a max. We installed patch 1.2.
Once again the GeForce GTX 460 1 GB does better than the GeForce GTX 465 with the advantage increasing as the level of antialiasing is increased. The 768 MB version trails again here.
Red Faction Guerilla
Very demanding, Red Faction Guerilla was tested with all options pushed to a maximum. We measured performance in the introduction scene.
Fillrate seems to have a small influence on results here, with the GeForce GTX 465 up on the GeForce GTX 460s for once.
Crysis Warhead replaces Crysis and has the same resource heavy graphics engine. We tested it in version 1.1 and 64-bit mode as this is the main innovation. Crytek has renamed the different graphics quality modes, probably so as not to dismay gamers who may be disappointed at not being able to activate the highest quality mode because of excessive demands on system resources. The high quality mode is renamed “Gamer” and the very high quality mode “Enthusiast”. We tested “Enthusiast”.
With 8x antialiasing, the GeForce GTX 460 768 MB suffers from a lack of memory. However the 1 GB version has a 10% lead on the GeForce GTX 465.
Far Cry 2
This version of Far Cry isn’t really a great development as Crytek made the first episode in any case. As the owner of the licence, Ubisoft handled its development, with Crytek working on Crysis. It was no easy thing to inherit the graphics revolution that accompanied Far Cry, but the Ubisoft teams have done pretty well, even if the graphics don’t go as far as those in Crysis. The game is also less resource-heavy, which is no bad thing. It has DirectX 10.1 support, from which Radeon performance benefits. We installed patch 1.02 and used the “ultra high” quality graphics mode.
Once again, the GeForce GTX 460 768 MB is on a par with the GeForce GTX 465, while the 1 GB card takes the lead.
The most recent Tom Clancy, H.A.W.X. is a flying action game. It uses a graphics engine that supports DirectX 10.1 to optimise results. Among the graphics effects it supports, note the presence of ambient occlusion, which we pushed to a max along with all other options. We use the built-in benchmark and patch 1.2 was installed.
The GeForce GTX 460 1 GB isn’t far behind the Radeon HD 5850.
As the first game with DirectX 11 (or more precisely Direct3D 11) support, BattleForge was one we couldn’t leave out. An update added at the end of September gave support for Microsoft’s new API.
Compute Shaders 5.0 are used by the developers to accelerate SSAO processing (ambient occlusion). Compared to the standard implementation via the Pixel Shaders, this technique allows more efficient use of the available processing power by saturating the texturing units less. BattleForge offers two SSAO levels: High and Very High. Only the second, called HDAO (High Definition AO), uses Compute Shaders 5.0.
We used the game’s bench and installed the latest available update.
BattleForge is probably the game in which the GeForce GTX 460 is the least at ease, without antialiasing. We suppose that its performance, like that of the other GeForce GTX 400s though to a lesser extent for them, is affected here by the fillrate limitation that we described earlier in this report. This is also why antialiasing hardly affects the GeForces here, as they are greatly limited at another level.
S.T.A.L.K.E.R. Call of Pripyat
This new S.T.A.L.K.E.R. sequel is based on a new development of the graphics engine, which moves up to version 01.06.02 and supports Direct3D 11. The API is used to improve both performance and quality, with the option of more detailed light and shade as well as tessellation support for the first time!
We put it in high quality mode. We didn’t activate the higher quality options available in the Direct3D 11 version of the engine so as to be able to compare results. The game doesn’t support 8x antialiasing. Our test scene is 50% outside and 50% inside and inside it is surrounded with several characters.
La Radeon HD 5850 retains a significant advantage here and the Radeon HD 5830 equals the GeForce GTX 460 and 465 with antialiasing
Next we carried out tests with tessellation activated, which has a negligible impact on the outside part of the scene but a significant impact on the inside part:
Here the GeForce GTX 460 1 GB equals the GeForce GTX 465. The former is slightly faster in the outside scene and the latter is slightly faster in the inside scene, which calls on tessellation.
Codemasters’ latest, DiRT 2, inaugurates the new version of the studio's engine, which now supports DirectX 11. The API is used both to improve HDAO efficiency and, through tessellation, to improve the quality of the water, the crowd and some of the flags. The effect nevertheless remains quite subtle. First of all, we tested the game in DirectX 9 mode, pushing all settings to the maximum, as the DirectX 11 results aren’t comparable and DirectX 10 cards are still being tested.
Patch 1.1 was installed.
In DirectX 9 mode, the GeForce GTX 460 1 GB has the advantage over the GeForce GTX 465 and is on a par with the Radeon HD 5850.
Next we carried out some tests in DirectX 11, again pushing all settings to a max, including tessellation:
With all the DirectX 11 effects activated, the Radeon takes the lead once again with antialiasing.
The most demanding recent game, Metro 2033 brings all current graphics cards to their knees. It supports GPU PhysX, but only for the generation of particles during impacts, a fairly subtle effect that we therefore didn’t activate during the tests. In DirectX 11 mode, performance is identical to DirectX 10 mode but with two additional options: tessellation for characters and a very advanced, very demanding depth of field feature.
First we tested in DirectX mode at high quality:
Here the GeForce GTX 460 1 GB is equal to the Radeon HD 5850. The 768 MB version suffers from having limited memory when antialiasing is activated.
Next we carried out tests in DirectX 11, pushing quality to very high and activating tessellation and Depth of Field. At very high quality, the game applies a filter called analytical antialiasing, which tries to reduce aliasing in the scene, though sometimes with inconclusive results. There’s also 4x MSAA support.
With DirectX 11 options activated, the GeForce GTX 465 has a very slight advantage over the GeForce GTX 460 1 GB, which nevertheless remains on a par with the Radeon HD 5850 when antialiasing is activated. In any case, even though we reduced the resolution to 1680x1050, none of these cards delivered enough performance to play at this quality level.
Batman Arkham Asylum
A big hit at the moment, Batman Arkham Asylum can’t be ignored in spite of the partisan nature of the technology it uses. We’re talking, of course, about GPU PhysX, the proprietary acceleration technology supported only by GeForces.
We installed patch 1.1 and used the built-in benchmark with all options pushed to a max, including PhysX effects.
NVIDIA took advantage of having helped the developers implement antialiasing in the game to restrict the feature to GeForces. As we suspected from the start, the implementation is standard and therefore also compatible with Radeons, but NVIDIA has locked it. This is a detestable practice that forces Radeon users, and testers, to activate antialiasing via the driver control panel, which is much less efficient than doing so directly in the game, as the driver applies antialiasing blindly and unnecessarily to all surfaces. Thankfully, all you have to do is dress your Radeon up as a GeForce with ATI Tray Tools for the control panel to accept activating antialiasing.
As the Radeons can’t accelerate these effects, they are limited by the CPU, on which, moreover, not all the cores are used.
The GeForce GTX 460 768 MB suffers from its lack of memory as the use of GPU PhysX increases video memory demands.
Performance recap
Although individual game results are worth looking at, we have calculated a performance index based on all tests, with the same weight for each game. Batman Arkham Asylum with GPU PhysX is not included in the index.
We attributed an index of 100 to the Radeon HD 5850 at 1920x1200 with 4x MSAA.
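To make the idea of an equal-weight index concrete, here is a minimal sketch in Python. It assumes that each game's frame rate is first normalised against the reference card in the reference mode and that the per-game ratios are then averaged; the report does not spell out the exact formula, so this is only one plausible reading. The function name, card names and frame rates are hypothetical placeholders, not measurements from our tests.

```python
from statistics import mean


def performance_index(fps, card, mode, ref_card, ref_mode):
    """Equal-weight index where ref_card in ref_mode scores 100.

    fps maps a game name to {(card, mode): frames per second}.
    """
    ratios = [
        results[(card, mode)] / results[(ref_card, ref_mode)]
        for results in fps.values()
    ]
    # Averaging per-game ratios (rather than raw fps) gives every game
    # the same weight, whatever its absolute frame rate.
    return 100.0 * mean(ratios)


# Placeholder figures for illustration only, NOT results from this report.
MODE = "1920x1200 4x MSAA"
fps = {
    "game_a": {("ref_card", MODE): 50.0, ("card_x", MODE): 45.0},
    "game_b": {("ref_card", MODE): 100.0, ("card_x", MODE): 110.0},
}

print(performance_index(fps, "card_x", MODE, "ref_card", MODE))
```

Normalising before averaging matters: a plain average of frame rates would let a game that runs at 100 fps count for twice as much as one that runs at 50 fps.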
With a gain of 1 to 5% on the GeForce GTX 465, the GeForce GTX 460 1 GB has the advantage! In comparison to the AMD offer, this GeForce GTX 460 1 GB has between an 8 and 30% lead over the Radeon HD 5830, a lead which increases as the level of antialiasing is upped. Performance levels are nevertheless down on the Radeon HD 5850, but only slightly with a difference of between 10 and 3% here, which is reduced as the level of antialiasing is upped.
The GeForce GTX 460 768 MB does of course trail a little, especially at 8x antialiasing, a mode in which its memory capacity is several times too small. In any case, it retains its advantage over the Radeon HD 5830.
In terms of the overclocked partner cards, the Gainward shows a 2 to 3% gain over the stock GeForce GTX 460 1 GB for a clock increase of 4%. The MSI card is 6% up on the stock GeForce GTX 460 768 MB for an 8% increase in the GPU clock.
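The relationship between clock increase and measured gain can be expressed as a simple ratio. The short Python sketch below uses only the percentages quoted above; the function name is our own and the "scaling efficiency" label is an illustrative convention rather than a metric used elsewhere in this report.

```python
def scaling_efficiency(perf_gain_pct, clock_gain_pct):
    """Fraction of a clock increase that shows up as measured performance."""
    if clock_gain_pct <= 0:
        raise ValueError("clock_gain_pct must be positive")
    return perf_gain_pct / clock_gain_pct


# Figures quoted in the paragraph above.
gainward_low = scaling_efficiency(2, 4)   # 2% gain for a 4% clock increase
gainward_high = scaling_efficiency(3, 4)  # 3% gain for a 4% clock increase
msi = scaling_efficiency(6, 8)            # 6% gain for an 8% clock increase

print(gainward_low, gainward_high, msi)
```

In both cases between half and three quarters of the extra GPU clock translates into performance, which suggests that something other than the GPU clock is also limiting these cards in part of the tests.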
We have also drawn up an index of results obtained at 1920x1200 in the DirectX 11 modes of the four games tested with all options enabled. For BattleForge, these are the same results as those used to calculate the index above, but for the other games, tessellation, among other things, is included.
Conclusion
Rumoured to be a very promising card since the arrival of the GF100, the GF104, now on the market with the GeForce GTX 460, has fulfilled its promise. NVIDIA has succeeded, and this is something that hasn’t happened for quite some time, in producing a graphics card with a similar price/performance ratio to what AMD are offering. Moreover, NVIDIA has succeeded in doing so in the €200 to €250 segment so significant to gamers and left empty by AMD. With its GeForce GTX 460s, NVIDIA has thus scored a great hit.
It is probable that the arrival of this GF104 was influenced by AMD’s success with the Radeon HD 3800s and 4800s, which caught NVIDIA off guard. By designing such effective products, AMD was able to engage in a price war in which NVIDIA found it very hard to compete, as it had invested a significant part of its architecture in GPU computing. The result was a loss of ground in what is probably the biggest market segment. NVIDIA’s reaction has been unequivocal. The first GPU to follow the G80 represented 25% of its processing power, and the first to follow the GT200, after a very long wait, represented 40%. Here NVIDIA has given us the GF104 just 3 months after releasing the GF100 and at 75% of its processing power.
In order to achieve this, NVIDIA has revisited its architecture and added muscle so that the basic blocks, at the price of slightly reduced efficiency, have gained numerous execution units. Here texturing units are best served, as they still give a very good return on investment when the target market is gamers, as is the case here. Another way of seeing things is as follows: the GF104 has traded some of the GF100 architecture’s efficiency in GPU computing for better efficiency in games.
NVIDIA has also managed to contain energy consumption on its new baby, achieving particularly good numbers in idle and with good control in load, which shows that the card hasn’t been pushed to its limits and leaves plenty of opportunity for overclocking, as we have seen. The flagship DirectX 11 tessellation function still gives very good performance. Sure, there are half the number of units on the GeForce GTX 460 compared to the GeForce GTX 480 but the 460 still managed to attain twice the performance levels of the Radeon HD 5870. With the GeForce GTX 460, GPU PhysX and the increasingly widespread support for CUDA in applications dedicated to video are becoming real bonuses, a much more apt position for them to be in than the consolation prize that has been their lot given the poor price/performance ratios of some other GeForces.
It’s not a perfect world however. The GF104, like the GF100 but to a greater extent, suffers from a significant limitation in terms of pixel throughput, which holds back performance somewhat. This isn’t a huge problem in practice, given that it means that the card performs well with antialiasing enabled, but we do feel that the GF104 leaves something to be desired in some games, which is a shame.
The biggest reproach we have is that, as so often, NVIDIA has laid on the marketing a tad too thick. They have had the very bad idea of giving the same name to two GF104 derivatives. This means that there will be different GeForce GTX 460s. No doubt advanced users will be able to tell them apart due to their quantity of memory, but many consumers won’t know about this nuance.
The first GeForce GTX 460 has 1 GB of memory and, while waiting for probably useless 2 GB variants to appear, it represents a worthwhile choice for gamers. Announced at €230, it offers a similar price/performance ratio to the Radeon HD 5850 or the newly repositioned Radeon HD 5830 and will be well suited to new screens with 1680x1050, 1920x1080 and 1920x1200 resolutions. Of these 3 solutions, if we had to go for one, it’s the new arrival that we prefer. It goes without saying that it makes the GeForce GTX 465, which is very recent yet more expensive and slower, completely uninteresting. It’s clear that having a stock of 465s now won't make manufacturers and resellers happy, so you should expect significant price cuts on this part.
The second GeForce GTX 460 only has 768 MB of memory, 25% fewer ROPs and a bandwidth reduced by the same proportion. It will therefore be less efficient with antialiasing. Its reduced video memory limits it in terms of resolution, especially if you're expecting to keep it for some time. At €200, it will generally be perfect to enjoy graphics refinements at 1280x1024 or 1680x1050 and under these conditions we prefer it to the Radeon HD 5830, if your budget allows. Note that a 1.5 GB version will be making its appearance but will give lower performance than the 1 GB version.
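The 25% figure follows directly from the memory configuration. The Python sketch below works it through using the reference specifications of the two cards (32 ROPs and a 256-bit bus for the 1 GB card, 24 ROPs and a 192-bit bus for the 768 MB card, GDDR5 at 3600 MT/s effective on both). These values are not quoted in this section, so treat them as assumptions of the example; the function names are our own.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_mt_s):
    """Peak memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_mt_s / 1000


def reduction_pct(full, cut):
    """Percentage lost when going from the full value to the cut-down one."""
    return 100.0 * (full - cut) / full


DATA_RATE_MT_S = 3600  # assumed effective GDDR5 data rate for both cards

# Assumed reference specifications, not taken from this section.
gtx460_1gb = {"rops": 32, "bus_bits": 256, "memory_mb": 1024}
gtx460_768mb = {"rops": 24, "bus_bits": 192, "memory_mb": 768}

bw_1gb = bandwidth_gb_s(gtx460_1gb["bus_bits"], DATA_RATE_MT_S)
bw_768mb = bandwidth_gb_s(gtx460_768mb["bus_bits"], DATA_RATE_MT_S)

print(bw_1gb, bw_768mb)
print(reduction_pct(gtx460_1gb["rops"], gtx460_768mb["rops"]))
print(reduction_pct(bw_1gb, bw_768mb))
print(reduction_pct(gtx460_1gb["memory_mb"], gtx460_768mb["memory_mb"]))
```

ROPs, bandwidth and memory capacity all shrink by the same proportion because, on this family of GPUs, each 64-bit memory controller comes with its own group of ROPs and its own memory chips, so removing one controller out of four removes a quarter of each.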
As with any supposedly popular GPU, many models of cards are in preparation from NVIDIA's partners. We’ve already had a glimpse here, first with the MSI GeForce GTX 460 Cyclone 768D5 OC Edition, which offers slightly better performance than the stock card and better cooling in load. Next the Gainward GeForce GTX 460 1 GB is more compact and integrates all the video outs directly on the card but is noisier than the stock model.
Lastly, let’s take a look at the pricing. Officially the cards should come in at €200 for the 768 MB version and €229 for the 1 GB version. Some partners have however mentioned a premium of €20 on both variants, which makes them less of a bargain. As the official pricing is viable according to the information we have, it ought to stand as long as NVIDIA is able to maintain GPU production at sufficient volumes. If this doesn’t happen, prices will obviously rise.
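To put the €20 premium in perspective, the following Python sketch computes the resulting street price and the relative increase for each card. It uses only the official prices and the premium quoted above; the function name is hypothetical and the calculation assumes the premium is applied as a flat amount.

```python
def price_with_premium(official_eur, premium_eur=20):
    """Return (street price in euros, increase over the official price in %)."""
    street = official_eur + premium_eur
    increase_pct = 100.0 * premium_eur / official_eur
    return street, increase_pct


for name, official in (("GTX 460 768 MB", 200), ("GTX 460 1 GB", 229)):
    street, increase = price_with_premium(official)
    print(f"{name}: EUR {street} (+{increase:.1f}%)")
```

A flat premium weighs proportionally more on the cheaper card, so at equal performance the 768 MB version's price/performance ratio would suffer slightly more from it.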
Copyright © 1997-2015 BeHardware. All rights reserved.