PowerTune vs GPU Boost

AMD and Nvidia have both introduced energy consumption monitoring systems for their graphics cards, which allow them to set more aggressive clocks while still guaranteeing the reliability of their products. Without such systems, they would have to settle for lower clocks or risk reliability issues, as heavy rendering tasks could push their graphics cards and GPUs beyond the energy consumption levels they were designed for.
Once implemented, this energy consumption monitoring can also be used to add features such as turbo or power-saving modes.
AMD was the first to develop such a technology, with PowerTune, introduced on the Radeon HD 6900s and included on all the Radeon HD 7000s. Like the turbo technologies used for CPUs, PowerTune estimates the energy consumption of the chip using numerous internal activity counters for the different blocks that make up the GPU. It may seem strange at first to estimate energy consumption rather than take a direct reading, but this approach makes the behaviour deterministic in performance terms. A relatively complex formula then converts these activity levels into power consumed, using parameters that correspond to the least favourable case: a GPU with significant current leakage running in a very hot environment.
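The principle can be illustrated with a minimal sketch. This is not AMD's actual formula; the block names, weights and leakage factor below are invented for the example, but the structure (per-block activity times a dynamic-power weight, plus static power scaled by a worst-case factor) matches the approach described above.

```python
def estimate_power(activity, dynamic_weights, static_power, leakage_factor):
    """Estimate total power (W) from per-block activity levels in [0, 1].

    dynamic_weights: maximum dynamic power (W) each block can draw.
    static_power: baseline static power (W) at nominal conditions.
    leakage_factor: multiplier covering the worst case (hot, leaky chip).
    """
    dynamic = sum(activity[b] * dynamic_weights[b] for b in dynamic_weights)
    return dynamic + static_power * leakage_factor

# Hypothetical numbers for a heavy rendering load:
weights = {"shaders": 120.0, "memory_controller": 40.0, "rops": 25.0}
activity = {"shaders": 0.95, "memory_controller": 0.7, "rops": 0.6}
power = estimate_power(activity, weights, static_power=30.0, leakage_factor=1.5)
```

Because the result depends only on measured activity and fixed worst-case parameters, two identical workloads always produce the same power estimate, hence the same clocks.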
AMD has recently made two refinements to the technology. The first consists in estimating the GPU temperature by tracking the estimated consumption over time. This estimated temperature can then replace the constant that represents the worst case, giving more flexibility in ordinary situations. Put simply, the idea is to estimate energy consumption more precisely so as to avoid being too conservative. The real temperature is still measured and used both as a higher level of protection and to regulate the fan.
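The idea behind this refinement can be sketched as a simple first-order thermal model: the estimated temperature tracks a low-pass-filtered version of the estimated power, so a brief spike barely moves it. The thermal resistance, smoothing factor and ambient value here are assumptions for illustration, not AMD's actual parameters.

```python
def estimated_temperature(power_samples, ambient=45.0, k=0.3, alpha=0.1):
    """First-order thermal model: temperature tracks filtered power.

    k: degrees C per watt (hypothetical thermal resistance).
    alpha: smoothing factor of the exponential moving average.
    """
    filtered = power_samples[0]
    for p in power_samples[1:]:
        filtered = alpha * p + (1 - alpha) * filtered
    return ambient + k * filtered
```

A short burst to 250 W in a 100 W workload raises the estimated temperature only slightly above the steady-state value, so PowerTune no longer has to throttle as if the chip were permanently at its hottest.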
This development to PowerTune came on stream with the Catalyst 12.7 betas and will be rolled out across the Radeon HD 7900 range. In practice this won't make any difference in games, outside of major overclocking, but it will allow these cards to retain a higher clock in stress tests. Note that in the future AMD could exploit this new capability to authorise the GPU to exceed its TDP for a few seconds, as Intel does with its latest CPUs, but as the load in real-time 3D rendering is relatively constant over time, the feature would be of little interest there.
The second innovation is the introduction of a turbo feature called Boost, a mode it had become difficult to avoid. In concrete terms, Boost is the capacity of PowerTune to modify the GPU voltage in addition to its clock. This innovation is reserved for the Radeon HD 7970 GHz Edition and the HD 7950 v2, partly because the BIOS must contain a table of clock/voltage pairs, but more importantly because the GPU validation process is more complex. The Radeon HD 7970 GHz Edition has thus been validated up to 1000 MHz at a fixed standard voltage, and up to 1050 MHz with a voltage that climbs progressively (850 MHz and 925 MHz respectively in the case of the Radeon HD 7950 v2). PowerTune currently supports up to 256 steps (P-states) with a granularity of 4 MHz.
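To make the mechanism concrete, here is a hypothetical illustration of such a clock/voltage table and of how a state could be selected against a power budget. The voltages and the simple f × V² power model are invented for the example; only the 1000/1050 MHz endpoints and the 4 MHz-style granularity come from the description above.

```python
# Hypothetical P-state table: (clock in MHz, voltage in V).
# Above the base state, each step raises both clock and voltage.
P_STATES = [
    (1000, 1.050),
    (1024, 1.075),
    (1048, 1.100),
    (1050, 1.112),
]

def pick_state(power_budget, base_power=180.0, base=(1000, 1.050)):
    """Pick the fastest state whose estimated power fits the budget.

    Dynamic power is assumed to scale roughly with f * V^2 relative to
    the base state. If nothing above base fits, the base state is kept
    (it is the guaranteed clock).
    """
    f0, v0 = base
    best = P_STATES[0]
    for f, v in P_STATES:
        est = base_power * (f / f0) * (v / v0) ** 2
        if est <= power_budget:
            best = (f, v)
    return best
```

With a tight budget the card stays at its guaranteed 1000 MHz state; with more headroom it climbs through the intermediate states towards 1050 MHz.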
In practice, as the TDP of the Radeon HD 7970 GHz Edition remains oversized in comparison to its energy consumption in video games (with some rare exceptions), Boost can be seen as a way of safely validating the GPU at a higher clock that is applied consistently in almost all games, unlike a turbo whose gains can vary widely.
The situation is different, however, for the Radeon HD 7950 v2, not only because its thermal envelope doesn't have as much margin, but above all because AMD plans to use Tahiti GPU samples with very high current leakage on this model. In other words, AMD needs to remain very conservative with the parameters it uses to estimate energy consumption. To partially compensate, the TDP has been increased from 200 to 225 W, but this is still insufficient to allow Boost to kick in as consistently as it does on the Radeon HD 7970 GHz Edition. The Radeon HD 7950 v2 thus settles for a GPU clock of 850 MHz most of the time, this being the maximum clock allowed without any increase in GPU voltage.
Note that as Boost increases the voltage, energy consumption grows much faster than performance (dynamic power scales roughly with the clock times the square of the voltage), which doesn't make it a good solution for improving energy efficiency. The same goes for Nvidia's GPU Boost, which doesn't aim to improve performance per watt but rather to make the most of each watt available to offer slightly higher performance.
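A small worked example shows why. Assuming the commonly used approximation that dynamic power scales with f × V², and taking hypothetical voltages for the two clock points, a 5% clock gain that requires a voltage bump costs far more than 5% in power:

```python
def relative_power(f, v, f0=1000.0, v0=1.05):
    """Dynamic power relative to the base state, under the f * V^2 model.

    The base voltage (1.05 V) and boosted voltage below are invented
    for illustration; only the clocks come from the article.
    """
    return (f / f0) * (v / v0) ** 2

perf_gain = 1050 / 1000 - 1                   # +5% clock
power_gain = relative_power(1050, 1.112) - 1  # clock gain plus voltage bump
```

With these assumed voltages the power cost is roughly +18% for a +5% clock, which is exactly why Boost hurts performance per watt even as it raises absolute performance.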
This, however, is about the only thing the two technologies have in common: PowerTune is entirely deterministic, meaning that under identical conditions all the Radeon HD 7900s behave in the same way. This isn't the case for the GeForces.
Thanks notably to the fact that its energy consumption is well under control, Nvidia has managed to introduce a turbo feature for its GPUs called GPU Boost. GPU Boost has also been designed to maximise the GPU clock so as to benefit fully from the available thermal envelope. We aren't fully convinced by Nvidia's approach, however, as GPU Boost is non-deterministic: in contrast to the CPU turbos, it is based on real energy consumption, which varies from one GPU sample to another according to manufacturing quality and the current leakage affecting it.
Why go for such an approach? Nvidia was probably caught napping when AMD, benefiting from the experience of its CPU team, introduced PowerTune with the Radeon HD 6900s, and hasn't yet been able to introduce a similar technology. Such a system has to be built into the heart of the architecture, and we imagine that Kepler was already too far along in its development for this to be done. Nvidia therefore responded with external monitoring on the GeForce GTX 500s. It's this same system that is still used on the GTX 600s, and we'll have to wait for the next generation for a more evolved technology to be implemented.
Moreover, the current Nvidia technology has the disadvantage of being relatively slow (a control loop of around 100 ms versus a few ms), but it does have the advantage of allowing each sample to benefit fully from the whole TDP, whereas for CPUs and the Radeons energy consumption is overestimated and not all samples can therefore use all of their available TDP.
What's more, Nvidia doesn't validate all samples of the same GPU derivative (the GK104-400 for the GTX 680, the GK104-325 for the GTX 670 and the GK104-300 for the GTX 660 Ti) at the same maximum turbo clock. Officially, Nvidia settles for giving a guaranteed maximum clock but allows the GPU to exceed it if it qualifies to do so. In other words, GPU Boost also amounts to automatic GPU overclocking. The problem is that the press rarely receive average samples, and as a result the performance levels we report are somewhat higher than what you may find with samples from stores.
Nvidia justifies this by explaining that it aims to give maximum performance to each sample, and says that while the variation in the maximum GPU Boost clock can be significant, the variation in the average GPU Boost clock observed is lower: the energy consumption limit stops the GPU from climbing very high in the more demanding games, and the temperature limit restrains it as well.
What Nvidia fails to say is that we also slightly overestimate performance levels, as our testing is carried out under ideal conditions: brief tests on an open bench. Although it might make our work more fun, we unfortunately can't play for an hour to heat up the GPU before taking each performance reading! To recap, we observed the difference in performance between two GeForce GTX 680s: without looking for the worst case, we measured a difference of 2% in theoretical tests and 1.5% in practice. This isn't enormous, but it is a nuisance when the difference with the competition is so tight. With the margin for manoeuvre given to GPU Boost exploding to 15% on the GeForce GTX 670 and GTX 660 Ti, it really does become problematic as far as we can see.
What's the solution? Ideally Nvidia would allow testers to limit cards to the least favourable case, with the GPU Boost clock capped at the officially guaranteed value. As Nvidia doesn't offer such an option, we chose to simulate such a sample ourselves by juggling with the overclocking parameters. This enables us to give you a very precise measure (in spite of the 'DIY' aspect of the solution) of the guaranteed performance of a baseline sample as well as the level of performance you can get with a more favourable sample. What can you expect from samples in stores? Unfortunately we don't know, as Nvidia and its partners have categorically refused to go into the matter.
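The arithmetic behind our workaround can be sketched as follows. This is our own approximation, not an official Nvidia tool: given a sample's observed maximum boost clock and the officially guaranteed clock, we compute the negative clock offset to apply so the card never exceeds the guarantee. The step size reflects the roughly 13 MHz bins GPU Boost moves in on Kepler; the example clocks are hypothetical.

```python
def offset_to_guarantee(observed_max_boost, guaranteed_boost, step=13):
    """Return the negative offset (MHz) needed to pin a sample's boost
    clock at the guaranteed value, rounded up to the clock step."""
    excess = observed_max_boost - guaranteed_boost
    if excess <= 0:
        return 0                     # sample already at the guarantee
    steps = -(-excess // step)       # ceiling division
    return -steps * step

# Hypothetical sample: boosts to 1084 MHz against a 980 MHz guarantee,
# so we would apply a -104 MHz offset in the overclocking tool.
```

The same card can then be retested with the offset removed, giving both the guaranteed floor and the favourable-sample ceiling.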