
Graphics cards


 Nvidia GeForce GTX 690: review of a €1000 card!
  Posted on 03/07/2012 at 00:01 by Damien

While we were expecting AMD to be first out with a new bi-GPU graphics card, Nvidia has in fact moved fastest with the ultra-exclusive GTX 690, on sale for no less than €1000!

> Nvidia GeForce GTX 690: review of a €1000 card!

 GTC: More details on the GK110
  Posted on 17/05/2012 at 04:40 by Damien

At a technical session on the GK110 architecture, we were able to learn some details to add to what we brought you yesterday. This new information is of course focused on the compute side of the GPU. First of all, Nvidia presented an architecture diagram that clearly shows that the GK110 is made up of 15 SMXes, each with 192 processing units (Cuda cores), for a total of 2880, alongside a 384-bit memory bus.

Moreover, we learn that the L2 cache now stands at 256 KB per 64-bit memory controller, for a total of 1.5 MB, against 768 KB for the GF1x0 and 512 KB for the GK104. As on the GK104, each portion of L2 cache has twice the bandwidth of its Fermi generation counterpart.
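As a quick sanity check, the totals quoted above follow directly from the per-controller figure and the bus width (a hedged sketch; the controller counts are inferred from the buses, at one 64-bit controller per 64 bits):

```python
# L2 cache totals implied by the per-controller figures above.
l2_kb_per_controller = {"GK110": 256, "GF110": 128, "GK104": 128}
bus_bits = {"GK110": 384, "GF110": 384, "GK104": 256}

for gpu, per_ctrl in l2_kb_per_controller.items():
    controllers = bus_bits[gpu] // 64          # one controller per 64 bits of bus
    total_kb = per_ctrl * controllers
    print(f"{gpu}: {controllers} x {per_ctrl} KB = {total_kb} KB")
# GK110: 6 x 256 KB = 1536 KB (1.5 MB); GF110: 6 x 128 KB = 768 KB; GK104: 4 x 128 KB = 512 KB
```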

The fundamental processing unit blocks, called SMXs on the Kepler generation, are similar on the GK110 to those on the GK104:

The number of single precision processing units is the same, as is the number of special function, read/write and texturing units. The caches are also identical whether this be the registers, the L1/shared memory or the texturing caches.

The only fundamental difference lies in the increase in double precision processing units, which are up from eight on the GK104 to 64 on the GK110. So while the GK104 is 24x slower in this mode than in single precision, the GK110 will only be 3x slower. Coupled with the increase in the number of SMXes this gives us a GK110 that can process 15x more of these instructions per cycle! Compared to the GF1x0, this represents a direct gain of 87.5% at equal clocks.
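The ratios quoted above multiply out as follows (a sketch; the GF110 hot-clock factor of 2x, which makes the 87.5% figure work at equal base clocks, is our assumption):

```python
# Double precision throughput arithmetic from the paragraph above.
sp_per_smx = 192
dp_gk104, dp_gk110 = 8, 64             # DP units per SMX
assert sp_per_smx // dp_gk104 == 24    # GK104: 24x slower in DP than SP
assert sp_per_smx // dp_gk110 == 3     # GK110: only 3x slower

total_gk104 = 8 * dp_gk104             # 8 SMXes -> 64 DP instructions/cycle
total_gk110 = 15 * dp_gk110            # 15 SMXes -> 960 DP instructions/cycle
assert total_gk110 == 15 * total_gk104 # 15x more DP instructions per cycle

# GF110: 512 units at an assumed 2x hot clock, DP at half rate
# -> 512 DP instructions per base clock.
total_gf110 = 512 * 2 // 2
print(f"gain vs GF110: {total_gk110 / total_gf110 - 1:.1%}")  # 87.5%
```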

In the GK110, as in the GK104, each SMX is fed by four schedulers, each of which is capable of issuing two instructions. However, not all the execution units can be accessed by all the schedulers: in practice an SMX is separated into two symmetrical halves, within each of which two schedulers share the various units. Each scheduler has its own set of registers: 16384 32-bit registers (actually 512 general registers of 32x32 bits). Moreover, each scheduler has a dedicated block of four texturing units accompanied by a 12 KB cache.
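The per-scheduler figures quoted above multiply out as follows (a quick arithmetic sketch):

```python
# Register file and texture cache arithmetic for one SMX, from the figures above.
regs_per_scheduler = 512 * 32        # 512 general registers of 32x32 bits
assert regs_per_scheduler == 16384   # 16384 32-bit registers per scheduler

schedulers = 4
regs_per_smx = schedulers * regs_per_scheduler
print(regs_per_smx, "registers =", regs_per_smx * 4 // 1024, "KB per SMX")
# 65536 registers = 256 KB per SMX

tex_cache_per_smx = schedulers * 12  # one 12 KB cache per scheduler's texture block
print(tex_cache_per_smx, "KB of texture cache per SMX")  # 48 KB
```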

In contrast to what we were expecting, the L1 cache / shared memory system is the same on the GK110 as on the GK104 and remains proportionally smaller than what the Fermi generation provided. Nvidia has however introduced three small developments that can give important gains:

Firstly, each thread can have up to 256 registers allocated to it, as against 64 previously. What’s the point of this if there’s no increase in the number of physical registers? It’s a way of giving the developer and the compiler more flexibility to juggle the number of threads against the number of registers allocated to each, so as to maximise performance. This is particularly important for double precision processing, which takes up two registers per value and was therefore previously limited to just 32 64-bit values per thread. Nvidia says that increasing this to 128 gives impressive gains in certain cases.
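The trade-off can be illustrated with a hypothetical sketch (real occupancy also depends on shared memory and block sizes, which this ignores):

```python
# Trade-off between registers per thread and resident threads, per scheduler.
# Illustrative only: occupancy also depends on shared memory and block size.
regs_per_scheduler = 16384
for regs_per_thread in (64, 128, 256):
    threads = regs_per_scheduler // regs_per_thread
    dp_values = regs_per_thread // 2   # double precision occupies register pairs
    print(f"{regs_per_thread:3} regs/thread: {threads:4} resident threads, "
          f"{dp_values:3} 64-bit values each")
```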

The second little development consists in authorising direct access to the caches dedicated to texturing. It was previously possible to access them manually through the texturing units, but this method wasn’t practical. With the GK110, these 12 KB caches can be exploited directly by the SMXes, but only for read-only data accesses. They have the advantage of providing excellent access to the GPU’s memory subsystem, suffering less from cache misses and better supporting non-aligned accesses. The compiler (via a directive) calls on them when useful.

Finally, a new instruction makes its appearance: SHFL. It enables the exchange of 32 bits of data per thread within a warp (a block of 32 threads). Its function is similar to that of the shared memory and thus comes as a kind of compensation for the relatively small quantity of shared memory (in proportion to the number of processing units). When it comes to data exchange, it saves time (a direct transfer in place of a write then a read) and economises on shared memory.
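A pure-Python sketch of how such a shuffle can replace shared memory in a warp-wide reduction (illustrative only; on the GPU each of the 32 lanes would execute the shuffle instruction in lockstep, here simulated with a list):

```python
# Simulation of a SHFL-style butterfly reduction over one 32-thread warp.
def warp_sum(values):
    """Sum one 32-bit datum per lane using only lane-to-lane exchanges."""
    assert len(values) == 32
    lanes = list(values)
    offset = 16
    while offset >= 1:                 # xor-shuffle pattern: 16, 8, 4, 2, 1
        # Each lane adds the value held by its partner lane (i XOR offset).
        lanes = [lanes[i] + lanes[i ^ offset] for i in range(32)]
        offset //= 2
    return lanes[0]                    # every lane ends up with the full sum

print(warp_sum(list(range(32))))       # 496, with no shared-memory traffic
```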

There are also several other minor developments such as the addition of a few missing 64-bit atomic operations (min/max and logic operations) and a 66% reduction in ECC overhead.

We can conclude, then, by saying that with the Kepler generation, Nvidia has indeed taken a different route than it did with Fermi. The big Fermi GPU, the GF100/110, had a different internal organisation to that of the other GPUs in the family, increasing the control logic to the detriment of the density of processing units and energy yield.

With the GK110, Nvidia didn’t want to make the same energy compromise, or rather couldn’t. They are now trying to do as much as they can within a thermal envelope that can no longer be extended. This is why the internal organisation of the GK110 is the same as the GK104’s, with the exception of the double precision capacity, which has been increased significantly.

Thus, Nvidia hasn’t tried to make its architecture any more complex to boost GPU computing performance; it has simply tried to do as much as it can with the available resources, settling for minor developments that can nonetheless have a major impact. This is also why the command processor has been revised to allow maximum use of the GPU, with the Hyper-Q and Dynamic Parallelism technologies that we described briefly yesterday and will return to in more detail as soon as possible.

 GTC: Nvidia lifts the veil on the GK110
  Posted on 16/05/2012 at 00:40 by Damien

Without naming it directly, Jen-Hsun Huang, Nvidia’s CEO, has just unveiled the first information on the ‘big’ Kepler GPU, the GK110. First of all it was confirmed that this GPU is indeed enormous, with no fewer than 7.1 billion transistors manufactured at 28nm, a new record. There were still some doubts over this figure, as it corresponds almost exactly to two GK104 GPUs, the configuration of the new Tesla K10, but this is simply a coincidence.

The GTC being the GTC, Nvidia is of course first and foremost highlighting its impact for the pro world, particularly high performance computing. Two important new technologies make their appearance. The first is Hyper-Q, which enables the use of up to 32 work queues to feed the GPU, in contrast to just one previously; that single queue limited full exploitation of the GPU’s processing capacity in many cases.

Hyper-Q maximises GPU utilisation and reduces processing times.

The second innovation is called Dynamic Parallelism and also brings a solution to a current efficiency issue. Work on the GPU is generally segmented and each segment is initiated by the CPU. Between each segment, the GPU thus hands over to the CPU, which receives the results of a particular segment/function and in certain cases simply sends these same results back to the GPU to launch another dependent function. Obviously this is inefficient and Dynamic Parallelism represents the capacity of the GK110 to generate new tasks for itself, thus avoiding the toing and froing with the CPU.

This new flexibility in the execution of tasks will facilitate developers’ work, firstly by delivering better efficiency more simply and secondly by allowing them to write programs in a more natural way.

Finally, Nvidia unveiled a photo of the GK110 die, with, it’s true, some artistic licence, but one which does give us some idea of the spec of this GPU. Firstly, we can see that there are 15 SMXes and six 64-bit memory controllers (384-bit in total). Nvidia confirmed these specs and told us that the GK110’s SMXes will, like the GK104’s, be equipped with 192 processing units (Cuda cores). In total there will therefore be 2880, which is once again a new record. Note that even though Nvidia is saying that, in principle, there won’t be a Tesla derivative equipped with a full version of this GPU, with a GPU of this size (over 500mm²) a certain amount of over-capacity was necessary to keep yields up. The first Tesla K20 card based on the GK110 will probably be limited to 13 SMXes and 2496 Cuda cores.
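The unit counts quoted above multiply out as follows:

```python
# Cuda core counts implied by the SMX figures above.
cores_per_smx = 192
full_gk110 = 15 * cores_per_smx   # full GK110 die
tesla_k20 = 13 * cores_per_smx    # expected Tesla K20 configuration
print(full_gk110, tesla_k20)      # 2880 2496
```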

Nvidia told us that the organisation of the registers and shared memory had been revised and that double precision processing power was very high, but we’ll have to wait another day or two before we get more detail on the GK110 memory subsystem.

From a graphics point of view, we imagine each SMX still has 16 texturing units, for a total of 240 across the GPU. The organisation of the SMXes into five groups of three makes us think that the GK110 is likely to be able to process up to 7.5 triangles per cycle but only render five or six (depending on the implementation Nvidia has gone for), as against four and four for the GK104. It will apparently have 48 ROPs.

This forthcoming GPU will first of all be introduced at the end of the year as the Tesla K20 but won’t appear as a GeForce before the beginning of 2013. It will of course be interesting to see what the energy consumption levels are on such a monster, even though Nvidia is trying to make reassuring noises, saying that the Tesla K20 has been designed with a standard TDP of 'just' 225W!

 Radeon HD 7970s and 7950s roundup
  Posted on 09/05/2012 at 00:00 by Damien

How good are the customised Radeon HD 7970s and 7950s? To give an answer to this question, we have looked in detail at the Asus, HIS, MSI, PowerColor, Sapphire and XFX releases as well as the reference cards. When heat, noise and overclocking are all taken into account, which one is the best Radeon HD 7900?

> Roundup: the Radeon HD 7970s and 7950s from Asus, HIS, MSI, PowerColor, Sapphire and XFX

 Nvidia launches the GeForce GTX 690
  Posted on 29/04/2012 at 06:58 by Damien

Nvidia unveiled the GeForce GTX 690 earlier this week. It’s a bi-GPU card from the Kepler generation, derived from the GeForce GTX 680. Given the contained energy consumption of the GK104 GPUs it runs on, its spec hasn’t been revised downwards as much as bi-GPU cards usually are to fit within thermal envelope limitations. The GeForce GTX 690’s GPUs will be clocked at 915 MHz, as against 1006 MHz for the GeForce GTX 680. There will of course be a turbo mode, with a GPU Boost clock announced at 1019 MHz, against 1058 MHz for the GeForce GTX 680. Each GPU has 2 GB of GDDR5 memory clocked at 1502 MHz.

Remember, the GPU Boost clock isn’t really a spec, just a communications ploy. The real turbo clock is equal to or higher than this figure but variable, depending on the environment and the quality (current leakage) of the individual GPU sample: Nvidia sells GPUs of the same reference validated at different clocks (between 1071 and 1110 MHz for the GTX 680), so that it can offer a certain number of “top performance” GPUs while maintaining a sufficiently high production volume. This is something that went unnoticed at the launch of the GTX 680 and we’ll be coming back to it as soon as we can. It explains why Nvidia is uneasy when speaking about GPU Boost specs.

As with the GeForce GTX 680, Nvidia is categorically refusing to give any information on the real specs and the variation we can expect. This makes it difficult to ascertain exactly how the GeForce GTX 690 will perform in comparison to two GTX 680s in SLI, but we estimate that the SLI system will have around a 10% advantage.

Nvidia has abandoned its in-house switch and moved to a more recent PLX part so as to retain the GK104’s PCI Express 3.0 compatibility. Although its TDP is around 300W and the GPU Boost power target is limited to 263W (the maximum value for clock increases, though clocks aren’t reduced until consumption reaches 300W), the GeForce GTX 690 uses two 8-pin PCI Express connectors (375W), leaving a decent overclocking margin. The drivers allow you to increase the GPU Boost target by 35%, to a maximum of 355W.
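The power figures above check out as follows (a sketch; the 75W slot and 150W per 8-pin connector split is our assumption, per the standard PCI Express power budgets):

```python
# Power-target arithmetic from the paragraph above.
boost_target = 263                       # W, default GPU Boost power target
max_target = round(boost_target * 1.35)  # drivers allow raising it by 35%
print(max_target, "W")                   # 355 W

# 375W budget = 75W from the slot + 2 x 150W from the 8-pin connectors (assumed split).
available = 75 + 2 * 150
print(available, "W available")          # 375 W
```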

Nvidia has been talking up the design of the card, the materials of which have been reworked from top to bottom to give irreproachable manufacturing quality, with aluminium and composites replacing the usual plastic. The cooling system is however very similar to the one used on the GeForce GTX 590, with two cooling blocks equipped with vapour chambers and cooled by a central axial fan. Given the better contained energy consumption, noise levels should however be lower here. The connectivity is also the same: three DVI Dual-Link outs and a mini-DisplayPort out.

Unusually, Nvidia decided not to give information to the trade press in advance of the launch and only supplied test cards once the announcement came into effect (earlier in the week). This approach has meant that Nvidia could control what information came out on this card and prevent AMD from getting any advance knowledge to help it prepare its own bi-GPU card. A standard launch would have let AMD check out the positioning of the card and potentially supply its own test card just before Nvidia’s announcement. On the other hand, proceeding like this has also prevented us from giving you the usual full review and encourages the various media outlets to be less circumspect and, once they have the card in hand, to hurry out the first benchmarks as fast as they can.

We therefore advise you not to take these early benchmarks too seriously and wait for full tests before you get into hot water with your bank and invest in such a model. Nvidia has announced the first available cards for May 3rd at something in the order of $999! We’ll try and publish a review of the GeForce GTX 690 as soon as possible.


Copyright © 1997- Hardware.fr SARL. All rights reserved.
Read our privacy guidelines.