News of the day (May 16, 2012)
Without naming it directly, Jen-Hsun Huang, Nvidia’s CEO, has just revealed the first details of the ‘big’ Kepler GPU, the GK110. First of all, he confirmed that this GPU is indeed enormous, with no fewer than 7.1 billion transistors fabricated on a 28nm process, a new record. There had been some doubt about this figure, as it corresponds almost exactly to two GK104 GPUs (the configuration of the new Tesla K10), but this is simply a coincidence.
The GTC being the GTC, Nvidia is of course first and foremost highlighting the GK110’s impact on the professional world, particularly high-performance computing. Two important new technologies make their appearance. The first is Hyper-Q, which allows up to 32 work queues to feed the GPU, in contrast to just one previously. In many cases that single queue prevented the GPU’s processing capacity from being fully exploited.
Hyper-Q maximises GPU utilisation and reduces processing times.
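To give an idea of the kind of code that stands to benefit, here is a minimal sketch using CUDA streams, the existing mechanism by which a program submits independent work to the GPU. The `scale` kernel, the buffer sizes and the choice of 8 streams are illustrative assumptions of ours, not Nvidia's example, and whether the kernels actually overlap depends on the hardware and driver.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for a small, independent piece of work.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int kStreams = 8;       // illustrative; Hyper-Q is said to offer up to 32 queues
    const int kElems   = 1 << 12; // deliberately small so one kernel cannot fill the GPU
    cudaStream_t streams[kStreams];
    float *buffers[kStreams];

    // One stream and one private buffer per independent task.
    for (int s = 0; s < kStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&buffers[s], kElems * sizeof(float));
        cudaMemsetAsync(buffers[s], 0, kElems * sizeof(float), streams[s]);
    }

    // Launch one kernel per stream. There are no dependencies between
    // streams, so the hardware is free to run them side by side if it
    // has enough work queues to see them all at once.
    for (int s = 0; s < kStreams; ++s) {
        scale<<<(kElems + 255) / 256, 256, 0, streams[s]>>>(buffers[s], kElems, 2.0f);
    }

    // Wait for every stream to finish and report any error.
    cudaError_t err = cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(err));

    for (int s = 0; s < kStreams; ++s) {
        cudaFree(buffers[s]);
        cudaStreamDestroy(streams[s]);
    }
    return err == cudaSuccess ? 0 : 1;
}
```

The source code is the same whether the GPU has one work queue or 32. What would change is how many of these independent streams the hardware can keep in flight at the same time.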
The second innovation is called Dynamic Parallelism and also addresses a current efficiency issue. Work on the GPU is generally split into segments, each of which is launched by the CPU. Between segments, the GPU thus hands control back to the CPU, which receives the results of one segment or function and in certain cases simply sends these same results back to the GPU to launch another, dependent function. Obviously this is inefficient. Dynamic Parallelism is the ability of the GK110 to generate new tasks for itself, thus avoiding the toing and froing with the CPU.
This new flexibility in the execution of tasks will make life easier for developers, firstly by delivering better efficiency with less effort and secondly by allowing them to write programs in a more natural way.
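As a rough illustration of the idea, the sketch below chains two dependent steps entirely on the GPU: a single-thread `pipeline` kernel launches `fill`, then `square`, without returning to the CPU in between. The kernel names and the data are hypothetical. We are also assuming a GPU of compute capability 3.5 or higher and a build with relocatable device code, something like `nvcc -arch=sm_35 -rdc=true example.cu -lcudadevrt`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Step 1 (hypothetical): each element receives its own index.
__global__ void fill(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;
}

// Step 2 (hypothetical): depends on the result of step 1.
__global__ void square(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= data[i];
}

// Parent kernel: launched once by the CPU, it then launches both steps
// itself. Kernels launched by the same thread into the same stream run
// in order, so square only starts once fill has finished.
__global__ void pipeline(int *data, int n)
{
    int blocks = (n + 255) / 256;
    fill<<<blocks, 256>>>(data, n);
    square<<<blocks, 256>>>(data, n);
}

int main()
{
    const int n = 1024;
    int *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));

    // A single launch from the CPU; the rest is generated on the GPU.
    pipeline<<<1, 1>>>(d_data, n);
    cudaError_t err = cudaDeviceSynchronize();

    // Copy the result back and check it on the host.
    static int h_data[n];
    cudaMemcpy(h_data, d_data, n * sizeof(int), cudaMemcpyDeviceToHost);
    bool ok = (err == cudaSuccess);
    for (int i = 0; ok && i < n; ++i) ok = (h_data[i] == i * i);
    printf("%s\n", ok ? "result matches" : "mismatch or CUDA error");

    cudaFree(d_data);
    return ok ? 0 : 1;
}
```

Without Dynamic Parallelism, the CPU would have to launch `fill`, wait for it to finish, then launch `square` itself, which is exactly the round trip described above.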
Finally, Nvidia unveiled a photo of the GK110 die, admittedly with some artistic licence, but which does give us some idea of the spec of this GPU. Firstly, we can see that there are 15 SMXes and a 6x 64-bit (384-bit) memory controller. Nvidia confirmed these specs and told us that the GK110’s SMXes will, like those of the GK104, be equipped with 192 processing units (Cuda cores). In total there will therefore be 2880 of them (15 x 192), once again a new record. Nvidia is nevertheless saying that, in principle, there won’t be a Tesla derivative equipped with a full version of this GPU: with a GPU of this size (over 500mm²), a certain amount of spare capacity was necessary. The first Tesla K20 card based on the GK110 will probably be limited to 13 SMXes and 2496 Cuda cores.
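For readers who want to check these figures against an actual card once one is available, here is a small sketch that queries the CUDA runtime for the number of multiprocessors and the memory bus width. The figure of 192 cores per SMX comes from the numbers above and, as far as we know, only holds for Kepler (compute capability 3.x); the runtime does not report a core count directly, so the estimate is our own calculation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Figures quoted in the article; assumed valid for Kepler only.
    const int kCoresPerSmx = 192;

    // Reference arithmetic for the specs given above.
    printf("full GK110 : %d cores, %d-bit bus\n", 15 * kCoresPerSmx, 6 * 64);
    printf("13-SMX part: %d cores\n", 13 * kCoresPerSmx);

    // Ask the runtime what device 0 actually reports.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no usable CUDA device found\n");
        return 1;
    }
    printf("%s: %d multiprocessors, %d-bit memory bus\n",
           prop.name, prop.multiProcessorCount, prop.memoryBusWidth);

    // Only apply the 192-cores-per-SMX rule to Kepler-class devices.
    if (prop.major == 3) {
        printf("estimated Cuda cores: %d\n",
               prop.multiProcessorCount * kCoresPerSmx);
    }
    return 0;
}
```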
Nvidia told us that the organisation of the registers and shared memory had been revised and that double precision processing power was very high, but we’ll have to wait another day or two before we get more detail on the GK110 memory subsystem.
From a graphics point of view, we assume each SMX still has 16 texturing units, which would give a total of 240 for the GPU. The organisation of the SMXes into five groups of three makes us think that the GK110 is likely to be able to process up to 7.5 triangles per cycle but only render five or six (depending on the implementation Nvidia has gone for), as against four and four for the GK104. It will apparently have 48 ROPs.
This forthcoming GPU will first of all be introduced at the end of the year as the Tesla K20 but won’t appear as a GeForce before the beginning of 2013. It will of course be interesting to see what the energy consumption levels are on such a monster, even though Nvidia is trying to make reassuring noises, saying that the Tesla K20 has been designed with a standard TDP of 'just' 225W!
Copyright © 1997-2013 BeHardware. All rights reserved.