GK104: GF114 x2
We described the block-level changes to the GK104’s processing units first so as to give an idea of the overall organization NVIDIA chose for its 1536 processing units. This number represents an enormous jump from the GF1x0’s 512 processing units and the GF1x4’s 384 units. We do have to take into account the fact that the shader clock (which ran at double the GPU clock) has been dropped, but this nevertheless gives us the equivalent of double the GF1x4’s processing units, along with double its texturing units!
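To make the comparison concrete, here is a minimal sketch of the arithmetic. The unit counts are those quoted above; the "operations per base clock" metric and the names used are our own illustration, not an NVIDIA figure, and real clocks differ from one card to another.

```python
# Per-clock shader throughput, normalised to the base GPU clock.
# Unit counts come from the text; the multiplier encodes the doubled
# shader clock that Fermi had and that Kepler has dropped.
GPUS = {
    "GF1x0": {"units": 512, "shader_clock_multiplier": 2},
    "GF1x4": {"units": 384, "shader_clock_multiplier": 2},
    "GK104": {"units": 1536, "shader_clock_multiplier": 1},
}


def ops_per_base_clock(gpu: dict) -> int:
    """Operations issued per base-clock cycle (illustrative metric)."""
    return gpu["units"] * gpu["shader_clock_multiplier"]


effective = {name: ops_per_base_clock(spec) for name, spec in GPUS.items()}
ratio_vs_gf1x4 = effective["GK104"] / effective["GF1x4"]

for name, value in effective.items():
    print(f"{name}: {value} operations per base clock")
print(f"GK104 vs GF1x4: {ratio_vs_gf1x4:.1f}x")
```

On this basis the four-fold increase in unit count over the GF1x4 works out to a two-fold increase at equal base clock.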
The GK104 has 8 SMXs in 4 GPCs but just a 256-bit memory bus. Basically this GPU uses the same memory subsystem as the GF114 but with double the number of execution units and gives the same triangle and pixel throughput as the GF110. This is a relatively good balance but does suggest limitations in terms of GPU computing and a lack of memory bandwidth when it comes to performance with a high level of MSAA.
To make up for this, NVIDIA has doubled the bandwidth of its 512 KB L2 cache (now 512 bytes per cycle) and worked hard on its GDDR5 memory controller. While the memory clock couldn’t be increased much on the Fermi generation, with Kepler the GeForce GTX 680 has memory clocked at 1.5 GHz (6 Gbps). Note that the GK104 has 32 ROPs, corresponding to its pixel rate, in contrast to the Fermi GPUs, whose ROPs were held back by the lower pixel throughput of their SMs. Kepler can also send pixels to the ROPs at full speed in FP10 or RGB9E5, pixel formats that pack HDR data into 32 bits. The ROPs keep the same blending capabilities, however.
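As a rough guide to what these figures mean, the sketch below derives peak bandwidths from the 256-bit bus, the 6 Gbps data rate and the 512 bytes per cycle of the L2. It assumes the L2 runs at the 1006 MHz GPU clock quoted for the GeForce GTX 680, which is our assumption rather than something NVIDIA has stated; the function names are purely illustrative.

```python
BUS_WIDTH_BITS = 256          # memory bus width, from the text
DATA_RATE_GBPS_PER_PIN = 6.0  # GDDR5 at 1.5 GHz, from the text
L2_BYTES_PER_CYCLE = 512      # L2 cache bandwidth, from the text
L2_CLOCK_GHZ = 1.006          # ASSUMPTION: L2 runs at the GPU clock


def memory_bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s: bytes per transfer x transfer rate."""
    return bus_width_bits / 8 * gbps_per_pin


def l2_bandwidth_gb_s(bytes_per_cycle: int, clock_ghz: float) -> float:
    """Peak L2 bandwidth in GB/s: bytes per cycle x cycles per nanosecond."""
    return bytes_per_cycle * clock_ghz


mem_bw = memory_bandwidth_gb_s(BUS_WIDTH_BITS, DATA_RATE_GBPS_PER_PIN)
l2_bw = l2_bandwidth_gb_s(L2_BYTES_PER_CYCLE, L2_CLOCK_GHZ)

print(f"Memory: {mem_bw:.0f} GB/s")
print(f"L2 cache: {l2_bw:.0f} GB/s")
```

These are theoretical peaks; the bandwidth actually sustained depends on access patterns and on the efficiency of the memory controller.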
The GK104, A2 version.
With the new 28 nm fabrication process, the GPU clock is up significantly to 1 GHz (or rather 1006 MHz) and can go a good deal higher now that a turbo boost has been introduced. With 3.5 billion transistors, the GK104 fits onto an area of just 294 mm², which is smaller than both Tahiti (4.3 billion transistors and 352 mm²) and the 40 nm GF114 (1.95 billion transistors and 367 mm²).
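For comparison, a quick sketch of the transistor density these figures imply. The transistor counts and die areas are the ones quoted above; density as a simple transistors-per-area ratio is our own simplification and ignores differences in what each chip spends its transistors on.

```python
# (transistor count, die area in mm²), as quoted in the text
CHIPS = {
    "GK104": (3.5e9, 294),
    "Tahiti": (4.3e9, 352),
    "GF114": (1.95e9, 367),
}


def density_millions_per_mm2(transistors: float, area_mm2: float) -> float:
    """Average transistor density in millions of transistors per mm²."""
    return transistors / area_mm2 / 1e6


densities = {
    name: density_millions_per_mm2(count, area)
    for name, (count, area) in CHIPS.items()
}

for name, value in densities.items():
    print(f"{name}: {value:.1f} million transistors per mm²")
```

The GK104 and Tahiti come out close to each other, at a little more than double the density of the GF114.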
NVIDIA hasn't included Direct3D 11.1 support but has included support for PCI Express 3.0 and has entirely revised its display engine. It now supports HDMI 1.4a 3 GHz for 1080p 3D at 60 Hz and 4k resolution and, more importantly, up to 4 video outputs at the same time! The advantage AMD had with Eyefinity has therefore been drastically reduced, especially as the GeForce GTX 680 can drive two DVI outs and an HDMI out directly, without needing a DisplayPort out paired with either a native DisplayPort screen or an active adaptor.
Multi-screen the NVIDIA way: 3 + 1.
Moreover, the GK104 includes NVENC, a fixed-function H.264 encoder that uses less power for encoding than the GPU processing units would. This engine is similar to the Video Codec Engine that AMD supplies with the Radeon HD 7000s, but NVIDIA has announced far superior performance: up to 240 fps at 1080p. It will be interesting to check this in practice, along with the quality NVENC gives.
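To put the announced figure in perspective, here is a small sketch of what 240 fps at 1080p would mean in practice. The 30 fps source frame rate and the two-hour example are our assumptions, and 240 fps is NVIDIA's stated maximum, not a measured result.

```python
NVENC_FPS = 240            # maximum announced by NVIDIA, from the text
WIDTH, HEIGHT = 1920, 1080 # 1080p frame size
SOURCE_FPS = 30            # ASSUMPTION: frame rate of the video being encoded

# How many times faster than real time the encode would run
realtime_factor = NVENC_FPS / SOURCE_FPS

# Raw pixel rate the encoder would have to ingest
pixels_per_second = WIDTH * HEIGHT * NVENC_FPS


def encode_time_s(video_duration_s: float) -> float:
    """Time to encode a video of the given length at the peak rate."""
    return video_duration_s / realtime_factor


two_hours_s = 2 * 60 * 60
print(f"Speed: {realtime_factor:.0f}x real time")
print(f"Pixel rate: {pixels_per_second / 1e6:.0f} Mpixels/s")
print(f"Two-hour video: {encode_time_s(two_hours_s) / 60:.0f} minutes")
```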