The GK106

The GK106 uses the same variant of the Kepler architecture as the GK104, differing only in its configuration.
It thus retains the same organisation of processing units into SMXs, each of which has 192 processing units, 16 texturing units and a 64 KB L1 cache. SMXs are a development of the previous generation's SMs, optimised for higher efficiency, especially in terms of energy consumption. You can find all the details on this in the review of the GeForce GTX 680. Each SMX can process up to 192 FMA instructions per cycle (384 flops), four pixels per cycle and one triangle every two cycles.
The memory interface is also of the same type, built from blocks that each contain a 64-bit memory controller optimised for high-frequency GDDR5, a 128 KB L2 cache and 8 ROPs responsible for writing pixels to memory once they have been rendered.
While the GK104 has eight SMXs and four memory controllers, the GK106 has just five SMXs and three memory controllers. It thus has a total of 960 processing units, 80 texturing units, a 384 KB L2 cache, a 192-bit memory bus and 24 ROPs. While the GK104 can process 32 pixels and four triangles per cycle, the GK106 manages up to 20 pixels and 2.5 triangles.
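The totals above follow directly from multiplying the per-block figures by the number of blocks. As a quick check, here is a minimal Python sketch of that arithmetic; the function name `chip_totals` and the dictionary keys are ours, chosen for illustration only, and the per-block constants are the ones quoted in the text.

```python
# Per-SMX figures quoted in the article.
UNITS_PER_SMX = 192        # processing units
TEX_PER_SMX = 16           # texturing units
PIXELS_PER_SMX = 4         # pixels per cycle
TRIANGLES_PER_SMX = 0.5    # one triangle every two cycles

# Per-memory-block figures quoted in the article.
BUS_BITS_PER_CTRL = 64     # memory controller width
L2_KB_PER_CTRL = 128       # L2 cache
ROPS_PER_CTRL = 8          # ROPs


def chip_totals(smx_count, controller_count):
    """Scale the per-block figures up to a whole chip (illustrative helper)."""
    return {
        "processing_units": smx_count * UNITS_PER_SMX,
        "texturing_units": smx_count * TEX_PER_SMX,
        "pixels_per_cycle": smx_count * PIXELS_PER_SMX,
        "triangles_per_cycle": smx_count * TRIANGLES_PER_SMX,
        "bus_width_bits": controller_count * BUS_BITS_PER_CTRL,
        "l2_cache_kb": controller_count * L2_KB_PER_CTRL,
        "rops": controller_count * ROPS_PER_CTRL,
    }


gk104 = chip_totals(smx_count=8, controller_count=4)
gk106 = chip_totals(smx_count=5, controller_count=3)

for name, chip in (("GK104", gk104), ("GK106", gk106)):
    print(name, chip)
```

With five SMXs and three memory blocks, this gives the 960 processing units, 80 texturing units, 384 KB of L2, 192-bit bus and 24 ROPs listed for the GK106.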
Note that pixel throughput is limited by the number of SMXs (5 x 4 = 20 pixels per cycle), although the ROPs would be able to write 24 per cycle. These additional ROPs can nevertheless be useful for multisample antialiasing, which can add a significant load.
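To make the bottleneck explicit, the effective pixel rate can be treated as the lower of the two limits: what the SMXs can deliver and what the ROPs can write. The short Python sketch below is a simplified model under that assumption, not a description of how the hardware schedules work; `pixel_rate` is a hypothetical helper name.

```python
def pixel_rate(smx_count, rop_count, pixels_per_smx=4):
    """Return (effective pixels per cycle, ROPs left over) in a simple min() model."""
    smx_limit = smx_count * pixels_per_smx   # what the SMXs can deliver
    effective = min(smx_limit, rop_count)    # the slower side sets the pace
    spare_rops = rop_count - effective       # ROP capacity not fed by the SMXs
    return effective, spare_rops


gk106_rate, gk106_spare = pixel_rate(smx_count=5, rop_count=24)
gk104_rate, gk104_spare = pixel_rate(smx_count=8, rop_count=32)

print("GK106:", gk106_rate, "pixels/cycle,", gk106_spare, "spare ROPs")
print("GK104:", gk104_rate, "pixels/cycle,", gk104_spare, "spare ROPs")
```

In this model the GK104 is balanced (32 on both sides), whereas the GK106 has four ROPs' worth of headroom, which is the capacity that multisampling can put to use.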
Like the GK104, the GK106 is manufactured on TSMC's 28nm process. Because it has fewer execution blocks, it requires only 2.5 billion transistors as opposed to 3.5 billion on the GK104, which puts the GK106 at a similar level of complexity to the Pitcairn GPU used in the Radeon HD 7800s, which has 2.8 billion transistors. There has also been a corresponding reduction in surface area (214 mm² for the GK106, 212 mm² for Pitcairn and 294 mm² for the GK104), which brings down manufacturing costs.
Note that AMD seems to achieve a slightly higher transistor density, probably because its architecture relies on more SRAM for its various caches and registers.
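The density gap can be estimated from the figures given above by dividing each transistor count by the corresponding die area. The Python sketch below does just that; it uses the rounded numbers quoted in the text, so the results are approximations, and the variable names are ours.

```python
# (transistors in millions, die area in mm²), as quoted in the article.
chips = {
    "GK106": (2500, 214),
    "Pitcairn": (2800, 212),
    "GK104": (3500, 294),
}

# Density in millions of transistors per mm².
density = {name: transistors / area for name, (transistors, area) in chips.items()}

for name, value in sorted(density.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:.1f} M transistors/mm²")
```

On these rounded figures Pitcairn comes out at roughly 13.2 million transistors per mm², against around 11.7 for the GK106 and 11.9 for the GK104.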