The first day of the Fusion Summit finished with a surprising session during which Michael Mantor, Senior Fellow Architect, and Mike Houston, Fellow Architect, unveiled AMD’s future GPU architecture with numerous details, only leaving a few specific 3D rendering elements in the dark.
Michael Mantor, AMD Senior Fellow Architect.
AMD has been working on this new architecture for almost five years already with the principal aim of simplifying the programming model so as to convince a maximum of developers to look at the processing power offered by GPUs. This is also the first architecture to have been significantly influenced by the merger with ATI and the Fusion project.
This architecture thus marks a significant break with current GPUs: it drops the VLIW model, which relies on the simultaneous execution of multiple independent instructions, in favour of scalar operation from the programmer's point of view. The front-end, the command processors and the cache structure have been entirely redesigned to deliver a higher-performance, more flexible compute mode and to handle multitasking more efficiently, something that is going to become more and more important for GPUs.
Asynchronous Compute Engines (ACEs) have been introduced to handle compute tasks without going through the graphics command processor, which is not left lagging either: the units that manage geometry and pixels are parallelised, which benefits tessellation. In contrast to NVIDIA's approach, geometry isn't distributed across blocks of compute units but is decoupled from them.
The SIMDs of the current architecture have been replaced by Compute Units (CUs). Each CU now has a scalar unit and 4 small independent SIMDs, similar to those in NVIDIA's GPUs. In practice, a Cayman SIMD executes one vec4 instruction on 16 elements per cycle, while each CU can execute 4 instructions on 16 elements from four different groups, plus a scalar instruction. The compute power of a CU is therefore similar to that of a current SIMD, but it should be a good deal more efficient.
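The throughput parity described above can be checked with a little arithmetic. The sketch below uses only the figures quoted in the text (a vec4 instruction on 16 elements versus 4 independent 16-wide SIMDs); it is our own back-of-the-envelope comparison, not an AMD figure:

```python
# Per-cycle ALU throughput: a Cayman SIMD vs a new-style Compute Unit,
# using the widths quoted in the article.

# Cayman (VLIW): one vec4 instruction on 16 elements per cycle
cayman_ops_per_cycle = 4 * 16   # 4 VLIW slots x 16 elements = 64

# New architecture: 4 independent 16-wide SIMDs per CU, each issuing
# one scalar instruction for a different group of work-items
cu_ops_per_cycle = 4 * 16       # 4 SIMDs x 16 lanes = 64

print(cayman_ops_per_cycle, cu_ops_per_cycle)  # → 64 64
```

The raw numbers match; the efficiency gain comes from the fact that the four CU instructions belong to independent groups, so the hardware no longer depends on the compiler finding four independent operations to pack into one bundle.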
AMD would not say when this architecture would be introduced and simply said that the Trinity GPU would not be based on it but on the vec4 architecture used in the Radeon HD 6900s. There is however unofficial talk of implementation sometime this year and maybe of a demo of the GPU inaugurating it coming during the closing keynote of the Fusion Summit!
We’ll therefore likely only have to wait a few months to see what the new architecture, which looks so promising on paper, brings in practice. While it will eventually simplify optimisation of the GPU compiler, it will require a good deal of effort from the teams in charge of drivers, given how big a change it represents compared with current GPUs. As for the cost of this new architecture, AMD told us that it’s only slightly higher than that of current architectures, with some parts more complex but others simpler. It shouldn’t therefore put a brake on increasing the number of processing units.
Note that this architecture will be more modular than before: as well as the number of CUs, AMD will be able to vary the number of ACEs, the number of pipelines devoted to geometry or pixels, the double precision rate (from 1/2 to 1/16 of single precision)… The first implementation looks likely to be a high-end GPU with at least 30 CUs, several ACEs and double precision computing at half speed.
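To give an idea of what those configurable double precision rates mean, here is a rough peak-throughput sketch for a hypothetical first implementation. The 30 CUs and the 1/2-to-1/16 rates come from the article; the 64 lanes per CU follow from 4 × 16-wide SIMDs; the 850 MHz clock and FMA-counted-as-2-flops convention are our own assumptions:

```python
# Hypothetical peak-rate sketch; clock speed is ASSUMED, not announced.
cus = 30              # minimum CU count suggested for the first high-end GPU
lanes = 64            # 4 SIMDs x 16 lanes per CU
flops_per_lane = 2    # counting a fused multiply-add as two flops
clock_ghz = 0.85      # assumption for illustration only

sp_gflops = cus * lanes * flops_per_lane * clock_ghz  # single precision peak
for rate in (2, 4, 8, 16):  # DP at 1/2 down to 1/16 of SP
    print(f"DP at 1/{rate}: {sp_gflops / rate:.0f} GFLOPS")
```

The spread between the half-rate and 1/16-rate configurations is an 8x difference in double precision throughput for the same chip layout, which is what makes the knob interesting for segmenting compute-oriented and graphics-oriented products.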
Here, then, are the main lines of this forthcoming architecture, which we’ll try to come back to in more detail after the Fusion Summit:
Also, here are two examples of code generated for the current architecture and for the new one:
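The examples shown in the session aren't reproduced here, but the flavour of the change can be sketched with hypothetical pseudo-assembly (this is our illustration, not AMD's actual ISA or the code from the slides):

```
; VLIW (current): the compiler must pack up to 4 independent operations
; into each bundle; slots it cannot fill are simply wasted.
{ x: MUL r0.x, a.x, b.x
  y: MUL r0.y, a.y, b.y
  z: MUL r0.z, a.z, b.z
  w: NOP }              ; no fourth independent op found -> idle slot

; Scalar (new): one operation per instruction, each issued across a
; 16-wide SIMD; utilisation no longer depends on the compiler
; finding instruction-level parallelism to pack.
MUL r0, a0, b0
MUL r1, a1, b1
MUL r2, a2, b2
```

This packing problem is precisely why the article says the new model should make the compiler's optimisation job easier.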