What about AMD?

If you have followed the progress of stream computing, you will know that AMD was the first to mention it, announcing low-level (machine language) access to its GPUs at the launch of the Radeon X1800 in October 2005. The details of this access, initially called DPVM for Data Parallel Virtual Machine and later renamed CTM for Close To Metal, were only given nearly a year later, in August 2006.
A few months afterwards, while AMD’s buyout of ATI was being finalized, ATI’s teams presented a few more practical applications, which served well to fuel forum discussions about the merger. These presentations were rounded off by the release of a version of the Folding@home client accelerated by the Radeon X1900 via Direct3D. However, CTM wasn’t available yet, and although ATI told us at the time that a CTM version of Folding@home would soon arrive, we have yet to see it.
In mid-November 2006, AMD launched its first product specific to this market, the Stream Processor, which is a Radeon X1900 stripped of its video outputs. Contrary to what we had been told (that CTM would cover all consumer graphics cards), the CTM driver was only delivered to users of Stream Processor cards (if there are any) who were in direct contact with AMD developers; it could not be found on the manufacturer’s website. In the end this dampened our enthusiasm and even annoyed us, because apart from the hype that systematically accompanied the release of consumer cards (or served to justify the buyout by AMD), we didn’t see anything very concrete.
Something new with R6xx GPUs?

With the release of the Radeon HD 2000, AMD came back to this subject and presented a series of developments. Low-level CTM access was complemented by AMD Runtime, which is in a way the equivalent of CUDA’s runtime and therefore provides higher-level access. The difference is that AMD Runtime could use multi-core CPUs as well as one or several GPUs. Next, ACML, AMD’s library of mathematical functions optimized for its CPUs, gained GPU equivalents. And finally, AMD offered extensions to the C and C++ languages to drive everything… as Nvidia has done with CUDA.
AMD thus seems to have followed Nvidia by moving to a higher-level mode of use. However, AMD isn’t shy about criticizing CUDA, which it describes as a useless solution: in other words, too complex for most developers, yet too far removed from the specifics of the GPU hardware to allow efficient libraries to be developed. This criticism of CUDA isn’t entirely without foundation, and we can suppose that it prompted Nvidia to document PTX.
With CUDA, Nvidia chose to offer something usable quickly and to provide further optimizations later. AMD, by contrast, started with a very complex low-level language before proposing more, or at least before producing marketing documents saying that the manufacturer would offer more, which we still haven’t seen! After more than two years of nothing new, we will wait for something more concrete before going any further, and it is for this reason that we have presented these innovations briefly and somewhat skeptically.
Next, the memory architecture of the Radeon HD 2000 is much more advanced than that of the GeForce 8. On the one hand, there is generalized cached access to video memory for both reads and writes (other GPUs offer no access of this type). On the other hand, there is an independent engine that manages PCI Express transfers in parallel with the rest of the GPU. On a GeForce 8, the GPU is blocked during these transfers.
Chips from the R600 generation thus seem well equipped for stream computing and could give AMD an advantage over Nvidia. However, as AMD itself points out so well, the hardware is only half of the story... and for the other half we haven’t seen much yet.