Graphics processors, or GPUs, have evolved a great deal in the past few years. Today they are capable of calculating things other than pixels in video games; however, it's important to know how to use them efficiently for these other tasks. While AMD was the first to present a concrete solution to this problem, Nvidia is the first to make one available. We'll take a look at it in this article.
(We accept no liability for any severe headache brought on by reading this article.)
CPU and GPU: the differences
Over the last couple of years, GPU computing power has grown exponentially, and much faster than that of CPUs. However, this doesn't mean that GPUs have simply evolved faster. The two components face different challenges, and for this reason they have evolved in different directions.

To simplify, a CPU is expected to process a single task as fast as possible, whereas a GPU must be capable of processing a maximum number of tasks or, to be more accurate, of applying one task to a maximum amount of data in a minimum amount of time. Of course, a GPU also has to be fast and a CPU must be able to handle several tasks, but to date the development of their respective architectures has reflected these priorities. For GPUs this has meant multiplying processing units; for CPUs, it has meant making control units more complex and increasing on-chip cache memory.

An enormous part of the GPU is dedicated to execution, unlike the CPU

The CPU is capable of quickly processing all sorts of tasks, whereas the GPU is capable of processing one certain type of task very quickly. For the GPU, the work has to take the form of a problem composed of independent elements, because GPUs are massively parallel. This is, in fact, similar to the problem faced by CPUs, whose computing power partly relies on vector units (SSE, etc.). But while a Core 2 Duo can be seen as made up of 8 units, the GeForce 8800 has 128! This is an entirely different scale, and exploiting this computing power requires a different perspective.
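To make "a problem composed of independent elements" a little more concrete, here is a minimal sketch in plain C (not CUDA code). The function names `saxpy` and `running_sum` are ours, chosen for illustration only. In the first loop, each output element depends only on the inputs at the same index, so the iterations could in principle be spread across many execution units. In the second, each iteration needs the result of the previous one, so the loop as written cannot be split up in the same way.

```c
#include <stddef.h>

/* Independent elements: out[i] depends only on x[i] and y[i].
 * The iterations can run in any order, or all at once, which is
 * the kind of work that suits a massively parallel processor. */
void saxpy(size_t n, float a, const float *x, const float *y, float *out)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a * x[i] + y[i];
}

/* Dependent elements: out[i] needs the sum accumulated so far,
 * so this naive loop has to run its iterations one after another. */
void running_sum(size_t n, const float *x, float *out)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += x[i];
        out[i] = acc;
    }
}
```

With x = {1, 2, 3, 4}, y = {10, 20, 30, 40} and a = 2, `saxpy` produces {12, 24, 36, 48}, and each of those four values could be computed by a different unit without any coordination.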
BrookGPU: the early beginnings
The idea of using a GPU as an additional calculation unit isn't new; it started with the GeForce FX, the first GPUs to support single precision floating point calculations (FP32). A little more than three years ago, when the first official publications on BrookGPU (a programming language intended to ease access to GPU computing power) appeared, we wrote (translated from our French website):
"...NVIDIA and ATI's engineers still have some work to do before it's really usable. Using a current GPU for general calculation is a bit like trying to use the power provided by a potato to light a lamp. Nevertheless, as the evolution of GPUs is quite spectacular, it's not too soon to start working on this technology. Perhaps for something actually usable for the release of the NV50 and R500? We can even imagine that ATI and NVIDIA will start selling their chips on different markets…"
These predictions were quite close to reality. ATI/AMD introduced CTM, a low-level API that uses the calculation core of the Radeon X1000 (R5xx), along with a line of products devoted to this type of use, and NVIDIA has introduced CUDA, a programming language close to C that exploits the core of the GeForce 8800 (the G80, the new code name of the NV50).
Is using a GPU as a calculation unit still utopian? The answer is 'no', even if we are still at an early stage for this type of use. There have been numerous improvements on both the hardware and the software side. In the past, we wrote about CTM several times without being able to give many details, as it wasn't public. Nvidia, however, announced a beta version of CUDA a little while ago, and this gives us the opportunity to take a closer look at this language.