Nvidia CUDA: practical uses
by Damien Triolet
Published on August 17, 2007

CUDA evolves
It’s now clear that CUDA wasn’t simply a marketing ploy from Nvidia, or a way of testing whether the market was ready for such a thing. Rather, it’s a long-term strategy based on the conviction that an accelerator market is starting to form and will grow quickly in the coming years.

The CUDA team is therefore working hard to evolve the language, improve the compiler, make usage more flexible, and so on. Since the 0.8 beta released in February, versions 0.9 beta and then 1.0 have made CUDA a viable way of using GPUs as coprocessors. More flexibility and robustness were needed, even though version 0.8 was already very promising. These regular updates have also built up the confidence from which CUDA is starting to benefit.

Two main evolutions stand out. The first is the asynchronous operation of CUDA. As we explained in our previous article, version 0.8 suffered from a major limitation: once the CPU had sent work to the GPU, it was blocked until the GPU returned the results. The CPU and GPU therefore couldn’t work at the same time, which was a big brake on performance. Another problem arose when a calculation system was equipped with several GPUs: one CPU core per GPU was needed, which isn’t very efficient in practice.

Nvidia was of course aware of this, and the synchronous operation of the first CUDA versions was probably chosen to allow a rapid release of a functional version without dwelling on the more delicate details. With CUDA 0.9 and then 1.0, this problem disappeared: the CPU is free as soon as it has sent the program to be executed to the GPU (except when texture access is used). When several GPUs are used, it is however still necessary to create one CPU thread per GPU, because CUDA does not allow two GPUs to be driven from the same thread. This is not a big problem. Note that a function can force synchronous operation if necessary, as sketched below.
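
As a rough illustration, here is a minimal sketch of this asynchronous behaviour using the CUDA runtime API; heavyKernel is a hypothetical placeholder for a real workload, and cudaThreadSynchronize() is the runtime call that blocks the CPU until the GPU has finished, i.e. the forced synchronous operation mentioned above.

    #include <cuda_runtime.h>

    // Hypothetical kernel standing in for a real workload.
    __global__ void heavyKernel(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] = data[i] * 2.0f + 1.0f;
    }

    int main(void)
    {
        const int n = 1 << 20;
        float *d_data;
        cudaMalloc((void **)&d_data, n * sizeof(float));

        // The launch returns immediately: the CPU is free
        // while the GPU works.
        heavyKernel<<<(n + 255) / 256, 256>>>(d_data, n);

        // ... the CPU can do other useful work here,
        // in parallel with the GPU ...

        // Forcing synchronous operation: this call blocks the
        // CPU until the GPU has finished the submitted work.
        cudaThreadSynchronize();

        cudaFree(d_data);
        return 0;
    }

In a multi-GPU setup, each CPU thread would first select its own card with cudaSetDevice() before launching work, in line with the one-thread-per-GPU rule described above.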

The second main innovation at the functional level is the appearance of atomic functions: reading data in memory, using it in an operation and writing the result back, without any other access to that memory location until the operation has fully completed. This avoids (or at least reduces) certain common problems, such as a thread reading a value without knowing whether another thread has already modified it.
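
To make this concrete, here is a minimal sketch (our own example, not taken from Nvidia) of a histogram kernel, a classic case where many threads may try to increment the same bin at the same time:

    #include <cuda_runtime.h>

    // Each thread classifies one input value and increments the
    // matching bin. atomicAdd() reads the bin, adds 1 and writes the
    // result back as a single indivisible operation, so no other
    // thread can slip in between the read and the write.
    __global__ void histogram(const unsigned int *input, int n,
                              unsigned int *bins, int numBins)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            atomicAdd(&bins[input[i] % numBins], 1u);
    }

Without the atomic operation, two threads reading the same bin simultaneously would each add 1 to the old value, and one of the two increments would be lost.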


Finally, with CUDA 1.0, Nvidia is distributing documentation for PTX (Parallel Thread eXecution), an intermediate assembly language that sits between the high-level code and what is actually sent to the GPU. PTX was already in use and developers could access it, but it hadn’t been documented yet, probably because the behavior of the different compilation stages was not yet clearly defined. PTX can be used to optimize certain algorithms or libraries, or quite simply to debug code.
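
As an illustration, a trivial kernel can be compiled down to PTX with the -ptx option of the nvcc compiler (file names here are hypothetical), and the resulting intermediate code inspected:

    // scale.cu: a trivial kernel whose intermediate form
    // we want to inspect.
    __global__ void scale(float *v, float s, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            v[i] *= s;
    }

    // Compiling with:
    //   nvcc -ptx scale.cu -o scale.ptx
    // produces the PTX intermediate assembly (instructions such as
    // mul.f32 and st.global.f32), which can be read to check what
    // the compiler actually generated.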




