Product review: The Nvidia GeForce GTX 280 & 260
by Damien Triolet
Published on July 7, 2008

SIMT vs. SIMD vs. MIMD
With the GeForce 8, Nvidia introduced an architecture that made a complete break from the past. Gone are the enormous MIMD vector units, from which it was sometimes difficult to extract maximum performance; Nvidia chose scalar units instead. At the implementation level these are 256-bit wide (8 x 32 bits) SIMD units (SIMD in the manner of SSE). At the functional level, however, an instruction does not apply eight 32-bit operations to one thread/element per cycle, but one 32-bit operation to each of 8 threads/elements. For this reason, in practice and seen from the outside, these units behave as scalar units.

To highlight the difference with SIMD (Single Instruction Multiple Data), Nvidia speaks of SIMT (Single Instruction Multiple Threads). While the units themselves are similar, SIMT naturally maximizes their utilization when the task is massively parallel, as is the case in 3D rendering. The advantage of SIMT is that the programmer doesn't have to do anything for this to happen, whereas with SIMD the programmer and compiler have to work to fill the vector unit, which isn't always easy. MIMD (Multiple Instructions Multiple Data), as used by AMD in the Radeon HD 2000/3000, suffers from the same problem even if it is more flexible.
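The utilization argument can be illustrated with a rough sketch. It counts how many lanes do useful work when instructions need fewer components than a vector unit is wide, and compares this with a SIMT unit in which each lane runs one scalar operation for its own thread. The function names, the 4-wide vector unit and the example instruction mix are assumptions made for illustration; they are not measurements and not a model of any specific GPU.

```python
def vector_utilization(component_counts, width):
    """Fraction of lanes doing useful work when every instruction
    occupies a whole `width`-wide vector unit for one cycle,
    however many components it actually needs."""
    used = sum(min(c, width) for c in component_counts)
    return used / (len(component_counts) * width)


def simt_utilization(component_counts, lanes, threads):
    """Fraction of lanes doing useful work when each lane runs one
    scalar operation for its own thread. An n-component instruction
    simply takes n cycles, so lanes only idle when there are too few
    threads to fill the last pass."""
    passes = -(-threads // lanes)  # ceiling division
    total_ops = sum(component_counts) * threads
    total_slots = sum(component_counts) * passes * lanes
    return total_ops / total_slots


if __name__ == "__main__":
    # Hypothetical shader: one vec4 op, one vec3 op, two scalar ops.
    mix = [4, 3, 1, 1]
    print("vec4 unit :", vector_utilization(mix, width=4))
    print("SIMT, 64 threads on 8 lanes:", simt_utilization(mix, lanes=8, threads=64))
    print("SIMT, 60 threads on 8 lanes:", simt_utilization(mix, lanes=8, threads=60))
```

With this hypothetical mix, the vector unit wastes lanes on every instruction narrower than its width, while the SIMT unit only loses lanes when the thread count is not a multiple of the lane count.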

Of course, SIMT isn't the ultimate solution; there are always compromises. It is more efficient, but it also requires more transistors, more die area and more power, because more complex control logic is needed. SIMD and MIMD, on the other hand, make it possible to place more math units in the GPU, though at the cost of efficiency.

The GeForce 8 was thus designed with only 128 scalar units, while the Radeon HD 3870 has 64 vec5 units, the equivalent of 320 scalar units. The higher efficiency of SIMT alone is of course not sufficient to compensate for this difference. However, Nvidia managed to implement double-pumped math units, in other words units that run at twice the speed of the scheduler. This largely compensates, though at the expense of branching efficiency (which we will address in its own section).
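The unit counts above reduce to simple arithmetic, sketched below. The unit counts and the pump factor of 2 come from the text; the function names are hypothetical. The final ratio assumes, purely for illustration, that both chips ran their base clock at the same frequency. That is not the case in practice, and the text gives no clock figures, so the ratio only shows the direction of the trade-off.

```python
def scalar_equivalent(vector_units, width):
    """Number of scalar lanes contained in a set of vector units."""
    return vector_units * width


def ops_per_scheduler_cycle(scalar_units, pump_factor):
    """Operations issued per scheduler cycle when the math units
    run `pump_factor` times faster than the scheduler."""
    return scalar_units * pump_factor


radeon_lanes = scalar_equivalent(vector_units=64, width=5)              # 320
geforce_ops = ops_per_scheduler_cycle(scalar_units=128, pump_factor=2)  # 256

# ASSUMPTION: equal base clocks on both chips (not true of real products).
# Under that assumption, this is the fraction of vec5 lanes that would have
# to be kept busy to match fully used double-pumped scalar units.
break_even_utilization = geforce_ops / radeon_lanes                     # 0.8

if __name__ == "__main__":
    print(radeon_lanes, geforce_ops, break_even_utilization)
```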
GeForce 8/9 and GTX 200 architecture
The GeForce 8 and 9 are built from a number of blocks of processing units, or partitions. Each partition has 2 multiprocessors and one block of texturing units. A multiprocessor is composed of a scheduler, a register file of 8192 32-bit registers, a SIMT unit made up of 8 FMAD-type scalar processors (floating-point multiplication and addition in a single cycle) and 2 units (SFUs) dedicated to special functions such as sin, cos, log, etc. (with 2 SFUs against 8 scalar processors, these functions are therefore 4x slower than simple instructions). These two SFUs can also function as an extra SIMT processor composed of 8 FMUL-type scalar processors.
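The multiprocessor figures above can be checked with a few lines of arithmetic. The constants are taken from the text; the helper names and the batch size of 32 threads are assumptions chosen only to make the 4x ratio between simple and special-function instructions visible.

```python
import math

REGISTERS = 8192     # 32-bit registers per multiprocessor (from the text)
REGISTER_BITS = 32
FMAD_UNITS = 8       # scalar processors per multiprocessor (from the text)
SFU_UNITS = 2        # special function units per multiprocessor (from the text)


def register_file_bytes():
    """Size of one multiprocessor's register file in bytes."""
    return REGISTERS * REGISTER_BITS // 8


def cycles_for_batch(threads, units):
    """Cycles needed to run one instruction for `threads` threads
    on `units` lanes, one thread per lane per cycle."""
    return math.ceil(threads / units)


if __name__ == "__main__":
    batch = 32  # hypothetical batch size, for illustration only
    print("register file:", register_file_bytes(), "bytes")
    print("FMAD cycles  :", cycles_for_batch(batch, FMAD_UNITS))
    print("SFU cycles   :", cycles_for_batch(batch, SFU_UNITS))
```

For any batch size that is a multiple of 8, the SFU path takes four times as many cycles as the FMAD path, which is where the 4x figure comes from.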

The two groups of units (FMAD and SFU/FMUL) can work in parallel because the GeForce 8/9 support dual issue. However, there seem to be some limitations, and it is not always easy to use the FMAD and FMUL units at the same time. It is difficult for the compiler and scheduler to use them simultaneously, because this requires independent instructions as well as simultaneous access to all the required registers. For this reason, most of the time it is the FMAD unit that handles FMUL instructions, and only special functions are executed in dual issue. We should add that this behavior has evolved with new drivers.
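To put a number on what dual issue is worth, the sketch below computes floating-point operations per cycle for one multiprocessor, counting an FMAD as 2 operations (multiply and add) and an FMUL as 1. The parameter `co_issue_fraction` is a hypothetical knob, not a measured value: it stands for the share of cycles in which the scheduler actually finds an independent FMUL to issue alongside the FMAD.

```python
def flops_per_cycle(fmad_units=8, fmul_units=8, co_issue_fraction=1.0):
    """Floating-point operations per cycle for one multiprocessor.

    fmad_units        -- lanes doing multiply-add (2 flops each)
    fmul_units        -- lanes available for an extra multiply (1 flop each)
    co_issue_fraction -- ASSUMED share of cycles (0.0 to 1.0) in which an
                         independent FMUL is issued next to the FMAD
    """
    if not 0.0 <= co_issue_fraction <= 1.0:
        raise ValueError("co_issue_fraction must be between 0 and 1")
    mad_flops = fmad_units * 2
    mul_flops = fmul_units * 1 * co_issue_fraction
    return mad_flops + mul_flops


if __name__ == "__main__":
    for fraction in (0.0, 0.25, 1.0):
        print(fraction, flops_per_cycle(co_issue_fraction=fraction))
```

With no co-issue the multiprocessor delivers 16 operations per cycle, and with perfect co-issue 24, so the limitations described above decide how much of that extra 50% is reachable in practice.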

As for the texturing units, the G80 originally had 4 address units and 8 filtering units per partition, and was therefore capable of processing 4 texels per clock with bilinear, trilinear or 2x anisotropic filtering. After the G80, however, Nvidia slightly updated its chips by adding 4 more address units. As a result, the partitions of subsequent GeForce 8 and 9 chips can process 8 texels with bilinear filtering or 4 texels with trilinear or 2x anisotropic filtering.
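The texel rates quoted above are consistent with a simple bottleneck model, sketched here: each texel needs one address unit, and one filtering unit for bilinear or two for trilinear and 2x anisotropic filtering. This is an assumed simplification that happens to reproduce the figures in the text; the real texturing pipeline is more involved, and the names below are hypothetical.

```python
# ASSUMED cost in filtering units per texel.
BILINEAR = 1
TRILINEAR = 2        # blends two bilinear samples
ANISOTROPIC_2X = 2   # also two bilinear samples


def texels_per_clock(address_units, filter_units, filter_cost):
    """Texels one partition can process per clock, limited by
    whichever of addressing or filtering runs out first."""
    return min(address_units, filter_units // filter_cost)


if __name__ == "__main__":
    for name, address in (("G80", 4), ("later GeForce 8/9", 8)):
        print(name,
              texels_per_clock(address, 8, BILINEAR),
              texels_per_clock(address, 8, TRILINEAR))
```

In this model the G80 is limited by addressing in bilinear mode, which is why adding 4 address units doubles the bilinear rate while leaving the trilinear and 2x anisotropic rates unchanged.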

Copyright © 1997- Hardware.fr SARL. All rights reserved.