NVIDIA GeForce 7800 GTX
by Damien Triolet and Marc Prieur
Published on June 22, 2005

An improved pixel shader, in theory
NVIDIA hasn’t only added 8 pixel shading pipelines, it has also improved them. This isn’t a radical change, but a small improvement to increase efficiency. Like the GeForce 6, the GeForce 7800 has two calculation units per pipeline. Each unit is in charge of several operations, but can only provide one result at a time. A unit can process a single simple operation or a combination of two simple operations, but not two separate ones.

The main improvement, and one strongly emphasized by NVIDIA, is the addition of an adder to the first unit, which in the GeForce 6 series could not process additions and related instructions, MADs in particular. A MAD is a multiplication followed by an addition, for example X * Y + Z. These two operations are common in most 3D rendering algorithms (in all of them, in fact). An addition often depends on the result of a multiplication, so the pair can be processed as a single MAD operation rather than as 2 instructions.
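To make the MAD idea concrete, here is a minimal Python sketch that treats a shader value as a 4-component tuple. The function names (`mul4`, `add4`, `mad4`) are ours and purely illustrative; they only show that the fused form gives the same result while occupying one instruction slot instead of two.

```python
# Illustrative only: 4-component values stand in for shader registers.

def mul4(x, y):
    """Component-wise multiplication (one instruction)."""
    return tuple(a * b for a, b in zip(x, y))

def add4(x, y):
    """Component-wise addition (one instruction)."""
    return tuple(a + b for a, b in zip(x, y))

def mad4(x, y, z):
    """Multiply-add, x * y + z, expressed as a single operation."""
    return tuple(a * b + c for a, b, c in zip(x, y, z))

def two_instruction_form(x, y, z):
    tmp = mul4(x, y)      # instruction 1
    return add4(tmp, z)   # instruction 2, depends on the result of instruction 1

x = (1.0, 2.0, 3.0, 4.0)
y = (2.0, 2.0, 2.0, 2.0)
z = (0.5, 0.5, 0.5, 0.5)
print(mad4(x, y, z))
```

Because the addition needs the product as its input, the two-instruction form cannot run both steps in parallel; folding them into one MAD removes that dependency from the instruction stream.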

MAD operations are common, so the ability to process them faster is of great interest. On the GeForce 6800, only the second unit features a multiplier / adder and can process MADs. On the GeForce 7800 both units have this ability and, according to NVIDIA, this doubles the arithmetic power of the new GPU compared to the previous one. Things aren’t that simple, however, because in practice a pixel shader doesn’t use only MADs.

In reality, the maximum number of instructions processed per clock cycle in a GeForce 7800 pipeline is no higher than in a GeForce 6800 pipeline, but in practice the GeForce 7800 should now be able to get closer to that theoretical maximum.

The interpolator is a unit which interpolates color and texture addresses from the values at the three vertices of each triangle and can feed the pixel shading pipeline with one value per cycle. "SF" blocks are units which handle complex scalar operations such as 1/x, 2^x, etc.
A Mini-ALU applies modifiers (simple operations such as x2, x4, x8...) to the result of each unit. The NRM unit (vector normalisation, very useful for normal mapping) has been added by NVIDIA to accelerate this instruction, but only in FP16. Without it, an NRM instruction is split into several instructions and monopolizes several units and cycles.
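To show why a normalisation without a dedicated unit costs several instructions, here is a small Python sketch of the usual decomposition: a dot product, a reciprocal square root, then a multiplication. The helper names are ours, and the exact instruction sequence the driver emits is an assumption on our part; FP16 precision is not modelled.

```python
import math

def dp3(a, b):
    """3-component dot product."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def rsq(x):
    """Reciprocal square root, a typical special-function operation."""
    return 1.0 / math.sqrt(x)

def scale3(v, s):
    """Multiply each component of v by the scalar s."""
    return (v[0] * s, v[1] * s, v[2] * s)

def nrm(v):
    """Normalise v in three dependent steps (assumed decomposition)."""
    length_sq = dp3(v, v)        # step 1: squared length
    inv_len = rsq(length_sq)     # step 2: 1 / length
    return scale3(v, inv_len)    # step 3: rescale the vector

n = nrm((3.0, 0.0, 4.0))
print(n)
```

Each step needs the result of the previous one, which is why the instruction occupies several units and cycles when it is not handled by dedicated hardware.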
An improved pixel shader, in practice
In our first tests, what we explained above turned out to be both true and false. False, because we did not observe this additional adder in action with simple additions or with MADs. We tried a multitude of test variations, without success. We asked NVIDIA about this and they sent us an example, from which we wrote a piece of code showing that the two MADs were indeed functional, but only in very specific cases (depending on the number and type of registers used, the dependence between instructions, etc.). Basically, it means that the ability to process two MADs per cycle won’t be very useful for games.

However, during our numerous tests, we saw the efficiency of the pixel shading pipelines increase, sometimes quite significantly! So what happened? According to our first tests, efficiency increased, but this comes only partially and indirectly from the improvements described above. Indirectly, because the presence of a MAD in each unit helps the compiler / scheduler arrange the code in a better way. For example, if a MAD is followed by a special instruction handled by the second unit, the 6800 requires 2 cycles to process them, whereas the 7800 requires only one if all the conditions are met. Partially, because the modification is, in fact, a global improvement in pipeline efficiency, the additional adder being just one of many factors that help to increase efficiency.
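The MAD-then-special-instruction example can be sketched with a toy cycle counter. This is a deliberately simplified model of our own, not NVIDIA's scheduler: each cycle, unit 1 may take the next instruction and unit 2 the one after it, in program order, and register or dependency constraints are ignored. The capability sets below are assumptions based on the description above (no adder or MAD in the first unit of the 6800, special functions in the second unit).

```python
def count_cycles(program, unit1_ops, unit2_ops):
    """Count cycles for an in-order program on a two-unit pipeline (toy model)."""
    supported = unit1_ops | unit2_ops
    for op in program:
        if op not in supported:
            raise ValueError(f"no unit can execute {op}")
    cycles = 0
    i = 0
    while i < len(program):
        cycles += 1
        # Unit 1 takes the next instruction if it supports it.
        if program[i] in unit1_ops:
            i += 1
        # Unit 2 then takes the following instruction if it supports it.
        if i < len(program) and program[i] in unit2_ops:
            i += 1
    return cycles

# Assumed capabilities, for illustration only.
GF6800_UNIT1 = {"MUL"}
GF6800_UNIT2 = {"MUL", "ADD", "MAD", "SF"}
GF7800_UNIT1 = {"MUL", "ADD", "MAD"}
GF7800_UNIT2 = {"MUL", "ADD", "MAD", "SF"}

program = ["MAD", "SF"]
print(count_cycles(program, GF6800_UNIT1, GF6800_UNIT2))
print(count_cycles(program, GF7800_UNIT1, GF7800_UNIT2))
```

In this model the 6800 has to send both instructions through its second unit, one after the other, while the 7800 can place the MAD in the first unit and the special instruction in the second within the same cycle.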

With the GeForce 6800, NVIDIA introduced flexible co-issue management in the pixel shading pipeline (and also in vertex shading). Co-issuing is the ability of a single unit to process 2 instructions at once. But how is that possible when a unit is supposed to process only one? A complete instruction actually operates on four components, which can represent the four channels of the RGBA format (red, green, blue, alpha) or anything else. With co-issuing it is possible to process two instructions, provided the total number of components doesn’t exceed 4. It is then possible to process two operations on two components each, or one operation on three components and another on one. ATI has also supported co-issuing since the Radeon 8500, but only in 3+1 mode. Of course, two co-issued instructions can’t be dependent on one another, and there are some restrictions on which operations can benefit from it.
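The co-issue rules described above can be written down as a small check. This is a sketch under our own assumptions: the `Instr` structure and function names are hypothetical, dependencies are tested at whole-register granularity (more conservative than real hardware), and we assume a narrower instruction may occupy a wider slot (for example 2+1 inside a 3+1 split). The other restrictions mentioned in the text are not modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    op: str       # operation name, e.g. "MUL"
    dest: str     # destination register name, e.g. "r0"
    mask: str     # written components, a subset of "xyzw"
    srcs: tuple   # source register names

def independent(a, b):
    """Neither instruction reads the other's destination, and writes do not overlap."""
    if a.dest in b.srcs or b.dest in a.srcs:
        return False
    if a.dest == b.dest and set(a.mask) & set(b.mask):
        return False
    return True

def fits(a, b, splits):
    """True if the two component counts fit one of the allowed splits."""
    wa, wb = len(a.mask), len(b.mask)
    return any((wa <= s1 and wb <= s2) or (wa <= s2 and wb <= s1)
               for s1, s2 in splits)

def can_co_issue(a, b, splits):
    return independent(a, b) and fits(a, b, splits)

SPLITS_2_2_AND_3_1 = [(2, 2), (3, 1)]   # GeForce 6 / 7 style, per the text
SPLITS_3_1_ONLY = [(3, 1)]              # ATI style, per the text

a = Instr("MUL", "r0", "xy", ("r1", "r2"))
b = Instr("ADD", "r3", "zw", ("r4", "r5"))
c = Instr("MUL", "r0", "xyz", ("r1", "r2"))
d = Instr("RCP", "r0", "w", ("r3",))
e = Instr("ADD", "r6", "w", ("r0",))    # reads r0, so it depends on c

print(can_co_issue(a, b, SPLITS_2_2_AND_3_1))
print(can_co_issue(a, b, SPLITS_3_1_ONLY))
```

The pair `a` and `b` is a 2+2 case, which only the more flexible scheme accepts, while `c` and `d` form the classic 3+1 case (a colour operation alongside a scalar one) that both schemes allow.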

In our GeForce 6800 test, we noticed from the start that co-issuing efficiency was very low. It rarely worked as expected and came close to being unusable. We concluded that the drivers, and the compiler in particular, were still young. One year later, GeForce 6 co-issuing has improved a lot, but it still has difficulty reaching its maximum efficiency.

With the GeForce 7800 GTX, however, the situation is very different: co-issuing efficiency is better, even exceeding our expectations in certain cases. At least this is the logical conclusion we draw from our tests. Efficiency has also improved in other minor areas, which increases overall IPC in practice. This type of gain is difficult to measure, as it varies from 0 % to 100 % depending on the instruction sequences contained in the shader. The trio of compiler (driver), scheduler (driver / GPU) and pipeline organisation (GPU) has to be perfectly in tune to deliver good efficiency gains. This seems to be more the case for the GeForce 7800 than for the GeForce 6800.

We extracted complex shaders from three applications: 3DMark05, Far Cry and Tomb Raider AOD. We ran them over the entire screen in an external application.

The Radeon X850 XT PE has a particular affinity for the Tomb Raider shader and is even ahead of the 7800 GTX, which improves upon the 6800 Ultra thanks to its additional pipelines. With the two other shaders, however, the gains exceed the increase in the number of pipelines, proving their better efficiency: +6 % with the 3DMark shader and +20 % with Far Cry’s shader.

We then tested two lighting shaders:

Gains obtained with the 7800 GTX exceed the increase in the number of pipelines. In FP32, efficiency went up by 22% with Blinn type lighting and 37% with the Phong type. Not bad!
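For readers who want to separate an efficiency gain from a raw speedup, the arithmetic can be sketched as follows. This is our own illustrative formula, not a description of the exact measurement procedure: it assumes 16 pixel pipelines on the 6800 Ultra and 24 on the 7800 GTX (consistent with the 8 additional pipelines), treats clock normalisation as optional, and the speedup fed into it is a back-calculated, hypothetical value rather than a measured one.

```python
def per_pipeline_gain(speedup, pipes_old, pipes_new, clock_old=1.0, clock_new=1.0):
    """Gain left over once the extra pipelines (and optionally clock) are factored out.

    speedup is the raw throughput ratio new / old. The result is a fraction,
    e.g. 0.22 for +22 %.
    """
    resource_ratio = (pipes_new / pipes_old) * (clock_new / clock_old)
    return speedup / resource_ratio - 1.0

# Hypothetical input: a raw 1.83x speedup with 16 -> 24 pipelines, clocks ignored.
gain = per_pipeline_gain(1.83, pipes_old=16, pipes_new=24)
print(f"{gain:+.0%}")
```

A raw speedup of exactly 1.5x would give a gain of zero in this model, meaning the improvement comes entirely from the extra pipelines; anything above that points to better use of each pipeline.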

Copyright © 1997- Hardware.fr SARL. All rights reserved.