The Radeon X1900 XTX, X1900 XT and X1900 CrossFire in tests
by Damien Triolet
Published on January 24, 2006

Branching
One of the main innovations introduced with the GeForce 6800 was dynamic branching in shaders. It makes some shaders easier to write and makes others more efficient by skipping calculations on pixels that don’t need them. For example, why apply a very costly filter to soften the border of a shadow if the pixel is in the middle of the shadow? Dynamic branching lets the shader determine whether the pixel needs the filter or not. Splinter Cell Chaos Theory uses this technique, whereas The Chronicles of Riddick calculates everything for every pixel. Performance drops by 10 to 15% in the former and by more than 50% in the latter. Of course, the algorithms aren’t identical, but it does give us an idea of what dynamic branching is capable of.


This only applies to very specific cases, however. Branching has a reputation for being difficult to manage. This is particularly true of CPUs, which have to predict the branch result to hide calculation latency. In a GPU, pixels are processed in groups of hundreds or even thousands. When a branch is reached, either all pixels in the group take the same branch, or both branches have to be calculated for all pixels, with masks used so that only the result of the required branch is written.
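As a rough illustration of the masking behaviour described above, here is a minimal Python sketch of the cost of one group of pixels processed in lockstep. The function name `batch_cost` and the cost values are assumptions chosen for the example; they are not measurements and do not describe any particular GPU.

```python
def batch_cost(takes_branch, cheap_cost=1, expensive_cost=10):
    """Per-pixel cost (arbitrary units) for one group of pixels run in lockstep.

    takes_branch: list of booleans, True where a pixel needs the expensive path.
    cheap_cost / expensive_cost: hypothetical costs of the two branches.
    """
    if all(takes_branch):
        return expensive_cost      # coherent group: only the expensive branch runs
    if not any(takes_branch):
        return cheap_cost          # coherent group: the expensive branch is skipped
    # Divergent group: both branches run for every pixel, and a per-pixel
    # mask selects which result is actually written.
    return cheap_cost + expensive_cost


print(batch_cost([False] * 16))          # every pixel skips the expensive part
print(batch_cost([True] * 16))           # every pixel needs the expensive part
print(batch_cost([True] + [False] * 15)) # a single divergent pixel costs the most
```

In this simplified model, one pixel that disagrees with its neighbours is enough to make the whole group pay for both branches, which is why the size of the group matters so much.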

On paper, ATI has a clear advantage with its processing unit devoted to branching and its very small threads. Let’s see if this holds in practice with a small test we developed that lets us change the branching granularity (the number of consecutive pixels that take the same branch). We specify the branch to take per pixel column: every other column has to run a complex shader, and the remaining columns can skip this part of the rendering. Average-sized triangles in motion are displayed on screen and cross the areas that use different branches. The triangle size, the triangle positions and the column width all influence branching efficiency, which is close to real situations.


With narrow columns, GPUs can’t use branching to skip the complex part for half of the pixels, yet they still have to process the branching instructions. This reduces performance instead of increasing it, at least for NVIDIA. ATI has a dedicated branching unit that works in parallel with the pixel shading and texturing pipelines, which hides the cost of the branching instructions.

ATI’s small threads of 16 pixels (4x4) allow performance improvements as soon as the column width reaches 4 pixels, whereas you have to wait until 64 pixels for NVIDIA! The Radeon X1900 only starts to benefit from branching at a column width of 8 pixels (granularity of 8x? pixels). This is because the Radeon X1900, with its higher number of pixel shading pipelines, has to work on threads of 48 pixels instead of 16, which makes branching less efficient. These threads have a "corner" shape made up of 3 squares of 16 pixels. With the beta drivers used for this test, a bug gives incoherent results for the 16-pixel column, even though this problem didn’t exist with the drivers used for previous tests. Consequently, we have removed these results.
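To make the link between thread size and column width more concrete, here is a simplified one-dimensional Python sketch. It is not the test program used for this article. It ignores triangles and thread height, and assumes that threads are aligned on multiples of their width. The function `skippable_fraction` and its parameters are hypothetical names introduced for the illustration.

```python
def skippable_fraction(tile_width, column_width, screen_width=1024):
    """Fraction of thread tiles that lie entirely inside a 'skip' column.

    Columns alternate every `column_width` pixels: even-numbered columns run
    the complex shader, odd-numbered columns can skip it. Tiles are assumed
    to start at multiples of `tile_width` (an assumption of this sketch).
    """
    starts = range(0, screen_width - tile_width + 1, tile_width)
    skipped = 0
    for x0 in starts:
        # Parity of the column each pixel of the tile falls in.
        parities = {(x // column_width) % 2 for x in range(x0, x0 + tile_width)}
        if parities == {1}:   # every pixel of the tile may skip the complex part
            skipped += 1
    return skipped / len(starts)


for width in (2, 4, 8):
    print(width, skippable_fraction(4, width), skippable_fraction(8, width))
```

In this model, the best possible result is 0.5, since half of the pixels can skip the complex part. A tile 4 pixels wide reaches it once the columns are 4 pixels wide, while a tile 8 pixels wide gains nothing until the columns reach 8 pixels.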

ATI clearly shows greater gains than NVIDIA’s architecture, whether with the Radeon X1800 or the Radeon X1900. Note that these are gains and not raw performance. Even though it benefits less from branching, the Radeon X1900 is almost three times faster than the X1800 for this pixel shader. The 7800 GTX, which has significant raw calculation power, is in front of the X1800 and is sometimes twice as fast.

Overall, ATI has better branching efficiency than NVIDIA, which should allow branching to be used in more situations, something developers should appreciate. While this lets the X1800 close the calculation performance gap with the 7800, the X1900 takes a clear lead for this type of shader.

Next, we ran a second test related to dynamic branching. This time we rendered a Mandelbrot fractal (remember the test that was used at the time of the GeForce FX), first normally and then with branching. This algorithm uses a high number of identical iterations, which are simply placed one after another in the standard shader. In the branching-based shader, we put a loop around 2 iterations, with a test that checks whether additional iterations are useful. If they aren’t, we exit the loop.
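The two variants can be sketched in Python as follows. This is only an illustration of the idea, not the shader used in the test. The function names, the limit of 64 iterations and the escape threshold of 4.0 are assumptions made for the example.

```python
def step(x, y, cx, cy):
    """One Mandelbrot iteration, written without a branch.

    Returns the new point and 1 if the point was still inside before the step.
    """
    alive = x * x + y * y <= 4.0
    nx, ny = x * x - y * y + cx, 2.0 * x * y + cy
    # Select rather than branch: an escaped point keeps its old value.
    return (nx, ny, 1) if alive else (x, y, 0)


def mandelbrot_fixed(cx, cy, max_iter=64):
    """Standard version: every iteration is always executed."""
    x = y = 0.0
    count = 0
    for _ in range(max_iter):
        x, y, inside = step(x, y, cx, cy)
        count += inside
    return count, max_iter            # (iteration count, steps executed)


def mandelbrot_branching(cx, cy, max_iter=64):
    """Branching version: a loop around 2 iterations with an early exit."""
    x = y = 0.0
    count = steps = 0
    for _ in range(max_iter // 2):    # max_iter is assumed to be even
        for _ in range(2):
            x, y, inside = step(x, y, cx, cy)
            count += inside
        steps += 2
        if x * x + y * y > 4.0:       # further iterations would be useless
            break
    return count, steps


print(mandelbrot_fixed(2.0, 2.0), mandelbrot_branching(2.0, 2.0))
print(mandelbrot_fixed(0.0, 0.0), mandelbrot_branching(0.0, 0.0))
```

Both versions return the same iteration count. The difference is in the number of steps executed: a point that escapes quickly costs 2 steps instead of 64 with the loop, while a point that never escapes costs the same in both, plus the overhead of the exit test.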


Because the shader is complex, the X1800 doesn’t deliver excellent results with the basic version. The Radeon X1900 corrects this and finishes in front of the 7800, which isn’t really surprising. The results obtained with the branching shader are interesting because they confirm that ATI can really benefit from it. This isn’t the case for NVIDIA, however. In this test, because the GeForce 7800 has no dedicated unit for branching, it loses an enormous number of cycles processing the branch instructions. In addition, it benefits from branching only at coarse granularity, and the net result is a drop in performance. This test shows that a shader optimised for the Radeon can reduce performance on the GeForce, which means that developers’ choices could play an important role in how the Radeon performs relative to the GeForce. Will they use branching? Standard methods? Both? Today, it’s hard to answer this question.


Vertex Shader
We measured T&L, VS 1.1, VS 2.0 and VS 2.X/3.0 performance in RightMark:


Since our previous test, the X1800’s vertex shading performance has increased. Because nothing was changed in this area, the X1900’s results are similar. A bug prevented one of the tests from running on the Radeon. With static branching, NVIDIA suffers, but this isn’t new and is probably the result of a driver bug that NVIDIA isn’t in a hurry to correct.

Unlike NVIDIA, ATI has left out Vertex Texturing support, which we thought was required in order to claim Vertex Shader 3.0 support. ATI claims it isn’t required, which is strange. Looking more closely, we noticed that ATI reports vertex texturing support to DirectX but doesn’t allow any texture format to be used with it. This is suspicious and looks like a skilful way of getting around the DirectX specifications and claiming Vertex Shader 3.0 support without Vertex Texturing capability. The situation isn’t very clear, but either way, Vertex Texturing isn’t very important in practice: apart from 2-3 technology demos, it isn’t used because it’s too restricted (at least in its current implementation).


Texture accesses
The Radeon X1900 is identical to the Radeon X1800 in this respect, so we haven’t run any additional tests. You can consult the previous results in the X1800 test.

The X1900’s texturing units have, however, gained a new capability called fetch4, which was already present in the Radeon X1300 and X1600. During texture filtering, the texturing unit fetches 4 texels and blends them by linear interpolation. Some rendering techniques, such as shadows, need a different filter (PCF, for example). In this case, you have to access the texture four times with point sampling and use a pixel shader to filter the 4 texels appropriately. NVIDIA accelerates PCF filtering in hardware and doesn’t need these extra steps. The Radeon isn’t capable of doing this.

Fetch4 partly resolves this problem. With a single-channel texture (such as those used in shadow rendering), fetch4 fills the 4 channels normally used for the three colors and transparency with the four texels, which are returned to the pixel shader as they are, meaning unfiltered. The shader still has to do the filtering, but only one texture access is required instead of four. 3DMark06 uses this capability on the Radeon X1300/X1600 and X1900, and developers will certainly use it in the future.
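The filtering work left to the shader can be sketched as follows in Python. This is a minimal illustration of PCF over a 2x2 block of shadow-map texels, not actual shader code. The `fetch4` function here is a stand-in for the hardware feature, and the order in which the four texels are returned is an assumption of the example.

```python
def fetch4(depth_map, x, y):
    """Stand-in for fetch4: the 2x2 texel block at (x, y) in a single access.

    The order (top-left, top-right, bottom-left, bottom-right) is hypothetical.
    """
    return (depth_map[y][x], depth_map[y][x + 1],
            depth_map[y + 1][x], depth_map[y + 1][x + 1])


def pcf(texels, ref_depth, fx, fy):
    """Percentage-closer filtering done by the shader.

    Each texel is compared with the reference depth first, and the four
    comparison results are then blended with bilinear weights (fx, fy).
    """
    lit = [1.0 if ref_depth <= t else 0.0 for t in texels]
    top = lit[0] * (1.0 - fx) + lit[1] * fx
    bottom = lit[2] * (1.0 - fx) + lit[3] * fx
    return top * (1.0 - fy) + bottom * fy


shadow_map = [[0.5, 0.9],
              [0.5, 0.9]]
texels = fetch4(shadow_map, 0, 0)      # one access instead of four
print(pcf(texels, ref_depth=0.7, fx=0.25, fy=0.5))
```

The important point is the order of operations: the depth comparison happens before the blend, which is why the standard bilinear filter of the texturing unit can’t be used directly on the shadow map.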

Copyright © 1997- Hardware.fr SARL. All rights reserved.