ATI Radeon X1800 XT & XL - BeHardware
Written by Damien Triolet and Marc Prieur
Published on October 5, 2005
URL: http://www.behardware.com/art/lire/592/
Page 1
Architecture in brief
A little more than three years ago, ATI released the Radeon 9700 Pro. The Canadian manufacturer surprised everyone and was the first to release a DirectX 9 chip. While NVIDIA was struggling to finalise the 130 nm GeForce FX, ATI used only a 150 nm fabrication process.
History seems to be repeating itself, but this time the roles are reversed. While NVIDIA played it safe and chose a 110 nm process for the GeForce 7800 GTX (released three and a half months ago), ATI decided to be the first to use a 90 nm process, for the R520.
In the end, the R520 required two more revisions than a usual product release, which resulted in a four-month delay. Now that this troubled period is behind it, ATI is ready to release the Radeon X1800.
New architecture, a quick look
The new architecture, which ATI builds from 321 million transistors, includes several optimisations. The memory controller has been reworked to reach higher frequencies and make better use of the available bandwidth. Memory is no longer accessed 64 bits at a time (4x64) but 32 bits at a time, and a memory ring bus has been developed.
This bus simplifies memory routing within the chip and allows higher memory frequencies. Data read from memory is sent over this bus directly to the units requesting it, without going back through the memory controller.
The cache was also modified: it no longer uses Direct Mapped or N-Way Set Associative mapping but Fully Associative mapping. This increases the hit ratio (the probability that requested data is already in the cache) at the expense of the time needed to determine whether a piece of data is in the cache or not. In theory, this last point isn’t a problem for ATI because of Ultra Threading, which we’ll now discuss.
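To make the difference in hit ratio concrete, here is a minimal sketch comparing a direct mapped cache with a fully associative one on an access pattern where two addresses collide. The cache size, the access pattern and the LRU replacement policy are assumptions chosen for illustration; nothing here describes the actual R520 cache.

```python
from collections import OrderedDict

def direct_mapped_hits(addresses, num_lines):
    """Count hits when each address can live in exactly one cache line."""
    lines = [None] * num_lines
    hits = 0
    for addr in addresses:
        slot = addr % num_lines          # the only line this address may use
        if lines[slot] == addr:
            hits += 1
        else:
            lines[slot] = addr           # evict whatever was there
    return hits

def fully_associative_hits(addresses, num_lines):
    """Count hits when any address can live in any line (LRU, an assumption)."""
    lines = OrderedDict()
    hits = 0
    for addr in addresses:
        if addr in lines:
            hits += 1
            lines.move_to_end(addr)      # mark as most recently used
        else:
            if len(lines) >= num_lines:
                lines.popitem(last=False)  # evict the least recently used
            lines[addr] = True
    return hits

# Addresses 0 and 4 map to the same line of a 4-line direct mapped cache.
pattern = [0, 4] * 8
print(direct_mapped_hits(pattern, 4))      # 0 hits: they keep evicting each other
print(fully_associative_hits(pattern, 4))  # 14 hits: only the first two accesses miss
```

The price of the fully associative version is the lookup: every line is a candidate, which is the slower search mentioned above.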
As you probably guessed, ATI is drawing a parallel with Intel’s HyperThreading, which was introduced with the Pentium 4. The objective is the same: to make the best use of the different pixel shader calculation units. If a unit is idle while waiting for data to complete a calculation, this calculation can be put aside temporarily so that others can be processed.
In terms of features, ATI catches up with NVIDIA. The accuracy of pixel shader calculations, previously “restricted” to 24 bits, is now increased to 32 bits, as it has been for NVIDIA since the GeForce FX. ATI also now supports Pixel Shader version 3 and dynamic branching, and claims that the impact on performance is lower than with competing architectures.
High Dynamic Range is now also included, as the Radeon X1800 is capable of blending with FP16 textures. This texture format, which uses floating point numbers, is used in games such as Far Cry, Splinter Cell or Age of Empires 3 for more realistic lighting. What’s more, whereas the GeForce doesn’t support anti aliasing when HDR is activated, the Radeon X1800 doesn’t have this limitation.
 The number of calculation units is 16 (like the X800) for pixel shading and texturing. The vertex shading engine has been strengthened from 6 to 8 units. 3DMark05 will appreciate this improvement.
Page 2
Reviewed and corrected pixel shaders
ATI was undeniably behind NVIDIA’s GPUs in pixel shading capabilities. This is only logical, as ATI’s previous shader core, even if it was slightly improved with the X800, is more than three years old.
The 9700/X800 shader core worked in a relatively fixed way, and this restricted its capabilities. When a thread (a group of pixels to be processed) arrived in the shader core, it first went through the texturing block, where all texturing instructions were processed and their results stored in the registers. It then went through the pixel shader pipeline, where all the other instructions were executed with the texturing results already available. Thread management was pipelined, which means that as soon as a thread left the texturing block, another one entered it while the next one waited. This system is interesting in terms of performance and is partly responsible for the Radeon 9700/X800’s strong efficiency, as it implies that the limiting factor is either texture access or mathematical instructions. For NVIDIA, however, it’s the sum of these two factors. This is also the reason why ATI is more efficient with anisotropic filtering, as it’s easier to hide the time it requires behind mathematical instructions. NVIDIA has to optimise the order of instructions to do this, but it’s never as efficient.
It becomes more complicated when an indirection is included in the shader. An indirection is an access to a texture whose coordinates have been calculated dynamically in the pixel shader. These coordinates are initially unknown, and this situation is now relatively common. Since texture accesses are performed first, it’s impossible to access the texture directly, and the thread has to be put on hold and sent back to the texturing block as soon as possible. As other threads are already waiting to enter the shader core, the number of threads in it increases. For example, if there are 2 effective threads at any given time (we say effective because the actual number might be higher, a multiple of this number), one in the texturing block and one in the pixel shading block, this increases to 4 with one indirection, 6 with two indirections and 8 with three indirections. 8 was the maximum supported and was a strong limitation.
Previously, we often spoke of the limited number of registers for NVIDIA, as there were really only 2 FP32 registers per pixel in the GeForce FX and 4 in the GeForce 6 and 7. For ATI, however, the 12 registers required by Pixel Shader 2.0 were actually present and not “emulated” from a smaller number. A lesser known fact is that these 12 registers were only accessible with two effective threads. If 4 threads were used, they had to share the register space, which left only 6 registers per pixel, and only 3 registers per pixel with 8 effective threads.
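The relationship between indirections, effective threads and registers per pixel on the 9700/X800 can be written down as simple arithmetic. The sketch below assumes the register space is shared evenly between effective threads; the figures for 2, 4 and 8 threads are the ones quoted above, while the value for 6 threads is only what even sharing would give.

```python
BASE_THREADS = 2       # one in the texturing block, one in the pixel shading block
MAX_THREADS = 8        # maximum quoted for the 9700/X800 shader core
REGISTER_SPACE = 24    # 12 registers per pixel x 2 effective threads

def effective_threads(indirections):
    """Each indirection adds another texturing/shading pair of threads."""
    return BASE_THREADS * (1 + indirections)

def registers_per_pixel(threads):
    """Assumption: the register space is split evenly between threads."""
    return REGISTER_SPACE // threads

for indirections in range(4):
    threads = effective_threads(indirections)
    print(indirections, threads, registers_per_pixel(threads))
# 0 indirections -> 2 threads, 12 registers
# 1 indirection  -> 4 threads,  6 registers
# 2 indirections -> 6 threads,  4 registers (extrapolated)
# 3 indirections -> 8 threads,  3 registers
```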
NVIDIA’s solution: the long pipeline
NVIDIA’s method for avoiding indirection limits and gaining flexibility for branching etc. was to use a very long 256 cycle pipeline, much of which serves no purpose other than waiting for texturing unit results. With NVIDIA there aren’t two distinct parts; the entire process is fused. As soon as a texturing instruction arrives, it’s processed directly and the result is available a little further down the pipeline, but within the same effective cycle. NVIDIA doesn’t need to use several threads simultaneously to stay efficient, and it can access an unlimited number of textures dependent on the results of the pixel shader. The downside is that texturing latency can’t be hidden as well, and not at all beyond a certain limit, and that it has to work with large threads of 1024 pixels (or even more with the GeForce 6).
This approach raises several problems with dynamic branching because, in a GPU, the instruction flow is managed per thread, or group of pixels, and not per pixel. In other words, every pixel in a thread has to follow the same path and has the same instructions applied to it. When the result of a branch isn’t identical for all pixels in a thread, both branches have to be processed for all of them. It’s then no longer possible to use dynamic branching to increase performance (for example by skipping a large part of the shader), and the opposite can even happen.
ATI’s solution: Ultra Threading
With the Radeon X1000, ATI had to modify its design to obtain pixel shaders without indirection limits and capable of branching. The solution chosen wasn’t to follow NVIDIA, but rather to take the 9700/X800 concept further by increasing the number of threads. It now rises to a maximum of 512 in the Radeon X1800, which is much higher than before, even if we don’t know the exact number.
The thread size is very small at 16 pixels, very different from NVIDIA’s 1024. The Radeon X1000 supports 32 real registers per pixel, but this number drops depending on the number of active threads. We couldn’t obtain the maximum number of threads for which all 32 registers remain available, but we estimate it to be 64 threads, or 1024 pixels. This represents 32,768 128 bit general registers, compared to 24,576 for the GeForce 7800, which is less flexible and whose pixels never have more than 4 registers.
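The register figures above can be checked with a few lines of arithmetic. The sketch assumes that the register file is shared evenly between active threads and that 64 threads is indeed the point at which all 32 registers are still available, which is only our estimate.

```python
PIXELS_PER_THREAD = 16
MAX_REGISTERS_PER_PIXEL = 32
ESTIMATED_FULL_SPEED_THREADS = 64   # estimate from the text, not a confirmed figure

# 32 registers x 64 threads x 16 pixels
REGISTER_FILE = (MAX_REGISTERS_PER_PIXEL
                 * ESTIMATED_FULL_SPEED_THREADS
                 * PIXELS_PER_THREAD)

def x1800_registers_per_pixel(active_threads):
    """Assumption: registers are split evenly, capped at 32 per pixel."""
    pixels_in_flight = active_threads * PIXELS_PER_THREAD
    return min(MAX_REGISTERS_PER_PIXEL, REGISTER_FILE // pixels_in_flight)

print(REGISTER_FILE)                    # 32768
print(x1800_registers_per_pixel(64))    # 32
print(x1800_registers_per_pixel(128))   # 16
print(x1800_registers_per_pixel(512))   # 4
```

Under these assumptions, even with all 512 threads active each pixel would still keep 4 registers.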
The Ultra Threading principle is quite simple, even if its implications are very complex. As soon as a thread arrives in one of the four shader cores (each of which has a block of four texturing units and four pixel shading units), processing starts and mathematical instructions are executed until an operation that causes latency comes up (such as a texture access). When this happens, the thread is sent to the texturing block, its results remain in the temporary registers, and a new thread enters the shader core. As soon as this thread reaches its own texturing instruction, it goes to the appropriate block and another thread comes in, and so on until the texturing result of the first thread is known. At that point, the first thread goes back to the pixel shading block for the following instructions to be executed, until a new operation causing latency comes up. The cycle continues until the shader is completely processed. The thread then leaves the shader core and the process starts over again.
In other words, instead of hiding latency with a long pipeline as NVIDIA does, or with a fixed architecture as before, ATI uses a large number of threads, a significant part of which remain dormant while awaiting results from the texturing units. This method combines the advantages of both architectures.
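The idea of hiding latency with many dormant threads can be illustrated with a toy model. In this sketch, each thread alternates between a burst of arithmetic and a texture fetch with a fixed latency, and a single arithmetic unit picks any thread whose texture result has arrived. The cycle counts are made up for illustration and the scheduler is deliberately naive; it is not a description of ATI’s actual dispatcher.

```python
def alu_utilisation(num_threads, alu_cycles, tex_latency, total_cycles=10_000):
    """Fraction of time the arithmetic unit is busy in a toy threading model."""
    ready_at = [0] * num_threads   # cycle at which each thread may run again
    busy = 0
    cycle = 0
    while cycle < total_cycles:
        runnable = [t for t in range(num_threads) if ready_at[t] <= cycle]
        if runnable:
            thread = runnable[0]
            busy += alu_cycles                       # run the arithmetic burst
            cycle += alu_cycles
            ready_at[thread] = cycle + tex_latency   # then wait for the texture
        else:
            cycle = min(ready_at)   # unit idles until the next result arrives
    return busy / cycle

# Hypothetical figures: 4 cycles of arithmetic, 100 cycles of texture latency.
for threads in (1, 13, 26):
    print(threads, round(alu_utilisation(threads, 4, 100), 3))
```

With a single thread the unit is busy less than 5% of the time; with 26 threads in flight there is always one ready to run and the latency is completely hidden.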
For dynamic branching, the fact that ATI uses very small threads means that it avoids calculating both branches for every pixel more often than NVIDIA does. This could lead to a very significant advantage in the future. Still on this subject, ATI has, in addition to the pixel shading block and the texturing block, a third block working in parallel which deals with branching instructions. Branching instructions therefore have no real impact on performance, whereas they cost several cycles for NVIDIA.
ATI hasn’t greatly improved its architecture at the calculation unit level, as the units remain more or less identical to those of the Radeon 9700/X800. There is one large vec3 + 1 unit paired with one small vec3 + 1 unit, which processes simple operations like modifiers. NVIDIA’s architecture includes two large and two small units, but it’s important to note that the large ones can’t process all instructions and that the instruction order has to match their capabilities for them to be used simultaneously. NVIDIA is also capable of processing operations in vec2 + vec2, even if in practice the compiler has some difficulty in this area. Finally, NVIDIA has native support for the NRM (normalisation) instruction in FP16, whereas ATI has no FP16 units and uses the decomposed version of the instruction, which requires several cycles.
Compared to the 9700/X800, ATI has nevertheless made several small improvements, among them native support for sincos instructions. Overall, however, NVIDIA keeps an advantage in calculation power. ATI defends itself by claiming that its architecture maximises the use of its calculation units, which compensates.
In terms of pixel shaders, the X1000 architecture has an innovative feature called “scatter”. It allows any value to be written directly to the graphic card’s memory. This is a huge evolution compared to the restricted memory access of other GPUs, and it’s made possible by the new, quite flexible memory architecture. Roughly speaking, this feature allows an unlimited number of registers and opens up an enormous number of new possibilities for GPU use, such as general purpose calculation (GPGPU). This feature is nevertheless very advanced for its time and can’t be used with DirectX 9. ATI has decided, however, (a first in the GPU industry) to publish low level information on the X1000 GPUs in 2006. GPGPU developers will then be able to access the chip without going through an API and exploit its full potential.
Page 3
Perf in pixel shading, Branching, Vertex Shader
Performance in pixel shading
We extracted complex shaders from three applications: 3DMark05, Far Cry and Tomb Raider AOD. We ran them full screen in an external application.
In 2 out of the 3 pixel shaders, NVIDIA dominates with the 7800 GTX thanks to its higher calculation power. ATI takes the lead in the third, which relies more on dependent texture accesses (indirection) and benefits from the X1800 XT’s higher bandwidth.
Compared to the X850 XT PE, the X1800 XL brings very small performance gains of 5 to 10%. This isn’t so surprising, as it has lower memory bandwidth and calculation power. The gains are therefore really due to the new architecture, the ring bus and Ultra Threading. Of course, the X1800 XT performs better, especially with the third shader, where higher memory bandwidth and higher frequency combine to increase performance by 70% compared to the X1800 XL.
We then tested 2 lighting shaders:
These shaders measure pure calculation power and are clearly to the advantage of NVIDIA, which also benefits from FP16 to increase performance. The difference between the X1800 XT and X850 XT PE is mainly due to frequency.
Branching
One of the main innovations introduced with the GeForce 6800 was dynamic branching in shaders. It makes some shaders easier to write and makes others more efficient by avoiding calculations on pixels that don’t need them. For example, why apply a very costly filter to soften the edge of a shadow if the pixel is in the middle of the shadow? Dynamic branching makes it possible to determine whether the pixel needs it or not. Splinter Cell Chaos Theory uses this technique, whereas The Chronicles of Riddick calculates everything for every pixel. Performance drops by 10 to 15% for the first and by more than 50% for the second. Of course, the algorithms aren’t identical, but it does give us an idea of what dynamic branching is capable of.
Of course, this only applies to very specific cases. In a GPU, pixels are processed in groups of 100 or even 1000. At a branch, all pixels have to take the same path, or else both branches have to be calculated for all pixels, with masks so that only the result of the required branch is written. On paper, ATI has a clear advantage with its dedicated branching unit and very small threads. Let’s see if this is the case in practice with a small test that we developed, which lets us change the branching granularity (the number of consecutive pixels that take the same branch). We specify the branch to take per pixel column. One column out of two has to run a complex shader and the other can skip this part of the rendering. Average sized triangles in motion are displayed on screen. The size and position of the triangles and the column width all influence branching efficiency. This is therefore closer to real situations than our previous test, which used two full screen triangles.
With narrow columns, GPUs can’t use branching to skip the complex part for half of the pixels, but they still have to process the branching instructions. This reduces performance instead of increasing it. You will notice that this performance reduction is only 2.5% for ATI, compared to 9-10% for NVIDIA. This is because ATI has a dedicated branching unit, which works in parallel with the other units.
ATI’s small threads of 16 pixels (4x4) allow performance improvements as soon as the column width reaches 4 pixels, whereas you have to wait until 64 pixels for NVIDIA! ATI easily reaches a 60% performance gain, whereas NVIDIA remains at 20% except with the 800 pixel column (the screen divided in two, as we are working in 1600x1200).
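Why the thread size matters so much for this column test can be shown with a simplified model. The sketch below only looks at the horizontal direction: the screen is cut into tiles (threads) of a given width, columns alternate between the complex branch and the skipped branch, and a tile can only skip the complex part if all of its pixels fall in a skipped column. The 4 pixel tile width matches ATI’s 4x4 threads; the 32 pixel width is a purely hypothetical stand-in for a large thread, as the actual tile shape used by NVIDIA isn’t known to us. Moving triangles and tile alignment are ignored.

```python
def skippable_fraction(column_width, tile_width, screen_width=1600):
    """Fraction of tiles in which every pixel takes the 'skip' branch."""
    tiles = screen_width // tile_width
    skipped = 0
    for tile in range(tiles):
        xs = range(tile * tile_width, (tile + 1) * tile_width)
        # Odd-numbered columns skip the complex shader.
        if all((x // column_width) % 2 == 1 for x in xs):
            skipped += 1
    return skipped / tiles

for column_width in (2, 4, 16, 32, 800):
    print(column_width,
          skippable_fraction(column_width, tile_width=4),
          skippable_fraction(column_width, tile_width=32))
```

In this model, small tiles start to benefit from branching at a column width of 4 pixels, whereas large tiles gain nothing until the columns are at least as wide as the tile.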
Here the GeForce 7800 isn’t more efficient than the GeForce 6800. It was in our previous tests, but it was for very specific cases where the 7800 did indeed have higher performances. In practice, this isn’t the case, however. Gains are slightly smaller with the 7800 as the architecture is more efficient and consequently the cost for a complex branch is lower.
Overall, ATI has better branching efficiency than NVIDIA and this should permit its use in more situations. Developers will appreciate this.
Vertex Shader
We tested performance in T&L, VS 1.1, VS 2.0 and VS 2.X/3.0 with RightMark:
For simple rendering with a single light source, NVIDIA dominates except in T&L. For more complex rendering, the X1800 XT comes out ahead in all tests. With static branching, NVIDIA has problems and the previous Radeon generations aren’t much better; the X1800 XT doubles performance. With dynamic branching, unlike with pixel shaders, NVIDIA’s GPU and the X1800 XT behave similarly, and the performance gap is only due to the difference in frequency.
Unlike NVIDIA, ATI seems to have forgotten about Vertex Texturing support, which we thought was required for Vertex Shader 3.0 support. ATI says it isn’t, which is quite odd. If we take a closer look, we see that ATI reports vertex texturing support to DirectX but doesn’t enable it for any texture format. This looks suspicious and may be a clever way to get around the DirectX specifications and announce Vertex Shader 3.0 support without Vertex Texturing. Either way, the situation is unclear, and in practice Vertex Texturing isn’t really important outside of 2-3 technology demos. It’s not very widespread because it’s too restricted, at least in its current implementation.
Page 4
Perf HDR, Texturing
HDR performances
HDR as seen in Far Cry or Splinter Cell, which should be more widely used in the future, relies on 64 bit rendering, or 4 FP16 components to be more precise. The scene and some other elements are rendered into 64 bit buffers (RT, Render Targets), which are then processed (treated as textures) to assemble the final result. For all of this to work in a simple way, the GPU must support FP16 filtering and blending. This is the case with the GeForce 6 and 7 and the Radeon X1000, but not with the Radeon X800. Accessing FP16 textures can be somewhat costly, but it’s important to note that filtering isn’t. It requires transistors in the GPU, but doesn’t reduce performance in practice. FP16 blending requires a great deal of memory bandwidth, twice as much as normal rendering.
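The doubling of memory traffic follows directly from the size of a pixel. As a quick check, the sketch below computes the size of a single render target at 1600x1200 with four 8 bit components and with four FP16 components; the resolution is just an example.

```python
def render_target_bytes(width, height, components=4, bits_per_component=8):
    """Size in bytes of one render target with the given pixel format."""
    return width * height * components * bits_per_component // 8

ldr = render_target_bytes(1600, 1200, bits_per_component=8)    # 32 bit pixels
hdr = render_target_bytes(1600, 1200, bits_per_component=16)   # 64 bit (FP16) pixels

print(ldr)        # 7680000 bytes
print(hdr)        # 15360000 bytes
print(hdr / ldr)  # 2.0
```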
ATI, unlike NVIDIA, has passed on FP16 filtering, which the X1000 doesn’t support. Developers will have to integrate filtering algorithms into their pixel shaders, and this will have an impact on performance, although a small one compared to the overall cost of HDR. ATI explains its position by the fact that with FP16, developers generally don’t want box filters (bilinear filtering, etc.) but prefer better adapted ones (the Unreal 3 engine uses a specific filter with all cards, for example). If there were demand, ATI could expose FP16 filtering support in the capabilities (caps) and automatically insert the filter into the shader, so that it would be transparent to developers. For the moment, however, there is no real demand.
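To give an idea of what filtering in the pixel shader involves, here is a minimal sketch of bilinear filtering done by hand: four unfiltered texel reads followed by three linear interpolations. It’s written in Python rather than in a shader language, uses a single float per texel, and the clamping at the texture edges is an assumption; a real shader would do the same work on four-component FP16 texels.

```python
def bilinear_sample(texture, u, v):
    """Bilinear filter over a 2D list of floats; u, v are in texel units."""
    height = len(texture)
    width = len(texture[0])
    # Clamp the coordinates to the texture (edge handling is an assumption).
    u = min(max(u, 0.0), width - 1)
    v = min(max(v, 0.0), height - 1)
    x0, y0 = int(u), int(v)
    x1 = min(x0 + 1, width - 1)
    y1 = min(y0 + 1, height - 1)
    fx, fy = u - x0, v - y0
    # Four point-sampled reads, then three linear interpolations.
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

hdr_texture = [[0.0, 2.0],
               [4.0, 6.0]]   # values above 1.0 are legal in an HDR texture
print(bilinear_sample(hdr_texture, 0.5, 0.5))   # 3.0
```

The extra texel reads and arithmetic are the performance cost mentioned above.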
HDR implies tone mapping, which consists of converting the HDR image into a displayable 32 bit one. The tone mapping algorithm defines how the HDR data is interpreted and therefore the final result. It’s done in an additional pass after the scene is rendered. With the Radeon X1000, the Avivo technology is capable of processing tone mapping automatically. Unfortunately, it’s impossible to access Avivo’s API via DirectX, and ATI will work around this problem with a FourCC texture format that will be exposed in DirectX. When it’s used, ATI’s driver automatically applies tone mapping via the Avivo unit. NVIDIA also planned a dedicated tone mapping unit in the 6800, but it didn’t work well, was deactivated and has completely disappeared with the 7800.
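As an illustration of what a tone mapping pass does, the sketch below converts one HDR component into an 8 bit displayable value with the simple Reinhard operator. This operator is only an example chosen for its brevity; it isn’t necessarily what Avivo or any of the games mentioned here use, and the `exposure` parameter is a hypothetical addition.

```python
def tone_map(hdr_value, exposure=1.0):
    """Map an HDR value in [0, inf) to an 8 bit value with the Reinhard operator."""
    scaled = hdr_value * exposure
    ldr = scaled / (1.0 + scaled)   # compresses [0, inf) into [0, 1)
    return round(ldr * 255)

for value in (0.0, 0.25, 3.0, 1000.0):
    print(value, tone_map(value))
```

Very bright values are compressed towards 255 instead of being clipped, which is what allows details to survive in both the dark and the bright parts of the image.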
We measured performances with textures and surface rendered in 32 bits and in 64 bits (FP16) and without blending:
 In 32 bits, NVIDIA is more efficient and more easily reaches maximum performances. The Radeon has abnormally low performances.
In 64 bits, the 6800 Ultra is clearly behind, mainly because of poor results in 64 bit texture access. It was improved with the 7800, which now features performances close to the Radeon X850 XT PE and X1800 XL. The Radeon X1800 XT does better, thanks to memory clocked at 750 MHz, which is the main factor in this test.
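The weight of memory frequency in this test is easier to appreciate as theoretical bandwidth. The sketch below assumes a 256 bit memory bus and two transfers per clock (DDR); the bus width isn’t stated in this article and is an assumption here, while the 750 MHz and 500 MHz memory clocks are those of the X1800 XT and X1800 XL.

```python
def bandwidth_gb_per_s(memory_clock_mhz, bus_width_bits=256, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (assumed bus width and DDR)."""
    bytes_per_transfer = bus_width_bits / 8
    return memory_clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9

print(bandwidth_gb_per_s(750))   # X1800 XT: 48.0 GB/s
print(bandwidth_gb_per_s(500))   # X1800 XL: 32.0 GB/s
```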
What about games? It’s hard to tell how the X1800 XT will behave in practice, as games didn’t work in FP16 HDR in our tests. The current version of Far Cry only seems willing to activate HDR if it detects an NVIDIA card, and Splinter Cell uses a much simpler HDR, similar to that used with the X850. In the Serious Sam 2 demo, activating HDR leads to several bugs. The implementation isn’t identical for both manufacturers and, although the differences are minor, each game will have to adapt its HDR depending on whether it’s running on ATI or NVIDIA.
Texture access
How do the different cards behave with different texture formats and sizes (simple texture access without filtering)?
With standard 32 bit textures, the 2 GeForces provide similar performances. The Radeon X850 XT PE is clearly behind and doesn’t support 4096x4096. The X1800 XT has better results and is 50% more efficient than the other graphic cards with 2x2 textures. This type of texture isn’t really used in practice, but it shows how the architecture’s efficiency has improved. With 4096x4096, it’s 70% more efficient than NVIDIA’s cards. Its memory clocked at 750 MHz is once again an advantage.
With 64 bit textures (FP16), the 6800 Ultra has problems, which were corrected with the 7800 GTX, which now provides much higher performances. It isn’t enough to compete with the X1800 XT, which takes the lead as texture size increases thanks to its memory architecture, optimised cache and higher bandwidth. With 2048x2048 textures, its performance is three times higher, and double that of the Radeon X850 XT PE. The 6800 Ultra is far behind, eight times less efficient than the new ATI graphic card, which is also the only one to support 4096x4096 textures in FP16.
With 128 bit textures (FP32), the situation is identical. Even if the gap between the cards is smaller with small textures, it grows with texture size. The X1800 XT is almost five times more efficient than the 7800 GTX and 20 times more efficient than the 6800 Ultra. ATI’s memory subsystem clearly seems to be an advantage.
Page 5
Filtering and anti aliasing
Filtering
The X850’s default filtering algorithm is unchanged with the X1800. However, ATI has introduced a new “high quality” filter in the drivers. Whereas the filtering level used to vary a lot depending on the viewing angle, with this option activated filtering is almost optimal from any angle. Textures change to lower definition mipmaps and are of better quality. We can see this in the following animated GIF, where mipmap changes are coloured:
This is a very good thing, even if we have to keep in mind that the performance cost isn’t negligible. With the flight simulator Pacific Fighters, which is particularly sensitive to filtering speed, we reach 51.1 fps in 1920*1200, falling to 39.2 fps with 8x anisotropic filtering and 30.7 fps with the same filtering in “high quality” mode. Being able to choose whether to activate it is an advantage, because in certain cases the impact on performance can be significant.
With a newer game that uses many shaders, the filtering cost has less influence thanks to ATI’s architecture, which separates the texturing block from the pixel shading block. With Splinter Cell CT, we reached 76.4 fps in 1600x1200, 75.6 fps with 8x aniso activated and 75.4 fps with the “high quality” mode! In other words, the impact on performance is negligible, and we thank ATI for improving filtering quality, something that had not happened for a very long time.
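Expressed as percentages, the contrast between the two games is striking. The small helper below simply converts the frame rates quoted above into a performance cost relative to the unfiltered result; the function name is ours.

```python
def cost_percent(baseline_fps, fps):
    """Performance lost relative to the baseline, in percent."""
    return (1 - fps / baseline_fps) * 100

# Pacific Fighters, 1920*1200
print(round(cost_percent(51.1, 39.2), 1))   # 8x aniso
print(round(cost_percent(51.1, 30.7), 1))   # 8x aniso, "high quality"

# Splinter Cell CT, 1600x1200
print(round(cost_percent(76.4, 75.6), 1))   # 8x aniso
print(round(cost_percent(76.4, 75.4), 1))   # 8x aniso, "high quality"
```

Pacific Fighters loses roughly 23% and 40% of its frame rate, whereas Splinter Cell CT loses a little over 1% in both modes.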
Anti aliasing
For anti aliasing, ATI recently made available a new hidden option called Adaptive antialiasing for the whole DirectX 9 line. Of course, this includes the X1800. Similar to the Transparency AA introduced by NVIDIA with the GeForce 7800 GTX, it consists of using supersampling instead of multisampling for surfaces with alpha tests, which aren’t filtered when we simply use multisampling. Here are a few screenshots to see the quality differences:
ATI without AA, ATI AA4x, ATI AA4x + AAA, NVIDIA without AA, NVIDIA AA4x, NVIDIA AA4x + TAA
It works for both manufacturers. Results aren’t identical for ATI and NVIDIA, as solid parts of objects like grids seem to be thicker with NVIDIA. It’s hard to know which is better from this example alone.
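The reason multisampling leaves alpha tested surfaces unfiltered can be sketched in a few lines. With multisampling, the shader and its alpha test run once per pixel, so all samples of the pixel pass or fail together; with supersampling, the test runs for each sample and the pixel can be partially covered. The one-dimensional texture, the sample positions and the 0.5 threshold below are hypothetical values chosen for the illustration.

```python
SAMPLE_OFFSETS = [-0.375, -0.125, 0.125, 0.375]   # hypothetical 4x pattern (1D)

def alpha_at(x):
    """Hypothetical alpha tested texture: an opaque wire ending at x = 10.3."""
    return 1.0 if x < 10.3 else 0.0

def multisample_coverage(pixel_centre, threshold=0.5):
    # Alpha test evaluated once per pixel: all samples share the verdict.
    return 1.0 if alpha_at(pixel_centre) >= threshold else 0.0

def supersample_coverage(pixel_centre, threshold=0.5):
    # Alpha test evaluated once per sample: coverage can be fractional.
    passed = sum(alpha_at(pixel_centre + offset) >= threshold
                 for offset in SAMPLE_OFFSETS)
    return passed / len(SAMPLE_OFFSETS)

# Pixel whose centre is at x = 10.5, straddling the edge of the wire.
print(multisample_coverage(10.5))   # 0.0  -> hard, aliased edge
print(supersample_coverage(10.5))   # 0.25 -> smoothed edge
```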
Looking at the table, you can see that the cost of antialiasing is smaller with the X1800 than with the X850, which allows ATI to increase its advantage over NVIDIA in this mode.
The performance cost of activating what we will call “Transparency Supersampling” varies according to the type of scene. In Colin McRae, compared to standard anti aliasing, we see that it’s relatively significant because of the numerous types of foliage:
 Costly for NVIDIA in AA 2x, the TS is however more so for ATI in 4x. Figures don’t show any specific optimisation for this function with the X1800 compared to the X850.
Page 6
AVIVO, the cards, tests
AVIVO
The AVIVO video engine has also been improved by ATI with the new architecture. In addition to the usual MPEG-2 and the more recent Windows Media Video decoding, H.264 is now decoded by ATI’s GPU. An integral part of the MPEG-4 standard (it’s also known as MPEG-4 Part 10), H.264 pushes compression even further at the expense of encoding and decoding complexity.
Although no player is yet able to use this acceleration, according to an ATI demonstration playing an H.264 video with a 25 Mbit/s bitrate, CPU usage on a 3.6 GHz Pentium 4 would be 90 to 95% without acceleration… and 33% with it! We have to put these figures in perspective, however. Such a bitrate wouldn’t really be used in practice, as 11 GB would be required for 1 hour of video.
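The 11 GB figure is easy to verify: it’s simply the bitrate multiplied by the duration. The helper below assumes decimal units (1 Mbit = 1,000,000 bits, 1 GB = 1,000,000,000 bytes).

```python
def gigabytes_per_hour(megabits_per_second):
    """Storage needed for one hour of video at the given bitrate (decimal units)."""
    bits = megabits_per_second * 1_000_000 * 3600
    return bits / 8 / 1_000_000_000

print(gigabytes_per_hour(25))   # 11.25
```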
It’s important to point out that H.264 acceleration isn’t available for now, but WMV9 acceleration should be. Unfortunately, it isn’t the case in practice, as the X1800 drivers are buggy. While playing a WMV9 HD 720p video isn’t a problem, the same video in 1080p makes nice art:

X1800 XL & X1800 XT
For the X1800’s release, ATI is launching 3 cards:
- Radeon X1800 XT 512 MB, $549
- Radeon X1800 XT 256 MB, $499
- Radeon X1800 XL 256 MB, $449
They are expensive, but the price differences between versions are quite small. As a result, the X1800 XL is not really that interesting if we take a closer look at the cards’ frequencies:
You don’t have to think twice about going from 500/500 to 625/750 for $50... except that the X1800 XL is the one that will be available in the near future. For the X1800 XT, we will have to wait until ATI has more R520 revisions, which should be November 5.
 
In practice, the X1800 XT and X1800 XL can be told apart by their cooling systems. While the X1800 XT uses the same cooling system as the X850 XT PE, the XL’s is much more discreet… visually! From the noise point of view, we have to say that the X1800 XL is very noisy in 3D, with a high pitched sound. The X1800 XT is better in this respect, but is still quite noticeable. The 7800 GTX is much better for noise, unlike the 7800 GT, which is similar to the X1800 XL.
The power consumption of these graphic cards was evaluated with measurements taken directly at the power outlet. This represents the entire computer’s power consumption, including the power supply, here an Enermax 535W. Figures were obtained on the Windows desktop and under load with a 3D scene and Prime95. Prime95 makes it possible to have constant CPU use regardless of the graphic card’s performance.
 In 2D, ATI’s cards are generally less power hungry than NVIDIA’s and the X1800 XT is by far the lowest power consumer. In use, the X1800 XL remains more economical than the X850 XT PE whereas the X1800 XT confirms its leadership with 32 Watts more than the 7800 GTX!
The test
For this test, we used ASUSTeK’s A8N32-SLI Deluxe, which has two nForce4 chips in order to provide two real 16x PCI Express slots. While this solution looks interesting, we have to say that compared to a “simple” A8N-SLI with two 8x slots, performance gains are non-existent (as we said in the Crossfire article). We used the following configuration:
- ASUSTeK A8N32-SLI Deluxe
- AMD Athlon 64 FX-57
- 2x1024 MB of Corsair PC3500 LL Pro memory
- Enermax 550W power supply
- 1x Radeon X1800 XT 512 MB
- 1x Radeon X1800 XL 256 MB
- 1x Radeon X850 XT PE 256 MB
- 1x Radeon X800 XL 256 MB
- 2x GeForce 7800 GTX 256 MB
- 1x GeForce 7800 GT
- 1x GeForce 6800 Ultra
- 1x GeForce 6800 GT
- Raptor SATA hard drive
- DVD ROM LG player
For these tests, ATI didn’t send us an X1800 XT 256 MB but rather a 512 MB version. This doesn’t represent a significant advantage yet, except at very high resolution in a few games like F.E.A.R (to be released soon) or even Act Of War, which we used in our tests.
We measured performance in 1280*1024, 1600*1200 and 1920*1200 (or 1920*1440 when this mode was unavailable, as in Act Of War), with different graphic settings: standard, 4x anti aliasing and 8x anisotropic filtering, and HDR if available. Testing this type of high end graphic card at low resolution isn’t really of interest. We preferred to leave low resolutions out and chose higher ones, which should please users of the bigger monitors that are supposed to go with these graphic cards.
Catalyst Control Center, no change
Released more than a year ago, the much criticized Catalyst Control Center is still far from perfect. Its inefficiency and the notorious delay after each slider movement are still as bothersome. Even if it has become a little less slow after several revisions, we still can’t say it’s fast. Bugs are also still present. For example, the FSAA parameter in the Catalyst Control Center sometimes goes out of control and deactivates itself without changing the mode indicated. Once this problem starts, modifications are only effective with an odd number of changes. If FSAA is in 4x mode and decides to deactivate itself (even though it still shows 4x), you have to change to 6x (real 6x), then 2x (deactivated instead) and finally back to 4x to have it work again… until the next dropout. Needless to say, this is very annoying when running several tests. Could ATI please fix this?

Page 7
Half Life 2, Doom 3
Half Life 2
Without anti aliasing and anisotropic filtering, performance in the Half Life 2 scenes used here is “restricted” by the CPU, even at high resolutions. In this mode, the X1800 XL is slightly slower than the X1800 XT, which is at the same level as the 7800 GTX. With AA 4x and Aniso 8x, ATI’s cards are in the lead, because even if the X1800 XL doesn’t do better than the X850 XT PE, it’s still comparable to the 7800 GTX. The X1800 XT is in front and is even close to the 7800 GTX SLI.
Doom 3
NVIDIA still dominates in Doom 3, even if we noticed a strong performance improvement without AA with the X1800 XT compared to the previous generation. We initially thought that it came from this card’s massive bandwidth, but oddly this improvement fades away once AA is activated. Even worse, under these circumstances the X1800 XT is only 10% faster than the X850 XT PE, whereas the X1800 XL is noticeably slower. Given the significant performance drop when AA is activated, especially with the X1800 XT, we have to believe that some optimisations are still possible via the drivers.
Page 8
Far Cry, Splinter Cell CT
Far Cry
We now move on to Far Cry, which recognises the X1800 as an NVIDIA graphic card because it supports Shader Model 3.0. Despite this, HDR is unfortunately not functional yet. We will have to wait for the next patch, used by ATI for demonstrations, which will allow HDR and antialiasing to be activated simultaneously. We remind you that this isn’t possible with the GeForce 7800.
Without AA and Aniso, the 7800 GTX and X1800 XT provide equivalent performance. Once these effects are activated, however, ATI's chip takes the lead. The X1800 XL is then at the same level as the 7800 GTX. We should point out that this was already the case for the X850 XT PE.
Splinter Cell Chaos Theory
Splinter Cell detects the presence of Shader Model 3.0 on ATI's card and makes it possible to activate all effects. Since the 1.04 patch, effects such as Soft Shadows, Parallax Mapping and HDR can also be activated on Pixel Shader 2.0 cards, whereas they were previously restricted to PS 3.0 cards. Note that this PS 2.0 HDR is far from comparable to the SM 3.0 version, as it is calculated with integers rather than floating-point values.
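The practical difference between integer and floating-point HDR can be illustrated with a minimal Python sketch. It rests on assumptions of ours: the article does not say which formats the game uses, so an 8-bit normalized integer buffer stands in for the integer path and FP16 for the floating-point path, and the helper names `store_int8`, `store_fp16` and `expose` are hypothetical.

```python
import struct


def store_int8(value):
    """Store a value in an assumed 8-bit normalized integer buffer (range 0..1)."""
    clamped = min(max(value, 0.0), 1.0)   # anything brighter than 1.0 is clipped
    return round(clamped * 255) / 255


def store_fp16(value):
    """Store a value as a 16-bit float by round-tripping through half precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def expose(value, exposure):
    """Apply a simple exposure factor after the value has been stored."""
    return value * exposure


# Scene intensities, including highlights well above the displayable 1.0
scene = [0.25, 1.0, 4.0, 16.0]

int_path = [expose(store_int8(v), 0.25) for v in scene]
fp_path = [expose(store_fp16(v), 0.25) for v in scene]

if __name__ == "__main__":
    print("integer buffer:", int_path)
    print("FP16 buffer:   ", fp_path)
```

In this sketch the integer buffer clips every highlight to the same value before exposure is applied, so the three brightest samples become indistinguishable, while the FP16 buffer keeps them apart.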
In practice, for an unknown reason, it is unfortunately this integer version of HDR that is activated with the Radeon X1800 XT. Results in this mode aren't directly comparable to NVIDIA's graphics cards, so we only compared ATI's cards to each other. The X1800 XL is on the same level as an X850 XT PE, and the X1800 XT is ahead with a 33% lead.
Without HDR, the X1800 XL and X850 XT PE are still very close, and the X1800 XT keeps a comparable lead over them. It is slightly faster than the 7800 GTX without AA / Aniso, and once these effects are activated it pulls ahead by up to 30%.
Page 9
Colin Mc Rae 05, Act Of War, Pacific Fighters
Colin Mc Rae 2005
While NVIDIA caught up in games such as Colin Mc Rae 05 with the 7800, ATI's new architecture doesn't improve anything here. The X1800 XL is noticeably slower than the X850 XT PE without AA / Aniso. Thanks to its high frequencies, the X1800 XT takes the lead ahead of NVIDIA's high-end products. Once AA and Aniso are activated, the gap between the X850 XT PE and X1800 XL narrows, because AA costs less performance on the X1800. The X1800 XT also benefits from this improvement and is much faster than the 7800 GTX, by up to 20%.
Act Of War
The X1800 XL finally provides better results than the X850 XT PE. However, we see that without AA and Aniso, the X1800 XT is much closer to the 7800 GT than to the 7800 GTX, proof of NVIDIA's domination in this test. Activating AA and Aniso changes this, and the X1800 XT takes the lead in 1920*1200. We have to put this result in perspective, as it is probably due to the 512 MB of memory. Performance in 1600*1200 is close to that of the 7800 GTX.
Pacific Fighters
We are at the end of the game tests with the flight simulator, Pacific Fighters. Once again results are very similar between the X1800 XL and X850 XT PE, whereas the X1800 XT is very close to the 7800 GT without AA / Aniso. Once these effects are activated, NVIDIA takes the lead as the 7800 GTX is 33% faster than the X1800 XL.
Page 10
Conclusion
ATI's Radeon X1800 architecture is very promising. The Radeon creator has made in-depth modifications to the two most important elements of the GPU: the memory controller, via the ring bus, and the pixel shader processing core, with Ultra Threading. We also have to keep in mind the addition of capabilities which ATI's GPUs lacked up until now, such as Shader Model 3, HDR via FP16 textures and the possibility to combine this HDR with antialiasing.
However, the conclusions from these tests are mixed. While these optimisations and advances are attractive on paper, and even in practice in synthetic tests, the first results in games aren't necessarily up to our expectations in terms of performance and functions (HDR isn't functional in the many games that support it).
In terms of performance, there is first the Radeon X1800 XL, which will be the first version available and which often performs similarly to the X850 XT PE. Of course, this is already quite good considering that the XT PE is often equal to a 7800 GT, but we were expecting a little more, and many may be disappointed despite the additional functions.
The X1800 XT, with its impressive memory frequency, doesn't have this problem and is often clearly quicker than a GeForce 7800 GTX, especially with antialiasing, which costs less performance on ATI's cards. There are of course two exceptions, the OpenGL games Doom 3 and Pacific Fighters.
It should be said that ATI has admitted not having touched its OpenGL drivers for the moment. In fact, all the drivers put in place for this new ATI architecture are far from being optimised, unlike those for the Radeon X800 and GeForce 7800, whose architectures have been around for 3 years and 18 months, respectively. It will therefore be interesting to watch the evolution of Radeon X1800 performance, whether in 3D or in video (H.264 decoding, working WMV 1080p …).
If you are planning on buying an X1800, we suggest you wait a month for the X1800 XT instead of going immediately for the X1800 XL, whose performance is significantly lower. Compared to NVIDIA products, if the choice between a 7800 GT and an X1800 XL seems difficult, the choice between the X1800 XT and the 7800 GTX seems a little easier. Of course, we are basing our evaluations on the prices and dates announced by ATI: $449 for the X1800 XL right now and $499 for the X1800 XT in a month. Let's just see if they keep their promises...
Copyright © 1997-2009 BeHardware. All rights reserved.