NVIDIA GeForce 7800 GTX - BeHardware

Written by Damien Triolet and Marc Prieur

Published on June 22, 2005

URL: http://www.behardware.com/art/lire/574/



Introduction



A little more than 14 months ago, NVIDIA released the GeForce 6800 Ultra, which was until now the company's most powerful graphic chip. Fourteen months without any innovation in high-end products is a very long time indeed, even if we keep in mind that production issues and the switch to PCI Express delayed the availability of this processor, and that SLI opens the door to higher performance in supported applications.

Now NVIDIA releases the GeForce 7800 GTX, previously known as the G70, the first GPU of the GeForce 7 line.

The GeForce 7800 in short
Because of the excellent performance of the GeForce 6 architecture, there was no need to start all over again as with the move from the GeForce FX to the GeForce 6. The GeForce 7800 GTX architecture is more an evolution of the GeForce 6 than a revolution.

The first modification is in chip size. The number of transistors goes from 222 million on a 130 nm fabrication process for the NV40 to 302 million on a 110 nm process for the G70. TSMC is still in charge of production and, despite the finer fabrication process, the chip is slightly bigger and more expensive to produce.

Clock speed is not where the main progress lies: the GPU only goes from 425 to 430 MHz, and the GDDR3 memory, limited to 256 MB for the moment, from 550 to 600 MHz. Despite the increase in the number of transistors, NVIDIA claims a power consumption of 110 Watts, comparable to the previous generation.


Let's now move on to the real improvements of this chip. First, in terms of geometry, the number of vertex shading pipelines increases from 6 to 8 (+33%). The most important progress, however, is in pixel shading power, which goes from 16 to 24 pipelines (+50%). In addition, these pipelines have been modified to achieve a higher IPC (instructions per clock) in some cases (see the following pages for more details).
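As a rough check on these figures, the sketch below multiplies unit counts by core clock to get the theoretical peak ratio, using the 425 MHz and 430 MHz clocks quoted above. It is a simplification that assumes one result per pipeline per clock and ignores the IPC changes discussed on the next page.

```python
# Theoretical peak throughput = number of units x core clock (MHz).
# Simplifying assumption: each pipeline completes one operation per clock.

def peak_rate(units: int, clock_mhz: float) -> float:
    """Peak operations per second, in millions."""
    return units * clock_mhz


def gain(old: float, new: float) -> float:
    """Relative gain, e.g. 0.5 means +50%."""
    return new / old - 1.0


# Figures quoted in the text: GeForce 6800 Ultra vs GeForce 7800 GTX.
pixel_6800 = peak_rate(16, 425)
pixel_7800 = peak_rate(24, 430)
vertex_6800 = peak_rate(6, 425)
vertex_7800 = peak_rate(8, 430)

if __name__ == "__main__":
    print(f"pixel shading:  {gain(pixel_6800, pixel_7800):+.1%}")
    print(f"vertex shading: {gain(vertex_6800, vertex_7800):+.1%}")
```

With clocks almost unchanged, the theoretical gains stay very close to the +50% and +33% increases in unit count.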

The number of ROP engines (ROP means Raster Operation) remains unchanged. This is logical, because memory bandwidth isn't significantly improved and the ROPs are the units that write to memory, so they are bound by that bandwidth. As a reminder, they are also responsible for anti-aliasing, blending, Z-buffer compression and double Z (the ability of the GeForce FX/6/7 to process twice as much Z-only data as full color + Z data).

In terms of features, the GeForce 7800 GTX is a Shader Model 3.0 GPU like the GeForce 6. This shader model is not yet supported by ATI and will only be available with its next chip. Like the GeForce 6, it provides almost complete support for FP16 floating point textures. It can therefore render into a floating point buffer and, unlike ATI's current architectures, also perform operations such as blending and filtering on this format, which makes HDR support more flexible and efficient. Only multi-sampling anti-aliasing isn't supported with the FP16 format.

The main functional innovation is in fact transparency anti-aliasing, which is supposed to provide efficient anti-aliasing for transparent textures at a lower cost than full-scene supersampling anti-aliasing. In addition, the PureVideo engine, which was buggy on the NV40 and didn't allow WMV9 HD video decoding acceleration, is now fully functional.

Finally, you should know that this GPU is natively PCI Express. For now, NVIDIA has not mentioned a possible AGP graphic card with an AGP <-> PCI-E HSI bridge.


Pixel shader: theory and practice

An improved pixel shader, in theory
NVIDIA hasn't only added 8 pixel shading pipelines, it has also improved them. This isn't a radical change, but a small improvement aimed at efficiency. Like the GeForce 6, the GeForce 7800 has two calculation units per pipeline. Each unit handles several kinds of operations but can only produce one result at a time: it can process a single simple operation or a combination of two simple operations, but not two unrelated ones.

The main improvement, and one strongly emphasized by NVIDIA, is the addition of an adder to the first unit, which in the GeForce 6 series wasn't able to process additions and related instructions, in particular MADs. A MAD is a multiplication followed by an addition, for example X * Y + Z. These two operations are common in most 3D rendering algorithms (in fact in all of them). An addition often depends on the result of a multiplication, in which case the two can be processed as a single MAD rather than as 2 instructions.
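Merging a multiplication and a dependent addition is the kind of rewrite a shader compiler performs. The sketch below uses a hypothetical three-operand instruction list (an assumption for illustration, not any real driver's internal format) and fuses a MUL whose result feeds only the next ADD into one MAD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """One instruction of a hypothetical toy shader IR (not a real driver format)."""
    op: str      # "mul", "add" or "mad"
    dst: str     # destination register
    srcs: tuple  # source registers


def reads_of(reg, code):
    """How many times `reg` is read as a source in `code`."""
    return sum(instr.srcs.count(reg) for instr in code)


def fuse_mads(code):
    """Fuse 'mul t, x, y' + 'add d, t, z' into 'mad d, x, y, z'.

    Assumes t is a temporary: fusion is skipped unless that single add is
    the only instruction reading t, since t is no longer written afterwards.
    """
    out = []
    i = 0
    while i < len(code):
        cur = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        if (nxt is not None and cur.op == "mul" and nxt.op == "add"
                and nxt.srcs.count(cur.dst) == 1
                and reads_of(cur.dst, code[i + 1:]) == 1):
            z = next(s for s in nxt.srcs if s != cur.dst)
            out.append(Instr("mad", nxt.dst, (cur.srcs[0], cur.srcs[1], z)))
            i += 2  # two instructions became one
        else:
            out.append(cur)
            i += 1
    return out
```

For example, `mul r0, x, y` followed by `add r1, r0, z` becomes the single `mad r1, x, y, z`, while a program that reads `r0` again later keeps both instructions.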


MAD operations are common, so being able to process them faster is of great interest. On the GeForce 6800, only the second unit features a multiplier / adder able to process MADs. Both of the GeForce 7800's units have this ability, and according to NVIDIA this doubles the arithmetic power of the new GPU compared to the previous one. Things aren't that simple, however, because in practice a pixel shader doesn't use only MADs.


In fact, the maximum number of instructions processed per clock cycle in a GeForce 7800 pipeline isn't higher than in a GeForce 6800 pipeline, but in practice the GeForce 7800 should be able to get closer to this theoretical maximum.


The interpolator is a unit which interpolates colors and texture addresses from the values at the three vertices of each triangle and can feed the pixel shading pipeline with one value per cycle. The "SF" blocks are units which handle complex scalar operations such as 1/x, 2^x, etc.
A mini-ALU applies modifiers (simple operations such as x2, x4, x8...) to the result of each unit. The NRM unit (vector normalisation, very useful for normal mapping) has been added by NVIDIA to accelerate this instruction, but only in FP16. Without it, an NRM instruction is split into several instructions that occupy several units over several cycles.
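For reference, normalising a vector v means computing v / sqrt(v . v). Without a dedicated unit, this is typically expanded into a dot product, a reciprocal square root and a multiply, three dependent steps, which is why NRM otherwise ties up several units and cycles. A minimal sketch of that expansion in plain Python, assuming three-component vectors:

```python
import math


def dp3(a, b):
    """Three-component dot product (a DP3 instruction)."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def rsq(x):
    """Reciprocal square root (a complex scalar op, 'SF' style)."""
    return 1.0 / math.sqrt(x)


def nrm(v):
    """NRM expanded into DP3 -> RSQ -> MUL, each step using the previous result."""
    length_sq = dp3(v, v)                 # step 1: v . v
    inv_len = rsq(length_sq)              # step 2: 1 / |v|
    return tuple(c * inv_len for c in v)  # step 3: scale each component
```

For example, `nrm((3.0, 0.0, 4.0))` gives approximately `(0.6, 0.0, 0.8)`.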

An improved pixel shader, in practice
In our first tests, what we explained above turned out to be both true and false. False, because we did not observe this additional adder in action with simple additions or with MADs, despite trying a multitude of variations. We asked NVIDIA about this and they sent us an example, from which we wrote a piece of code showing that the two MADs were indeed functional, but only in very specific cases (depending on the number and type of registers used, the dependencies between instructions, etc.). Basically, this means that the ability to process two MADs per cycle won't be very useful in games.

However, during our numerous tests, we saw the efficiency of the pixel shading pipelines increase, sometimes quite significantly! So what happened? According to our first tests, efficiency did increase, but this comes only partially and indirectly from the improvements described above. Indirectly, because having a MAD in each unit helps the compiler / scheduler arrange the code better (for example, if a MAD is followed by a special instruction handled by the second unit, the 6800 needs 2 cycles to process them, whereas the 7800 needs only one if all the conditions are met). Partially, because the modification is in fact a global improvement in pipeline efficiency, the additional adder being just one of many factors.
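The MAD-then-special example above can be reproduced with a toy issue model. It assumes, purely for illustration, that special instructions run only on the second unit, that MADs also need the second unit on the 6800 but can use either unit on the 7800, and that two independent instructions share a cycle only if they land on different units. The real scheduling rules are not public and are far more complex.

```python
# Hypothetical unit assignment (1 = first unit, 2 = second unit), for illustration only.
UNITS_6800 = {"mad": {2}, "mul": {1, 2}, "special": {2}}
UNITS_7800 = {"mad": {1, 2}, "mul": {1, 2}, "special": {2}}


def can_pair(a, b, units):
    """True if a and b can issue in the same cycle on two different units."""
    if a["dst"] in b["srcs"] or a["dst"] == b["dst"]:
        return False  # b needs a's result, or both write the same register
    return any(u1 != u2 for u1 in units[a["op"]] for u2 in units[b["op"]])


def count_cycles(program, units):
    """Greedy in-order issue: at most two instructions per cycle."""
    cycles = 0
    i = 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1], units):
            i += 2
        else:
            i += 1
        cycles += 1
    return cycles


# A MAD followed by an independent special instruction.
example = [
    {"op": "mad", "dst": "r0", "srcs": ("a", "b", "c")},
    {"op": "special", "dst": "r1", "srcs": ("d",)},
]
```

In this model `example` takes 2 cycles with `UNITS_6800` and 1 cycle with `UNITS_7800`; if the special instruction read `r0`, both would need 2 cycles.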

With the GeForce 6800, NVIDIA introduced flexible co-issue management in the pixel shading pipeline (and also in vertex shading). Co-issuing is the ability of a single unit to process 2 instructions at once. But how is this possible when each unit is supposed to process only one? Actually, a complete instruction is made of four components, which can represent the four channels of the RGBA format (red, green, blue, alpha) or anything else. With co-issuing, two instructions can be processed as long as their total number of components doesn't exceed 4. It is then possible to process two operations on two components each, or one operation on three components and another on one. ATI has also supported co-issuing since the Radeon 8500, but only in 3+1 mode. Of course, two co-issued instructions can't depend on one another, and there are some restrictions on which operations can benefit from it.
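The co-issue rules above can be written down as a small check. This is a sketch under simplified assumptions (the instruction format is hypothetical, and real hardware adds restrictions on which operations qualify): two instructions may share a unit if they are independent and their write masks together cover at most four components, with the ATI case limited to a 3+1 split.

```python
def can_co_issue(a, b, splits=None):
    """Can two instructions share one unit in the same cycle?

    a, b: dicts with 'dst', 'mask' (components written, e.g. 'xyz') and 'srcs'.
    splits: allowed component splits such as {(1, 3)}; None = any split.
    """
    # Conservatively treat any register read by the other instruction as a dependency.
    if a["dst"] in b["srcs"] or b["dst"] in a["srcs"]:
        return False
    # Both writing the same component of the same register is a conflict.
    if a["dst"] == b["dst"] and set(a["mask"]) & set(b["mask"]):
        return False
    na, nb = len(a["mask"]), len(b["mask"])
    if na + nb > 4:  # a vector instruction only has four components
        return False
    return splits is None or tuple(sorted((na, nb))) in splits


FLEXIBLE = None          # 2+2, 3+1, 1+1... (GeForce 6/7 style, per the text)
ATI_3_PLUS_1 = {(1, 3)}  # 3+1 only (Radeon style, per the text)
```

A 2+2 pair such as `xy` + `zw` passes with `FLEXIBLE` but not with `ATI_3_PLUS_1`, while an `xyz` + `w` pair passes with both.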


In our GeForce 6800 test, we immediately noticed very low co-issuing efficiency. It was rarely used as expected and was close to unusable. We concluded that the drivers, and especially the compiler, were still young. One year later, GeForce 6 co-issuing has improved a lot, but it still has difficulty reaching its maximum efficiency.

With the GeForce 7800 GTX, however, the situation is quite different: co-issuing is more efficient, even exceeding our expectations in certain cases. At least this is the logical conclusion we draw from our tests. Efficiency has also improved in other minor areas, which overall increases IPC in practice. This type of gain is difficult to measure, as it varies from 0% to 100% depending on the instruction sequences in the shader. The compiler (driver), scheduler (driver / GPU) and pipeline organisation (GPU) must work perfectly together for good efficiency gains, and this seems to be more the case with the GeForce 7800 than with the GeForce 6800.

Performance
We extracted complex shaders from three applications, 3DMark05, Far Cry and Tomb Raider AOD, and ran each of them over the entire screen in a separate test application.


The Radeon X850 XT PE has a particular affinity for the Tomb Raider shader and is even ahead of the 7800 GTX, which improves on the 6800 Ultra thanks to its additional pipelines. With the two other shaders, however, the gains exceed the increase in pipeline count, proving that the pipelines are more efficient: +6% with the 3DMark shader and +20% with Far Cry's.

We then tested two lighting shaders:


The gains obtained with the 7800 GTX again exceed the increase in pipeline count. In FP32, efficiency went up by 22% with Blinn lighting and 37% with Phong lighting. Not bad!
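The per-pipeline efficiency figures can be related to raw speedups with simple arithmetic: 50% more pipelines, each 20% more efficient, gives a combined theoretical speedup of 1.5 x 1.2 = 1.8. The sketch below assumes, as the text suggests, that efficiency is the speedup normalised by pipeline count alone (core clocks being nearly identical); the speedups it prints are derived from the quoted percentages, not separate measurements.

```python
PIPELINE_RATIO = 24 / 16  # 7800 GTX vs 6800 Ultra pixel pipelines


def efficiency_gain(speedup: float, unit_ratio: float = PIPELINE_RATIO) -> float:
    """Per-pipeline efficiency gain implied by a measured overall speedup."""
    return speedup / unit_ratio - 1.0


def implied_speedup(eff_gain: float, unit_ratio: float = PIPELINE_RATIO) -> float:
    """Overall speedup implied by a per-pipeline efficiency gain."""
    return unit_ratio * (1.0 + eff_gain)


if __name__ == "__main__":
    # Efficiency gains quoted in the text.
    for name, eff in [("3DMark05", 0.06), ("Far Cry", 0.20),
                      ("Blinn (FP32)", 0.22), ("Phong (FP32)", 0.37)]:
        print(f"{name:13s} x{implied_speedup(eff):.2f}")
```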


Branching, Vertex Shader

Branching
One of the main innovations introduced with the GeForce 6800 was dynamic branching in pixel shaders. It makes some shaders easier to write and makes others more efficient by skipping the parts of the shader that some pixels don't need. For example, why apply a costly filter to smooth shadow edges if the pixel is in the middle of a shadow? Dynamic branching makes it possible to detect whether the pixel needs this processing or not. Splinter Cell Chaos Theory uses this technique whereas Chronicles of Riddick calculates everything for all pixels. Performance drops by 10 to 15% in the former and by 50% in the latter. Of course the algorithms aren't identical, but it does give an idea of what dynamic branching can do.

The current implementation isn't ideal, however, and is only efficient in very specific cases. In a GPU, groups of hundreds or thousands of pixels are processed together. The instruction flow isn't handled per pixel but per group of pixels, which means that the instruction sequence has to be the same for all pixels of a given group. In other words, at a branch, all pixels have to take the same path. If they don't, both branches have to be calculated for all pixels, with masks so that only the result of the required branch is written. As branching instructions aren't free, it is easy to reduce performance instead of improving it. Let's now see how the situation has changed from the 6800 to the 7800.
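The cost model described above can be sketched as follows. The cycle counts are hypothetical parameters chosen for illustration, not measurements: a batch pays only for the branch its pixels take when they all agree, and for both branches (plus the branch overhead) when they diverge.

```python
def batch_cost(takes_branch, cost_if, cost_else, branch_overhead):
    """Cycles for one batch of pixels processed in lockstep.

    takes_branch: one boolean per pixel in the batch (True = 'if' side).
    """
    cost = branch_overhead
    if any(takes_branch):      # at least one pixel needs the 'if' side
        cost += cost_if
    if not all(takes_branch):  # at least one pixel needs the 'else' side
        cost += cost_else
    return cost


def no_branch_cost(cost_if, cost_else):
    """Without dynamic branching, both sides are evaluated and the result selected."""
    return cost_if + cost_else
```

With a hypothetical 40-cycle soft shadow filter, a 2-cycle cheap path and a 2-cycle branch overhead, a batch entirely inside a shadow costs 4 cycles instead of 42, but a divergent batch costs 44, more than not branching at all.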

Branching instruction costs:

                        Case 1      Case 2
6800                    3 cycles    5 cycles
7800                    2 cycles    5 cycles
6800 one year ago       9 cycles    9 cycles

The difference between the first and the second case is the type of element used as output: a constant color in the first and an interpolated color in the second. Note the great progress made by the drivers in one year. Without going into the details, they can now finish the pixel shader before having to process all branching instructions.


We then used a shader that lets us specify the branching granularity (the number of consecutive pixels that take the same branch). If the block size corresponds to the one handled by the GPU (or is bigger), branching will improve performance. As you may have noticed, with the 7800 branching improves performance with smaller blocks of pixels, meaning that it's more efficient. When asked about this improvement, David Kirk, NVIDIA's Chief Scientist, answered that the gain comes from a better distribution of pixel blocks between the different quad engines. This means that quad engines are now more independent and can process blocks taking different branches. This wasn't the case with the GeForce 6 series, at least in this test, as one big block was shared by all quad engines. Because of that, a GeForce 6600 GT could be more efficient at branching than a 6800 GT (while still providing lower performance in most cases).

Without this optimisation, the GeForce 7800 GTX would have been even less efficient than the 6800 GT because of its higher number of quad engines: it would have had to process this type of branching in groups of 6000 pixels instead of 1000. This optimisation is of course welcome, but we should point out that 1000 pixels is still quite large, and that this figure will have to be strongly reduced in the future for branching to be truly efficient in as many cases as possible.
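To see why batch size matters, the sketch below counts the fraction of batches that contain both branch outcomes. It is a one-dimensional simplification with hypothetical sizes: pixels are laid out in a line, the test shader's branch flips every `granularity` pixels, and the GPU processes consecutive runs of `batch` pixels together. Real batches are two-dimensional screen tiles distributed across quad engines, so this only illustrates the trend; the 1000 and 6000 values are the orders of magnitude mentioned above.

```python
def divergent_fraction(n_pixels, granularity, batch):
    """Fraction of batches whose pixels do not all take the same branch."""

    def branch(p):
        # The branch outcome alternates every `granularity` pixels.
        return (p // granularity) % 2 == 0

    divergent = 0
    n_batches = 0
    for start in range(0, n_pixels, batch):
        end = min(start + batch, n_pixels)
        outcomes = {branch(p) for p in range(start, end)}
        if len(outcomes) > 1:
            divergent += 1
        n_batches += 1
    return divergent / n_batches
```

With blocks of 2000 pixels taking the same branch, batches of 1000 never diverge, whereas batches of 6000 always do and therefore pay for both branches.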

Vertex Shader
As with the pixel shader, NVIDIA announced improvements in vertex shaders. This time it isn't a matter of anything added to the calculation units, but simply of efficiency improvements. We don't have any additional information, because NVIDIA didn't go into detail. We tested T&L, VS 1.1, VS 2.0 and VS 2.X/3.0 performance in RightMark:


In Diffuse with one light source, performance in the four modes increased slightly more than the 33% expected from going from 6 to 8 vertex shading pipelines. In Diffuse & Specular with three light sources, performance increased a little more, confirming an improvement in vertex shader efficiency (+7%).


HDR, Textures, Up/Down

HDR
HDR, as seen in Far Cry or Splinter Cell and likely to be more widely used in the future, relies on 64-bit rendering, or more precisely four FP16 components. The scene and some other elements are rendered into a 64-bit buffer (RT, Render Target), and these parts are then processed (as textures) to assemble the final result. For all of this to work in a simple way, the GPU must support FP16 filtering and blending. This is the case with the GeForce 6 and 7 series, but not with the Radeon X800. Accessing FP16 textures can be somewhat costly, but it is important to note that filtering isn't: it requires transistors in the GPU, but doesn't reduce performance in practice. FP16 blending consumes a lot of memory bandwidth, twice as much as normal rendering.
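To make the 32-bit vs 64-bit distinction concrete, the sketch below packs one RGBA pixel in both formats with Python's standard `struct` module (`e` is its half-precision format code). It also shows why FP16 matters for HDR: a value above 1.0 survives in FP16 but is clamped in 8-bit. The blending helper is a simplification that counts only the destination read and write per pixel, which doubles along with the pixel size.

```python
import struct


def pack_rgba_fp16(r, g, b, a):
    """Pack one RGBA pixel as four IEEE 754 half-precision floats (64 bits)."""
    return struct.pack("<4e", r, g, b, a)


def pack_rgba8(r, g, b, a):
    """Pack one RGBA pixel as four 8-bit normalised values (32 bits)."""

    def to_byte(x):
        # Values outside [0, 1] cannot be represented and are clamped.
        return max(0, min(255, round(x * 255)))

    return struct.pack("<4B", to_byte(r), to_byte(g), to_byte(b), to_byte(a))


def blend_bytes_per_pixel(bytes_per_pixel):
    """Blending reads the destination pixel and writes it back."""
    return 2 * bytes_per_pixel


hdr_pixel = (4.5, 1.0, 0.25, 1.0)  # red channel is 4.5x brighter than "white"
```

`pack_rgba_fp16(*hdr_pixel)` is 8 bytes and unpacks back to exactly `(4.5, 1.0, 0.25, 1.0)`, while `pack_rgba8(*hdr_pixel)` is 4 bytes with the red channel clamped to 255.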


We measured performances with several combinations of textures, RT and with or without blending:


In 32 bits with blending, the GeForce 7800 GTX's performance increased by 20% whereas its bandwidth increased by less than 10%. Blending therefore seems to be more efficient, probably because it stalls the pipeline less often when it is required. With the 64-bit RT, performance is very close, and here the limitation is clearly memory bandwidth. Once 64-bit textures are also used, the 7800's performance increases strongly. Is this due to the additional pipelines? This is what we try to determine with the next test.

Texture fetch
How do the different graphic cards perform with different texture formats and sizes?


With standard 32-bit textures, the graphic cards perform similarly and fillrate decreases progressively as texture size grows. The two NVIDIA graphic cards are slightly ahead of the Radeon and can access 4096x4096 textures, whereas the Radeon is restricted to 2048x2048. In practice this isn't a bothersome limitation, as textures of this size aren't used.


In FP16, the results are very different. The two NVIDIA graphic cards filter the texture, but this has a negligible effect on overall performance. While the Radeon's fillrate decreases progressively, the fillrate of the two GeForce cards falls off sharply beyond 512x512 textures, which is probably where the texture cache stops being efficient. With smaller textures, the GeForce 7800 uses its additional pixel shading pipelines to be 50% faster than the 6800. With huge textures, the GeForce 7800 GTX is 2.7 times faster than the previous model, which shows that the texture cache has been improved and copes better with these conditions. The Radeon remains twice as fast as the best GeForce with 2048x2048 textures.


In FP32, the results are close to those in FP16, except that the 7800's additional pipelines are no longer useful because of the memory bandwidth limitation. The Radeon is once again in first place with 2048x2048 textures, and the GeForce 7800 is much more efficient than the 6800.
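The texture sizes involved help explain why caches and memory bandwidth dominate these results. A quick footprint calculation (single mip level, four channels, square textures assumed) shows how fast the data grows with format and resolution; it says nothing about the actual texture cache sizes, which are not disclosed.

```python
BYTES_PER_TEXEL = {"RGBA8": 4, "RGBA FP16": 8, "RGBA FP32": 16}


def texture_mib(size, fmt):
    """Memory footprint of a square size x size texture without mipmaps, in MiB."""
    return size * size * BYTES_PER_TEXEL[fmt] / 2**20


if __name__ == "__main__":
    for size in (256, 512, 1024, 2048, 4096):
        row = ", ".join(f"{fmt}: {texture_mib(size, fmt):g} MiB"
                        for fmt in BYTES_PER_TEXEL)
        print(f"{size}x{size} -> {row}")
```

A 512x512 FP16 texture already weighs 2 MiB, and a 2048x2048 FP32 texture 64 MiB.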

Texture upload and download

These tests were run with Serious Magic (D3D) for download from the graphic card and Texbench (OGL) for upload to the graphic card. When reading data back from the graphic card, the two GeForces are 30% faster than the Radeon. This is quite surprising, as the 6800 uses a PCI Express / AGP bridge and we expected it to be slightly behind.

For upload, the situation is the opposite, with the Radeon dominating. The 6800 Ultra is slightly behind because of the bridge it uses.


Here we have fillrate results in Texbench for a scene rendered with different quantities of textures, from 160 to 320 MB. The 6800 Ultra 512 MB is in the lead thanks to its extra memory, which allows more textures to be stored locally. As usual with this test, the ATI graphic cards are slightly behind for an unknown reason. The 7800 GTX takes advantage of its native PCI Express support to clearly beat the GeForce 6800 Ultra with 240 MB or more of textures.


Filtering and anti-aliasing

Anisotropic filtering
Among the innovations it lists, NVIDIA says the 7800 features more efficient anisotropic filtering. First, an important point to keep in mind: because the GeForce 7800 GTX has more pixel pipelines than ROPs, part of the performance cost of a complex filter will be hidden.
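This reasoning can be sketched with a simple throughput model. It assumes, hypothetically, that a more complex filter only adds cycles per pixel in the pipelines and that output is capped by the 16 ROPs (unchanged from the 6800, as noted earlier); texture latency, caches and memory bandwidth are ignored.

```python
def pixels_per_clock(pipelines, rops, shader_cycles_per_pixel):
    """Pixels written per clock: the slower of shading and raster operations."""
    shading_rate = pipelines / shader_cycles_per_pixel
    return min(shading_rate, rops)


if __name__ == "__main__":
    for cycles in (1.0, 1.5, 2.0):
        print(f"{cycles} cycles/pixel: "
              f"7800 GTX {pixels_per_clock(24, 16, cycles):.1f}, "
              f"6800 Ultra {pixels_per_clock(16, 16, cycles):.1f} pixels/clock")
```

In this model, going from 1 to 1.5 cycles per pixel costs the 7800 GTX nothing (it stays ROP-bound at 16 pixels per clock), while the 6800 Ultra drops to about 10.7.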

But that isn't all. NVIDIA has once again modified anisotropic filtering. It's hard to tell exactly what has changed, but clearly something is new. Unfortunately, it sometimes leads to a noticeable reduction in quality in motion. We can clearly see a shimmering effect on some parts of textures, more or less obvious depending on the texture's level of detail, its orientation, or even its texture stage (the first is less affected when multi-texturing). This shimmering is less noticeable with the GeForce 6800.


This utility shows color differences between pixels, here on level 2 textures with the 6800 and 7800. We can see that the filters aren't identical.


It is impossible to show you the quality loss with a simple screenshot, as it is only visible in motion. The only thing noticeable is that the texture seems less filtered and sharper in some areas, which causes the problem. It is unfortunate to find this type of artifact on a new high-end graphic card, and we hope that NVIDIA will correct it.

Anti-aliasing
For multi-sampling anti-aliasing, NVIDIA hasn't changed the basic type. Once again we find the Rotated Grid sample positions introduced with the GeForce 6. Our only regret here is the absence of a 6x mode like the one ATI supports.
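Why a rotated grid helps can be shown in a few lines. With four samples on an ordered 2x2 grid, the samples share only two distinct x positions, so a near-vertical edge sweeping across the pixel produces only three coverage levels; rotating the grid gives each sample its own column (and row), hence five levels. The offsets below are an illustrative rotated pattern, not NVIDIA's documented sample positions.

```python
# Sample offsets within a pixel, in 1/16-pixel units from the pixel centre.
# Illustrative patterns only; the exact NVIDIA positions are not given in the text.
ORDERED_GRID_4X = [(-4, -4), (4, -4), (-4, 4), (4, 4)]
ROTATED_GRID_4X = [(-2, -6), (6, -2), (-6, 2), (2, 6)]


def distinct_positions(samples):
    """Number of distinct x and y sample positions."""
    return len({x for x, _ in samples}), len({y for _, y in samples})


def vertical_edge_levels(samples):
    """Coverage levels a vertical edge can produce while crossing the pixel.

    The edge sits at x = e; samples with x < e count as covered.
    """
    return len({sum(x < e for x, _ in samples) for e in range(-8, 9)})
```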


NVIDIA has, however, introduced an innovative feature called transparency anti-aliasing. The problem with MSAA (multi-sampling anti-aliasing) is that it doesn't filter the interior of a texture. So when an object such as a grid or a plant is represented by a texture with transparent parts, aliasing clearly appears, and a simple MSAA can do nothing about it. Only supersampling anti-aliasing gets around the problem, but this type of filter is extremely costly: it simply increases the scene resolution (2048*1536 for 1024*768 in 4x!) and then applies a resizing filter.
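The difference can be illustrated with a one-dimensional toy example. The transparent texture here is a hypothetical thin opaque strand, and the alpha test keeps a sample only if its alpha reaches 0.5. With one sample per pixel (which is what MSAA gives the texture interior, since the shader runs once per pixel), the strand is all or nothing; with several sub-samples per pixel, as in supersampling, partial coverage appears.

```python
def strand_alpha(x, left=2.3, right=2.7):
    """Hypothetical transparent texture: an opaque strand between left and right."""
    return 1.0 if left <= x < right else 0.0


def shade_pixel(px, samples_per_pixel, alpha=strand_alpha, threshold=0.5):
    """Coverage of pixel px after alpha testing, averaged over sub-samples (1D).

    samples_per_pixel=1 is plain rendering (one sample at the pixel centre);
    larger values mimic supersampling along this axis.
    """
    n = samples_per_pixel
    hits = 0
    for k in range(n):
        x = px + (k + 0.5) / n  # evenly spaced sub-sample positions
        if alpha(x) >= threshold:
            hits += 1
    return hits / n
```

Pixel 2 comes out fully opaque with one sample and half covered with four. A strand between 3.05 and 3.45 misses the centre of pixel 3 entirely with one sample, so it vanishes, but it still gets half coverage with four samples: this popping is the aliasing described above.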

This is where transparency anti-aliasing comes in. NVIDIA hasn't revealed all the functional details of this technology, but in its supersampling mode the idea apparently is to apply supersampling intelligently, only to the textures that need it, while the rest of the scene is still processed with multi-sampling. There is also a transparency anti-aliasing mode based on multi-sampling, but how it works is still unknown.

In practice, the multi-sampling mode doesn't really give satisfying results. Looking carefully at the monitor with a magnifying glass, we can see a slight improvement in some areas, but in normal use the difference is invisible. On the other hand, it has almost no effect on performance.


Without AA Transparency, then with AA Transparency MS, and finally with AA Transparency SS

The supersampling mode is very efficient: a grid like the one here in Half Life 2 no longer shows any artifacts. The performance impact is quite variable, from nonexistent in a scene without any texture to filter in this mode to very significant when many are present. In this Half Life 2 scene, which contains many grids, the frame rate drops from 104.1 to 68.2 fps. Quality has a cost, and as this mode is optional, each user can decide whether it's worth it. We feel that this is an interesting initiative from NVIDIA.

Without transparency AA, with MS transparency AA, with SS transparency AA

This technology isn't entirely finished, however, and we noticed a few bugs. In Half Life 2, some textures that use transparency are problematic: the shader in charge of lighting isn't processed properly, so these textures appear white. Another problem appears in Far Cry: depending on the viewing angle, a plant is sometimes filtered and then suddenly isn't. A slight movement reveals this problem:



Graphic card, power consumption, noise

For the GeForce 7800 GTX test, NVIDIA sent us two 256 MB graphic cards, which allowed us to provide results with and without SLI. The card is rather long: longer than a 6800 Ultra (21.7 cm) and the same length as a 6800 Ultra 512 MB (23 cm). The PCB is in fact quite similar to that of the 6800 Ultra 512 MB, and only half of the memory chip locations are populated with Samsung 1.6 ns chips, proof that a 512 MB version is possible.



The cooling system is similar to the GeForce 6800 GT's, with an identical fan and an improved heatsink, and is still single slot. In practice the 7800 GTX is noticeably quieter than a GeForce 6800 Ultra and even a 6800 GT, both in 2D (in this mode the GPU is clocked at 275 MHz) and in 3D. The noise level is very reasonable and won't disturb silent computing enthusiasts.


6800U, then 6800U 512, then 7800GTX

In terms of operating temperature, according to NVIDIA's monitoring after 10 minutes of 3DMark05 Game Test 4, the 7800 GTX reached 75°C, compared with 69°C for the GeForce 6800 Ultra. The chip therefore runs quite hot, probably because of the quieter fan. We are still far from NVIDIA's safety threshold of 115°C, however. GPU overclocking seems quite easy, as we easily reached 490 MHz instead of the initial 430 MHz. It looks like NVIDIA has left some headroom for an Ultra version...

Of course, we evaluated the power consumption of the various graphic cards. These measurements were taken directly at the wall plug and therefore correspond to the computer's total power consumption (with an Enermax 535W). The figures reported were obtained on the Windows desktop and under load with 3DMark05 GameTest4 and Prime95. Prime95 keeps processor usage constant regardless of the graphic solution's performance.


In 2D, the GeForce 7800 GTX's power consumption is lower than the 6800's, but still not at ATI's level. In 3D, the X850 XT PE is the most power hungry, whereas the 7800 GTX consumes slightly more than the 6800 Ultra. This difference, however, is quite moderate.

Speaking of power consumption, NVIDIA indicates that a configuration with a 6800 Ultra requires a 400 Watt power supply, compared with 350 Watts with a GeForce 7800 GTX. The manufacturer also specifies that a 500 Watt power supply is required for SLI with either card. We don't understand why NVIDIA recommends 50 Watts less for a single 7800 GTX configuration, yet 150 additional Watts for a second 7800 GTX. For a single card, we advise a good 350 Watt power supply, and a good 450 W one for SLI.


The test, CPU Limited ?

A8N SLI Premium
For this test, we used ASUSTeK's latest nForce4 motherboard, the A8N-SLI Premium. It features two notable improvements compared to the A8N-SLI Deluxe. First, the noisy chipset fan was replaced by a passive cooling system based on heat pipes, the AI Cool-Pipe: a heatsink on top of the chipset is connected via heat pipes to another heatsink located near the CPU, which is cooled by the CPU cooler's air flow. In practice this solution is of course silent and the chipset doesn't overheat, although we should point out that the CPU cooler used in this test (an XP 120 with a 120 mm fan) is well suited to this kind of cooling system.


Another advantage is the removal of the SO-DIMM-style slot and its accompanying card, which had to be manipulated by hand to switch from a non-SLI configuration (x16 port and x1 port) to an SLI configuration (two x8 ports). The switch is now automatic, depending on the type of card inserted in the two PCI-Express x16 slots; Pericom switches are used for this purpose. Would the A8N-SLI Premium be the perfect nForce4 SLI motherboard?

The test
For this test we used a very high-end computer to take full advantage of the 7800 GTX and of course of the 7800 GTX SLI:

- ASUSTeK A8N-SLI Premium (bios 1005)
- AMD Athlon 64 2.8 GHz (soon to be introduced)
- 2x512 MB PC3200 memory 2-2-2
- Enermax 550W power supply
- 2 XFX GeForce 6800 Ultra PCI-E 256 MB / ForceWare 77.62
- 2 XFX GeForce 6800 Ultra PCI-E 512 MB / ForceWare 77.62
- 2 GeForce 7800 GTX PCI-E 256 MB / ForceWare 77.62
- ATI Radeon X850 XT / Catalyst 5.6
- Raptor SATA hard drive
- LG DVD-ROM drive

For this test we measured performance in 1280*1024, 1600*1200 and 1920*1200 (or 1920*1440 when this mode wasn't available, as in Act Of War), with different graphic settings: standard, 4x antialiasing with 8x anisotropic filtering, and HDR when available. Testing this type of graphic card in 1024*768 is of little significance, so we preferred higher resolutions, which should please widescreen users.

CPU Limited ?
We will probably hear very soon that the GeForce 7800 GTX requires a powerful processor to be properly exploited. Without being completely inaccurate, such remarks are a crude generalisation, as graphic engines, the scenes they manage and the available graphic options lead to thousands of different loads.

Indeed, we have to keep in mind that the overall rendering speed of a 3D scene depends on how fast the graphic processor renders the scene, but also on how fast the processor calculates it and sends it.

So if scene A can be calculated at 60 frames/s by the processor but rendered at 90 frames/s by the graphic card with certain graphic settings, the final result will be 60 frames/s. This scenario is called "CPU limited", and means that the graphic card wastes time waiting for the processor. In this situation it's a good idea to increase the graphic settings (resolution, anti-aliasing, anisotropic filtering, among others) to bring the rendering speed closer to the CPU's. We will then no longer be "CPU limited", and it will be possible to enjoy higher graphic quality without a loss in framerate.
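The scenario above boils down to taking the minimum of two rates. A minimal sketch, assuming the CPU and GPU work on successive frames in parallel so that the slower of the two sets the pace, and using the 60 and 90 frames/s figures from the example:

```python
def displayed_fps(cpu_fps, gpu_fps):
    """Each frame must be prepared by the CPU and rendered by the GPU;
    assuming the two overlap perfectly, the slower one sets the pace."""
    return min(cpu_fps, gpu_fps)


def bottleneck(cpu_fps, gpu_fps):
    """Name the limiting component."""
    if cpu_fps < gpu_fps:
        return "CPU limited"
    if gpu_fps < cpu_fps:
        return "GPU limited"
    return "balanced"
```

`displayed_fps(60, 90)` is 60 and the configuration is "CPU limited". Raising the graphic settings until the GPU rate falls to, say, 65 frames/s still gives 60 frames/s, which is why the extra quality comes for free in this case.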

If you want to play at 90 frames/s, changing the graphic card or lowering the settings won't help. You will have to reduce the complexity of the scene's geometry, physics engine or animation, or change the processor. The opposite situation, "GPU limited", is also possible. The best way to overcome this limitation is to lower graphic settings such as resolution.

To know whether a graphic card change is worth it, the most important thing is to isolate your computer's "Achilles' heel". If at some point in a game the framerate isn't sufficient, deactivate the most power hungry graphic options such as anti-aliasing or anisotropic filtering, then reduce the resolution until you reach a decent framerate. If despite these changes the framerate doesn't increase, or not enough, the weakness lies with the processor (or with its effective speed, which also depends on RAM speed).
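The procedure above can be summarised as a small decision function. This is a sketch of the reasoning, not a benchmarking tool: the two framerates are whatever you measure in the same scene before and after lowering the settings, and the 5% threshold for a "real" increase is a hypothetical choice.

```python
def diagnose(fps_high, fps_low, target_fps, min_gain=0.05):
    """Locate the bottleneck from two framerate measurements.

    fps_high: framerate with the demanding settings.
    fps_low: framerate after disabling AA/aniso and lowering the resolution.
    min_gain: hypothetical threshold for counting an increase as real.
    """
    if fps_high >= target_fps:
        return "fine as is"
    improved = fps_low >= fps_high * (1 + min_gain)
    if improved and fps_low >= target_fps:
        return "graphic card"  # lighter settings help: the GPU is the limit
    return "processor"         # little or insufficient gain: CPU (or RAM) limit
```

For example, going from 40 to 41 frames/s when aiming for 60 points to the processor, while going from 40 to 75 frames/s points to the graphic card.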

The exception to this rule is a limitation caused by the graphic card's geometry power, but this only happens with a low-end graphic card from two or three years ago. If the image freezes regularly, the problem may be a lack of main memory leading to swapping to disk. You then have to find out whether disk accesses and the freezes are related (by ear, or by eye via the HDD LED).

Apart from these exceptions, in the processor-limited case described above, changing the graphic card won't bring any improvement. You will simply have the same problem with a nicer image, which is hardly the best upgrade. If, on the other hand, reducing the graphic settings does improve fluidity, the graphic card is the origin of the problem and an upgrade is the answer.

While coupling an entry-level processor with a high-end graphic solution isn't a great choice, there is no perfect combination given the number of possible configurations. The best thing is to make an intelligent choice based on the available information.

If we had to generalise, we would simply say that in current games, even with high graphic settings, a graphic card of this type would not be pushed to its limits at 1280*1024 with an average processor. It is generally at resolutions such as 1600*1200 or 1920*1200 that the gains are most noticeable, although this needs to be evaluated case by case.


Half Life 2 & Doom 3

Half Life 2

Provided that it is in a comfortable environment, the 7800 GTX is able to express its full potential in Half Life 2, even without AA 4x or Aniso 8x. Compared to the 6800 Ultra, the gains logically increase with resolution: +33.8% in 1280*1024, +46.7% in 1600*1200 and +53.6% in 1920*1200. Performance was only 17% higher than the X850 XT's in the best case. Whether with these settings or with AA 4x + Aniso 8x, the 6800 Ultra SLI is approximately at the same level as the 7800 GTX.

With AA 4x + Aniso 8x, ATI reaches the same level of performance as the 7800 GTX in 1920*1200. Of course the 7800 GTX SLI does even better, but we wouldn't expect any less from this product.

Doom 3

It is no surprise that Doom 3 (here in Ultra mode, with 8x anisotropic filtering) runs much faster on the GeForce 6800 than on the Radeon X850. The 7800 GTX lets NVIDIA go even further, with up to 53.7% higher performance. The 6800 Ultra SLI is still faster than a single 7800 GTX because of the large gains this technology brings in Doom 3. The ranking stays the same once FSAA is enabled, but the performance gaps narrow.


Far Cry & Splinter Cell 3

Far Cry

We begin with Far Cry, Crytek's FPS, which stands out for supporting a wide range of graphic features, such as Pixel Shader 2b and 3 or High Dynamic Range, even if the HDR effect is rather overdone in some areas.


Far Cry HDR requires the appropriate equipment



With standard settings, the 7800 GTX is 29.5% faster than the 6800 Ultra and 14% faster than the X850 XT. In SLI mode, the NVIDIA cards are simply limited by the number of frames per second the CPU can prepare, even though it is the fastest on the market. It would be foolish not to raise the graphics settings!

With AA 4x and Aniso 8x, the 7800 GTX is up to 36.7% faster than its predecessor. As resolution and settings increase, the X850 XT PE performs very well and closes in on the new NVIDIA chip. Here, the main benefit of the 7800 GTX SLI is that it runs at the same resolution and frame rate as a single 7800 GTX, but with AA 4x and Aniso 8x enabled (although even the 7800 GTX SLI starts to struggle a bit at 1920*1200 4x/8x).

With HDR, the 7800 GTX is up to 44% faster than the 6800 Ultra. SLI brings no performance gain here, so one 7800 GTX is a better choice than two 6800 Ultras. Enabling HDR costs the 7800 less performance than the 6800, a sign that some optimisations were made.

Splinter Cell 3
The latest installment of the Splinter Cell series offers Shader Model 1.1 and Shader Model 3.0 modes in its graphics options. The latter delivers the same image quality as SM 1.1 with a slight performance gain (approximately 5%), and also allows additional options such as HDR, parallax mapping, tone mapping and soft shadows. We first used SM 1.1 mode for ATI and the "basic" SM 3.0 mode for NVIDIA, followed by SM 3.0 with all options enabled. For an unknown reason, the game was buggy with a single GeForce 6800 Ultra 512 MB, but not with two. A driver problem?


With the initial settings, the 7800 GTX is 40 to 50% faster than the 6800 Ultra. This is an ideal case, further accentuated in SLI, where the second card brings very good performance gains. With AA 4x and Aniso 8x, the 7800 GTX is up to 39% faster than the 6800 Ultra, and with HDR the gap even reaches 55% at 1920*1200. This isn't a common resolution, but it will please owners of 24" LCD monitors. They will need an SLI configuration to enable all options in games like this one. As in Far Cry, enabling HDR costs the 7800 less performance than the 6800.


Page 10
Pacific Fighters & Act Of War

Pacific Fighters

For this test, we configured Pacific Fighters to use Pixel Shader 2.0. Graphics options were set to their maximum, and we edited the game's configuration file to select Mode 2 for the water pixel shader.


Here, the GeForce 7800 GTX is 28 to 44% faster than the 6800 Ultra without AA/Aniso, and 50% faster with these settings. For example, the 7800 GTX is as fast at 1920*1200 4x/8x as the 6800 Ultra at 1280*1024 4x/8x. This is significant, even though overall performance remains insufficient.

In fact, in this mode only the 7800 GTX offers a minimum of gameplay comfort at 1280*1024.
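The 1920*1200 versus 1280*1024 comparison is more telling once the pixel counts are worked out. The short sketch below only counts pixels per frame; it ignores geometry, CPU and memory costs, so it gives a rough indication rather than a model of the game.

```python
def pixel_count(width: int, height: int) -> int:
    """Number of pixels rendered per frame at a given resolution."""
    return width * height


def pixel_ratio(res_a: tuple[int, int], res_b: tuple[int, int]) -> float:
    """How many times more pixels res_a contains than res_b."""
    return pixel_count(*res_a) / pixel_count(*res_b)


ratio = pixel_ratio((1920, 1200), (1280, 1024))
print(f"1920*1200 has {ratio:.2f}x the pixels of 1280*1024")  # 1.76x
```

By pixel count alone (2,304,000 versus 1,310,720 per frame), matching the 6800 Ultra's frame rate at 1920*1200 means the 7800 GTX fills roughly 76% more pixels per second with the same 4x/8x settings.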

In SLI, NVIDIA hasn't solved all its problems yet. Although the latest driver addresses the bug that appeared in AFR (Alternate Frame Rendering) mode, the result is now a recurring lack of smoothness that makes the game unplayable. The workaround is to use SFR (Split Frame Rendering) mode: creating a new SFR profile gives satisfactory gameplay in this game.
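For readers unfamiliar with the two SLI modes: in Alternate Frame Rendering (AFR) each GPU renders whole frames in turn, while in Split Frame Rendering (SFR) each frame is cut into two bands, one per GPU. The sketch below only illustrates how the work is divided between two cards. The function names are hypothetical and not part of any NVIDIA API, and the fixed split line is a simplification, since the driver can move it to balance the load.

```python
def afr_gpu_for_frame(frame_index: int, num_gpus: int = 2) -> int:
    """AFR: whole frames are handed to the GPUs in round-robin order."""
    return frame_index % num_gpus


def sfr_bands(screen_height: int, split_line: int) -> list[tuple[int, int]]:
    """SFR: one frame is split horizontally; GPU 0 draws the top band and
    GPU 1 the bottom band. Each band is (first_row, end_row), end exclusive."""
    if not 0 < split_line < screen_height:
        raise ValueError("split_line must fall inside the screen")
    return [(0, split_line), (split_line, screen_height)]


if __name__ == "__main__":
    print([afr_gpu_for_frame(i) for i in range(6)])  # [0, 1, 0, 1, 0, 1]
    print(sfr_bands(1200, 600))                      # [(0, 600), (600, 1200)]
```

In AFR each card works independently on a complete frame, whereas in SFR both cards contribute to every frame.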

Act Of War
Act Of War is a new and rather good-looking real-time strategy game. As with all RTSs, performance is mainly limited by the processor in scenes with heavy combat between many units. We therefore chose a lighter scene to see how the graphics cards behave with this engine when the CPU is less of a bottleneck. This point should nonetheless be kept in mind.


In SLI with AFR mode, the game exhausted the memory of the 6800 Ultra 256 MB from 1600*1200 4x/8x upwards. AFR, which delivers higher frame rates, uses more memory than SFR. When passing over a large building on the map in this test, there was a significant slowdown, which can be avoided in SFR mode. The 6800 Ultra 512 MB, like the 7800 GTX 256 MB, doesn't have this problem thanks to native PCI Express support. For once in this review, the 6800 Ultra 512 MB provided a more than noticeable performance gain with AA 4x / Aniso 8x, even with a single card.


Page 11
Colin Mc Rae 05


NVIDIA graphics cards have never been as comfortable with Colin Mc Rae as ATI's, as the 6800 Ultra and X850 XT scores confirm. There were, however, strong performance gains going from the 6800 to the 7800, from +52% to +75%. The largest improvements were measured without AA / Aniso, and we may assume that texture cache optimisations are behind these results. However, since performance was so low to begin with, this improvement is far from enough to pull ahead of the X850 XT PE. Enabling AA and Aniso narrows the gap between the 7800 GTX and the X850 XT PE, which are now very close. Only SLI lets NVIDIA make a good impression in this test, but at what price?


Page 12
Windows Media 9 decompression

When introducing the GeForce 6800 in April 2004, NVIDIA announced that this chip could decode MPEG1, MPEG2, MPEG4 and WMV9 video in hardware, and encode all MPEG formats in hardware.

These features were not available at launch. We had to wait until the end of 2004 to find out what lay behind the PureVideo name, and until the end of May 2005 for Microsoft to release the DXVA patch that enables hardware-accelerated decoding of Windows Media video. Encoding acceleration is still not on the agenda and, worse, the GeForce 6800 GT & Ultra have a buggy PureVideo engine that prevents WMV9 decoding acceleration. This feature is therefore restricted to graphics cards such as the GeForce 6200 or 6600.

With the GeForce 7800 GTX, NVIDIA finally has a high-end solution capable of accelerating WMV9 video decoding. We verified this by measuring CPU usage while playing the 1080p version of the Step Into Liquid video on an Athlon 64 processor clocked at 2.8 GHz:


On average, CPU usage is 39.5% with the Radeon X850 XT PE, compared with 41.4% with the GeForce 7800 GTX and 68.6% with the GeForce 6800 Ultra. There is no doubt that WMV9 acceleration now works.
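To put these averages in perspective, the relative reduction in CPU load can be computed directly from the three figures above. This is simple arithmetic on the measured values, not an additional measurement.

```python
# Average CPU usage (%) during 1080p WMV9 playback, as measured in this test.
CPU_USAGE = {
    "Radeon X850 XT PE": 39.5,
    "GeForce 7800 GTX": 41.4,
    "GeForce 6800 Ultra": 68.6,
}


def relative_reduction(before: float, after: float) -> float:
    """Percentage of the original CPU load that is saved."""
    return (before - after) / before * 100.0


saving = relative_reduction(CPU_USAGE["GeForce 6800 Ultra"], CPU_USAGE["GeForce 7800 GTX"])
gap = CPU_USAGE["GeForce 7800 GTX"] - CPU_USAGE["Radeon X850 XT PE"]
print(f"7800 GTX vs 6800 Ultra: {saving:.1f}% less CPU load")  # 39.7%
print(f"7800 GTX vs X850 XT PE: {gap:.1f} points more")        # 1.9
```

In other words, the 7800 GTX removes roughly 40% of the CPU load seen with the 6800 Ultra and ends up within about 2 points of the Radeon.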

Interestingly, there is some talk of future H.264 decoding acceleration. H.264, part of the MPEG-4 standard (also known as MPEG-4 Part 10), compresses even more efficiently at the cost of greater encoding and decoding complexity. Partial hardware acceleration by the graphics processor would be welcome.


Page 13
Conclusion

With the GeForce 7800 GTX, NVIDIA has released the most powerful GPU on the market. Through several small improvements to the GeForce 6 architecture, the company has successfully increased overall efficiency. Also, despite the higher performance and extra pipelines, the card is quieter than a GeForce 6800 Ultra, with equivalent power consumption.

The icing on the cake is that the first GeForce 7800 GTX cards will be available within a week at an MSRP of US$599. This is too expensive for most users, but keep in mind that several 6800 Ultras and X850 XTs are still sold at this price, and the release of the GeForce 7800 GTX will push their prices down. This is good news for those who don't need as much power, or who can't or won't spend that much on a graphics card, while waiting for a less expensive version of the 7800.

In games that are not CPU-limited, performance gains range from 30 to 50% over the 6800 Ultra. This makes it possible to play either at a higher resolution, or at the same resolution with more demanding options (AA 4x + Aniso 8x, or features such as HDR). The pixel pipeline improvements are very promising for future performance gains, as the main limitation of graphics cards shifts from memory towards GPU rendering power.

Compared with two GeForce 6800 Ultras in SLI, a single GeForce 7800 GTX may be slower, equivalent or faster depending on the conditions, so it is hard to draw a general conclusion. Keep in mind the downsides of SLI: its overall cost, power consumption and heat dissipation. SLI also doesn't work with every game, although this is less and less true and NVIDIA's drivers now allow users to create profiles to get around the problem.

Of course, combining two GeForce 7800 GTXs goes even further in performance and handles 1920*1200 with AA 4x and Aniso 8x without problems in most cases. Lucky owners of 24" LCDs will appreciate this.

This bright picture can't, however, hide the downside of the new anisotropic filtering, which sometimes degrades image quality. This shouldn't happen with a GPU of this calibre; the performance war should respect some limits.

What about the competition? According to the latest rumours, ATI will launch a new graphics processor this summer. Much is still unknown about it (performance, availability, etc.). While we wait for answers, the GeForce 7800 GTX is on its way to stores. Will the most impatient buyers be proven right, as they have been in the past? Only time will tell!


Copyright © 1997-2014 BeHardware. All rights reserved.