Testing 12 Athlon 64s - BeHardware

Written by Marc Prieur

Published on November 12, 2004

URL: http://www.behardware.com/art/lire/531/


Socket 754 or 939? 130 nm or 90 nm fabrication process? This test is for those who are lost in AMD’s product line. Here is a comparative test of 16 processors, including 12 Athlon 64s. The P-Rating is sometimes just the tip of the iceberg.

The Athlon 64: a review
Before getting to the heart of the matter, a little background on the Athlon 64 is necessary. Released in September 2003, the Athlon 64 processors are based on AMD’s K8 architecture, the successor to the K7 architecture introduced with the first Slot A Athlon in 1999.


The K8 architecture is essentially based on the K7’s execution units. AMD nevertheless introduced several innovations to improve the overall performance of the new core:

- 130 nm SOI fabrication process
- Longer pipeline
- Integrated memory controller
- Integrated HyperTransport controller
- Improved L2 cache
- Addition of SSE2 instructions
- Extension of the x86 instruction set to 64 bits

In practice, one of the most important changes was the integration of the memory controller, a function previously handled by the chipset. This integration sharply reduced memory latency and increased overall performance. The other improvements were also significant, but it’s important to remember that one of them remains unused: the 64-bit extension of the x86 instruction set, which for now is only supported by “alternative” operating systems. The release of Windows XP x64 has been postponed from the second quarter of 2004 to the first half of 2005.

Compared to Intel’s Pentium 4 processors, the Athlon 64s are generally faster in games and scientific calculations and slower in 3D rendering and video. Of course, this judgment is based on several series of tests and may vary with the application. The purpose of this test isn’t to compare AMD’s processors to Intel’s, as we’ve already published several articles on this subject.

Several Athlon 64 versions
Several characteristics distinguish the current Athlon 64s. The first is the socket, 754 or 939. The figure corresponds to the number of processor pins. Of course, the 185 additional pins of the Socket 939 Athlon 64 aren’t just there for looks: they serve a second 64-bit DDR memory channel. Where the Socket 754 drives memory over a single 64-bit channel, for 3.2 GB/s with PC3200 modules, the Socket 939 drives two 64-bit channels for a combined bandwidth of 6.4 GB/s. In practice, the performance gain varies from 0 to 7%.
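The bandwidth figures above follow directly from the bus width and the transfer rate. Here is a minimal sketch of that arithmetic, assuming PC3200 (DDR-400) performs 400 million transfers per second and that 1 GB means 10^9 bytes; the function name is ours, chosen for illustration.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_second: float,
                        channels: int = 1) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_second * channels / 1e9


# Assumption: PC3200 / DDR-400 = 400 million transfers per second.
DDR400_TRANSFERS = 400e6

socket_754 = peak_bandwidth_gb_s(64, DDR400_TRANSFERS, channels=1)
socket_939 = peak_bandwidth_gb_s(64, DDR400_TRANSFERS, channels=2)

print(f"Socket 754: {socket_754:.1f} GB/s")  # prints "Socket 754: 3.2 GB/s"
print(f"Socket 939: {socket_939:.1f} GB/s")  # prints "Socket 939: 6.4 GB/s"
```

These are theoretical peaks, which is consistent with the measured gain of the second channel staying between 0 and 7%.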


The second characteristic is, of course, frequency, or processor speed expressed in cycles per second. It varies from 1800 MHz to 2600 MHz depending on the model.

Cache size is also important. Initially, all Athlon 64 processors were equipped with 1024 KB of L2 cache in addition to 128 KB of L1 cache. AMD quickly changed this configuration, however, introducing models with 512 KB of L2 cache but a higher frequency. Only high-end models are now equipped with 1024 KB of L2 cache. The performance gain varies from 0 to 5% (depending on the application) compared to a 512 KB processor at the same frequency.

The last difference is the fabrication process, 130 or 90 nm. For the same number of transistors, the finer process makes it possible to reduce die size and, in theory, power consumption. We verify this later in this article.

AMD currently offers no fewer than 12 processors with different combinations of these characteristics:

- 2800+ S754 : 1800 MHz, 512 KB of L2, 1 channel DDR, 130nm
- 3000+ S754 : 2000 MHz, 512 KB of L2, 1 channel DDR, 130nm
- 3200+ S754 : 2200 MHz, 512 KB of L2, 1 channel DDR, 130nm
- 3400+ S754 : 2400 MHz, 512 KB of L2, 1 channel DDR, 130nm
- 3700+ S754 : 2400 MHz, 1024 KB of L2, 1 channel DDR, 130nm
- 3000+ S939 : 1800 MHz, 512 KB of L2, 2 channels DDR, 90nm
- 3200+ S939 : 2000 MHz, 512 KB of L2, 2 channels DDR, 90nm
- 3500+ S939 : 2200 MHz, 512 KB of L2, 2 channels DDR, 90nm
- 3500+ S939 : 2200 MHz, 512 KB of L2, 2 channels DDR, 130nm
- 3800+ S939 : 2400 MHz, 512 KB of L2, 2 channels DDR, 130nm
- 4000+ S939 : 2400 MHz, 1024 KB of L2, 2 channels DDR, 130nm
- FX-55 S939 : 2600 MHz, 1024 KB of L2, 2 channels DDR, 130nm

This list contains only 9 P-Ratings for 12 processors. A 3000+ is clocked at 2.0 GHz in Socket 754 but at 1.8 GHz in Socket 939. In the past, the P-Rating was even more confusing: it was possible to find one 3400+ clocked at 2.4 GHz with 512 KB of cache and another clocked at 2.2 GHz with 1024 KB of cache. Previous tests showed that the former was in fact the faster of the two: the extra cache doesn’t compensate for the difference in frequency.
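To make the ambiguity concrete, the lineup listed above can be grouped by rating. This is only an illustrative sketch: the tuple layout and function name are ours, and the data is copied from the list in this article.

```python
from collections import defaultdict

# (rating, socket, clock in MHz, L2 in KB, memory channels, process in nm)
LINEUP = [
    ("2800+", 754, 1800, 512, 1, 130),
    ("3000+", 754, 2000, 512, 1, 130),
    ("3200+", 754, 2200, 512, 1, 130),
    ("3400+", 754, 2400, 512, 1, 130),
    ("3700+", 754, 2400, 1024, 1, 130),
    ("3000+", 939, 1800, 512, 2, 90),
    ("3200+", 939, 2000, 512, 2, 90),
    ("3500+", 939, 2200, 512, 2, 90),
    ("3500+", 939, 2200, 512, 2, 130),
    ("3800+", 939, 2400, 512, 2, 130),
    ("4000+", 939, 2400, 1024, 2, 130),
    ("FX-55", 939, 2600, 1024, 2, 130),
]


def configs_by_rating(lineup):
    """Map each rating to the set of (socket, MHz, L2, channels) it covers.

    The fabrication process is deliberately ignored: it does not change
    the clock, cache or memory configuration.
    """
    groups = defaultdict(set)
    for rating, socket, mhz, l2_kb, channels, _process in lineup:
        groups[rating].add((socket, mhz, l2_kb, channels))
    return groups


groups = configs_by_rating(LINEUP)
ambiguous = {r: sorted(c) for r, c in groups.items() if len(c) > 1}

print(f"{len(LINEUP)} processors, {len(groups)} distinct ratings")
for rating, configs in ambiguous.items():
    print(rating, "->", configs)
```

With this data, only the 3000+ and 3200+ ratings cover two different clock and memory configurations; the two 3500+ models differ only in fabrication process.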


Athlon 64 90nm vs Athlon 64 130nm
In early September, AMD quietly released the first Athlon 64s based on the 90 nm « Winchester » core. Like the previous 130 nm « Newcastle », these processors are equipped with 512 KB of L2 cache. Thanks to the finer fabrication process, die size was reduced from 150 to 83 mm², making it possible to produce 75% more processors from the same wafer.
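As a rough sanity check on the wafer figure, here is a back-of-the-envelope sketch that assumes the number of dies per wafer scales inversely with die area. It ignores wafer size, edge losses and yield, none of which are given in this article, so it can only indicate an order of magnitude.

```python
NEWCASTLE_AREA_MM2 = 150.0   # 130 nm die size quoted in the article
WINCHESTER_AREA_MM2 = 83.0   # 90 nm die size quoted in the article


def extra_dies_percent(old_area_mm2: float, new_area_mm2: float) -> float:
    """Extra dies per wafer, in percent, if die count scales as 1 / area."""
    return (old_area_mm2 / new_area_mm2 - 1.0) * 100.0


gain = extra_dies_percent(NEWCASTLE_AREA_MM2, WINCHESTER_AREA_MM2)
print(f"Area-only estimate: about {gain:.0f}% more dies per wafer")
```

The area-only estimate lands in the same range as the 75% quoted above; the difference presumably reflects factors this simple ratio leaves out.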

This reduction in production cost is significant, even if the R&D and new equipment need to be amortized. Processors built on a new fabrication process also tend to have lower yields at first (a higher share of defective chips), but in the long run a finer process always ends up reducing costs.

Officially, a Newcastle clocked at 2.2 GHz needs 1.5 V and consumes up to 89 watts, whereas a Winchester needs 1.4 V and consumes a maximum of 67 watts.


Officially, there are three 90 nm Athlon 64s for Socket 939:

- 3000+ : ADA3000DIK4BI
- 3200+ : ADA3200DIK4BI
- 3500+ : ADA3500DIK4BI

The 3500+ also exists in a 130 nm version, the « ADA3500DEP4AW ». In practice, this is how CPU-Z identifies a 90 nm Athlon 64 3500+:


What are the differences in practice? We start with three measurements.

First, the power consumption of the whole configuration (motherboard, 2x512 MB of memory, 9600 Pro, 1 Raptor), measured directly at the wall socket. Because of the power supply’s efficiency, the power actually drawn by the components is only 70 to 80% of the measured figure. Second, the temperature reported by the processor’s internal probe via the motherboard after 30 minutes of Prime95. Third, the temperature given by an external probe placed on the side of the socket. Here are the results:


The configuration based on the 90 nm Athlon 64 3500+ clearly consumes 21 watts less than the same configuration with the 130 nm version. For temperature, the two measurements contradict each other: the internal temperature is slightly higher, while the external probe reads slightly lower.

The explanation is relatively simple. Although the 90 nm Athlon 64 consumes less power, that power is dissipated over a smaller surface because of the smaller die. The die temperature therefore stays about the same, but the processor gives off less heat overall.
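Since the 21 watts are measured at the wall, the saving at the components themselves is smaller. Here is a minimal sketch of the conversion, assuming a power supply efficiency of 70 to 80% that stays constant between the two configurations; the function name is ours.

```python
def component_power(wall_watts: float, efficiency: float) -> float:
    """Power delivered to the components for a given reading at the wall."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the range (0, 1]")
    return wall_watts * efficiency


WALL_DIFFERENCE_W = 21.0  # 130 nm vs 90 nm difference measured at the wall

low = component_power(WALL_DIFFERENCE_W, 0.70)
high = component_power(WALL_DIFFERENCE_W, 0.80)

# prints "Saving at the components: 14.7 to 16.8 W"
print(f"Saving at the components: {low:.1f} to {high:.1f} W")
```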

What about overclocking? It’s difficult to judge from a single processor, but we ran a test with a heatpipe air cooler (the one AMD supplies with the Athlon 64 4000+ / FX-55) and a voltage 10% above the default 1.4 V (1.54 V). In practice, we reached a stable 2.55 GHz. With the same cooler and the same +10% voltage increase (1.65 V), however, the 130 nm version reached 2.6 GHz. With our sample, the advantage goes to the 130 nm version, even if we hope this is only a temporary situation.
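The voltages and the overclocking margin can be checked with a few lines of arithmetic. This sketch assumes both samples are 3500+ models with a stock frequency of 2200 MHz, as listed earlier in this article; the helper names are ours.

```python
STOCK_MHZ = 2200  # assumption: both samples are 3500+ models


def overvolted(stock_volts: float, percent: float) -> float:
    """Core voltage after raising the stock voltage by `percent`."""
    return stock_volts * (1 + percent / 100)


def headroom_percent(stock_mhz: float, reached_mhz: float) -> float:
    """Overclocking margin relative to the stock frequency, in percent."""
    return (reached_mhz / stock_mhz - 1) * 100


# (stock voltage in V, stable frequency reached in MHz)
SAMPLES = {"90 nm": (1.4, 2550), "130 nm": (1.5, 2600)}

for name, (volts, mhz) in SAMPLES.items():
    print(f"{name}: {overvolted(volts, 10):.2f} V, "
          f"+{headroom_percent(STOCK_MHZ, mhz):.1f}% over stock")
# prints "90 nm: 1.54 V, +15.9% over stock"
# prints "130 nm: 1.65 V, +18.2% over stock"
```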

The last part is a face-off between the 90 nm and 130 nm Athlon 64. Is performance equivalent between the two dies? Here are the results of our tests, expressed in seconds except for the last three, which are in frames per second.


The results are quite surprising, as performance varies with the version. Most of the time the 90 nm version is faster, by 0.1% to 1.9%. There is one exception, however: when encoding video to DivX format with VirtualDubMod, the 90 nm 3500+ was clearly slower than the previous version. We are confident in these figures, having run the test several times to be sure, but we don’t have an explanation for them.

The test
To measure performance, we used the following platforms:

- Socket A : ASUSTeK A7N8X-E Deluxe (nForce2 400 Ultra)
- Socket 754 : MSI K8N Neo (nForce3 250)
- Socket 939 : MSI K8N Neo2 (nForce3 250 Ultra)

Apart from the motherboard, all components are identical:

- 2x512 MB DDR-400 Corsair at 2-2-2-8
- NVIDIA GeForce 6800 GT AGP
- Western Digital WD800BB
- Western Digital Raptor WD740GD
- Windows XP SP1 French

The results of the following processors are included in this test:

- 2800+ S754 : 1800 MHz, 512 KB of L2, 1 channel DDR, 130nm
- 3000+ S754 : 2000 MHz, 512 KB of L2, 1 channel DDR, 130nm
- 3200+ S754 : 2200 MHz, 512 KB of L2, 1 channel DDR, 130nm
- 3400+ S754 : 2400 MHz, 512 KB of L2, 1 channel DDR, 130nm
- 3700+ S754 : 2400 MHz, 1024 KB of L2, 1 channel DDR, 130nm
- 3000+ S939 : 1800 MHz, 512 KB of L2, 2 channels DDR, 90nm
- 3200+ S939 : 2000 MHz, 512 KB of L2, 2 channels DDR, 90nm
- 3500+ S939 : 2200 MHz, 512 KB of L2, 2 channels DDR, 90nm
- 3500+ S939 : 2200 MHz, 512 KB of L2, 2 channels DDR, 130nm
- 3800+ S939 : 2400 MHz, 512 KB of L2, 2 channels DDR, 130nm
- 4000+ S939 : 2400 MHz, 1024 KB of L2, 2 channels DDR, 130nm
- FX-55 S939 : 2600 MHz, 1024 KB of L2, 2 channels DDR, 130nm

For comparison, we also added results for the Athlon XP 2800+ and 3200+ and the Sempron 2800+ and 3100+. As a reminder, the last of these is actually based on the Athlon 64, but without the 64-bit instructions and with only 256 KB of L2 cache. It uses Socket 754 and is clocked at 1.8 GHz.


3D Studio Max 7
The tests start with 3D Studio Max 7, the 3D rendering software. The first of two scenes is a classic ray-traced render based on the « Architecture » scene from SPECapc.


This test doesn’t require much bandwidth or a large L2 cache on the Athlon 64, so performance is close for all Athlon 64 processors clocked at the same frequency. AMD’s P-Rating is in trouble here: the A64 3400+ is close to the 4000+, and the A64 3200+ S939 is clearly slower than the A64 3200+ S754.

Developed by Studio PC, the second scene relies mainly on radiosity, a technique that produces more realistic lighting but is slower to compute. 85% of this scene is based on this technique.


The results obtained with the second scene confirm those of the first.

Maya 6
With Maya 6 we also used two scenes, provided by Yann Dupont of 3DVF, whom we thank. The first uses the Maya Software engine and the second the Mental Ray engine.


With Maya, results are similar to those with 3D Studio Max. They aren’t influenced by the dual channel or by the increase in L2 cache from 512 to 1024 KB. Same causes, same effects.


With the second scene, results are equivalent.


Mathematica 5
The following tests are scientific computing programs, starting with Mathematica 5 from Wolfram Research. For this test, we used the benchmark suite developed by Stefan Steinhaus.


Here, the dual channel and the cache seem to have a slight influence on results. Performance gaps aren’t large, however. The Athlon 64 3200+ Socket 754 is, for example, 6% faster than its Socket 939 counterpart.

WinRAR 3.4
This test compresses a 535 MB file to RAR format using the "best" compression method in WinRAR 3.3.


WinRAR benefits from the larger cache and, to a lesser extent, from the dual channel. The performance gap between the A64 3200+ S754 and S939 that would be expected from the difference in frequency (10% higher for the former) is reduced to 3.1%. The S754 is still the faster!


TMPGEnc Xpress 3
Video compression is one of the Pentium 4’s strongest areas, provided the application is optimized for its architecture. Version 3.0 of TMPGEnc is optimized especially for the Netburst architecture and also includes SSE3 optimizations. The following results come from encoding a 3600-frame DV video to MPEG-2 at a bitrate of 4000 Kbits/s, in two passes:


With TMPGEnc, the dual channel has a significant influence. So while the increase in P-Rating that comes with the larger cache (3400+ -> 3700+, 3800+ -> 4000+) isn’t justified in this test, the increase that comes with the dual channel is. The 3200+ S754 is thus “only” 1.5% faster than its S939 counterpart, which is clocked 200 MHz lower.

DivX 5.21 / VirtualDub
Our next test is DivX encoding with VirtualDubMod. We compressed a 1500-frame MPEG-2 file to DivX 5.11 (with B-frames and a 1500 Kbits/s bitrate), with VirtualDubMod in « Fast Recompress » mode.


Performance gains due to the cache and the dual channel are noticeable but small. The P-Rating therefore isn’t really justified for most of the processors: the Athlon 64 3200+ S754 is 7% faster than the 3200+ Socket 939.


Unreal Tournament 2004

Next come the game tests, starting with Unreal Tournament 2004. The influence of the cache is significant, and the P-Rating starts to be justified. The performance gap between the 3200+ S754 and S939 is less than one percent.

Far Cry

Results with Far Cry are similar to those with UT2004.

Pacific Fighters

Pacific Fighters seems to be less dependent on cache size and bandwidth than the two previous games, so once more the P-Rating is difficult to justify.


Conclusion
To get an overall view of performance, we normalized the results for each test, giving the best result 100%. We then combined the scores, taking the Athlon XP 3200+ as the baseline, to compare the processors. Here are the results:
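The exact formula isn't given, so the following is only one plausible reading of the normalization described above: score each test against its best result (lowest time or highest frame rate), average the scores, then re-express the average relative to a baseline processor. The processor names and numbers are invented for illustration and are not this article's measurements.

```python
from statistics import mean

# Hypothetical data for illustration only, NOT the article's results.
RESULTS = {
    "render (s)": {
        "lower_is_better": True,
        "values": {"baseline": 200.0, "cpu_a": 160.0, "cpu_b": 100.0},
    },
    "game (fps)": {
        "lower_is_better": False,
        "values": {"baseline": 50.0, "cpu_a": 80.0, "cpu_b": 100.0},
    },
}


def normalize(values, lower_is_better):
    """Score each CPU from 0 to 100, the best result in the test getting 100."""
    if lower_is_better:
        best = min(values.values())
        return {cpu: 100.0 * best / v for cpu, v in values.items()}
    best = max(values.values())
    return {cpu: 100.0 * v / best for cpu, v in values.items()}


def overall_index(results, baseline):
    """Average the per-test scores, then rescale so the baseline equals 100."""
    per_test = [normalize(t["values"], t["lower_is_better"])
                for t in results.values()]
    cpus = list(per_test[0])
    averages = {cpu: mean(scores[cpu] for scores in per_test) for cpu in cpus}
    return {cpu: 100.0 * avg / averages[baseline]
            for cpu, avg in averages.items()}


index = overall_index(RESULTS, "baseline")
for cpu, score in index.items():
    print(f"{cpu}: {score:.1f}")
```

Normalizing before averaging keeps tests measured in seconds and tests measured in frames per second on the same scale, so no single benchmark dominates the total.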



The results show that some P-Ratings have been overestimated by AMD and aren’t representative of real performance. The Athlon 64 3700+ and 4000+ owe 300 and 200 points of their P-Rating, respectively, to the L2 increase from 512 KB to 1024 KB. The situation is similar for the Athlon 64 3500+ Socket 939: its P-Rating is 300 points higher than the 3200+’s for an additional 200 MHz, when most comparable frequency increases are worth 200 points. It’s also worth noting that this processor is slower than the 3400+.

The Athlon 64 3000+ and 3200+ Socket 939 are problematic in that they are often slower than their Socket 754 counterparts. To compensate for their 200 MHz lower frequency they have the dual channel, which isn’t enough in most situations.

These P-Rating issues, however, don’t overshadow the quality of the Athlon 64. These processors are still an excellent choice, as we have already stated several times. The Pentium 4 is only faster in specific applications particularly suited to its architecture, such as video and 3D rendering, while its heat dissipation and power consumption are higher.

Our price study showed that several processors stand out. For the Socket 754, the Athlon 64 seems to be a good choice, and the 3200+ and 3400+ remain competitive.

The situation is a little more complicated with the Socket 939. This format has a brighter future and is compatible with the FX-55 (while waiting for faster versions), whereas the Socket 754 is limited to the 3700+ (which doesn’t deserve its P-Rating). Under these circumstances, many of you will prefer the platform with the most upgrade potential, even if that potential is still unknown (compatibility with dual-core CPUs?).

A problem arises in choosing the Socket 939: the Athlon 64 3000+ and 3200+ Socket 939 aren’t available everywhere in the world, and the 3500+ is clearly more expensive than the 3400+ Socket 754 while ultimately performing worse. If you want to be able to upgrade your computer, there will be a price to pay.

Finally, we find it unfortunate that the quality of the Athlon 64 architecture is handicapped by an inappropriate sales policy. Because the P-Ratings are overestimated, so are the prices.

We finish this article on a positive note. We were pleasantly surprised by the technical side of the 90 nm “Winchester” core Athlon 64: power consumption was lower at the same frequency and overall performance increased slightly. Our only reservation is its current difficulty in reaching higher frequencies.


Copyright © 1997-2008 BeHardware. All rights reserved.