AMD Phenom 9600 - BeHardware

Written by Marc Prieur

Published on November 19, 2007

URL: http://www.behardware.com/art/lire/694/


Page 1

Introduction



Launched in September 2003, the Athlon 64 was a true success. AMD had an architecture that was extremely efficient compared to that of its rival, Intel, both in raw performance and in performance per watt, an aspect that was starting to take on increasing importance. AMD pressed ahead with the Athlon 64 X2, its dual core version, which arrived in May 2005.

Intel’s reaction was slow: it took 3 years for the Core 2 to arrive, last summer. However, the response was worth the wait, because Intel has since offered an architecture that doesn’t suffer from any real defects and delivers first rate performance despite its reduced power consumption. The Santa Clara giant drove its point home by launching the first x86 quad core processor, the QX6700, a year ago.

One year later, AMD finally responds to Intel’s quad core with the Phenom, which is based on the K10 architecture, to which we devoted an entire article. Of course, this took longer than officially planned: last year the target was mid-2007, and even six months ago the talk was of the third quarter.

A « native » quad core
With 463 million transistors, the AMD Phenom processor comprises 4 cores. Each is equipped with 128 KB of L1 cache (as was the case with the Athlon 64) and 512 KB of L2 cache. In addition, the cores share a third level cache with a capacity of 2 MB.

AMD emphasizes the « native » quad core nature of its processors, as opposed to Intel’s. And there are indeed a number of features that span all four cores together instead of groups of two, as with the Core 2 Quad. This is especially the case for the L3 cache and for the fact that the Phenom is built on a single silicon die, unlike Intel’s solutions.


AMD and Intel were already in more or less the same position at the launch of the first dual core processors, and the advantage of the « native » aspect wasn’t really evident on desktop configurations. Gains from 1 to 2 cores were very close on the two architectures at the time.

On the other hand, the handling of defective parts is quite different. Intel assembles two dies of 143mm² (65nm) or 107mm² (45nm) to produce a quad core, which means that a defective die costs it only half of a quad core. For AMD, a K10 die is 285mm², and when possible partially defective processors can be recycled into tri and dual core versions.
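To give a rough feel for why die size matters here, the sketch below applies a simple Poisson yield model to the die areas quoted above. The model itself and the defect density are assumptions made purely for illustration; neither AMD nor Intel publishes such figures, and real yields depend on many other factors.

```python
import math

def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects under a simple Poisson model."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-defects_per_cm2 * area_cm2)

# Die areas quoted in the article (65nm parts).
INTEL_DIE_MM2 = 143.0  # one dual core die; two are packaged per quad core
AMD_DIE_MM2 = 285.0    # one monolithic K10 quad core die

# HYPOTHETICAL defect density, chosen only for illustration.
ASSUMED_DEFECTS_PER_CM2 = 0.5

intel_die_yield = poisson_yield(INTEL_DIE_MM2, ASSUMED_DEFECTS_PER_CM2)
amd_die_yield = poisson_yield(AMD_DIE_MM2, ASSUMED_DEFECTS_PER_CM2)

print(f"Defect-free 143 mm2 dies: {intel_die_yield:.1%}")
print(f"Defect-free 285 mm2 dies: {amd_die_yield:.1%}")
```

Under this model, a die twice as large has roughly the squared yield of the smaller one, which is why being able to resell partially defective K10 dies as tri and dual core parts matters to AMD.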


Page 2
The architecture, the product line, Spider

A boosted architecture
In terms of performance, the K10 architecture improves upon K8 in various areas in addition to the increase from 2 to 4 cores. For SSE processing, the IPC has been boosted by the integration of two 128 bit SSE units, which in theory allows it to reach twice the throughput of the K8 and match a Core 2. The caches were adapted to these new capabilities and their bandwidth was doubled at identical frequencies.


The branch prediction unit was improved so that it can handle indirect branches, while the DDR2 and DDR3 memory controller (on the AM2 and AM3) gained larger buffers and improved prefetching, and can work in ganged (1x128 bits) or unganged mode (2x64 bits, which can, for example, enable simultaneous reading and writing).
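As a quick illustration of the two controller modes, the sketch below computes the theoretical peak bandwidth for each. It assumes the DDR2-800 memory used in the test configuration of this article; the function and variable names are hypothetical.

```python
def peak_bandwidth_gb_s(transfers_per_s: float, bus_width_bits: int, channels: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return transfers_per_s * bytes_per_transfer * channels / 1e9

# DDR2-800 performs 800 million transfers per second.
DDR2_800_TRANSFERS_PER_S = 800e6

ganged = peak_bandwidth_gb_s(DDR2_800_TRANSFERS_PER_S, bus_width_bits=128, channels=1)
unganged = peak_bandwidth_gb_s(DDR2_800_TRANSFERS_PER_S, bus_width_bits=64, channels=2)

print(f"Ganged   (1x128 bits): {ganged:.1f} GB/s")
print(f"Unganged (2x64 bits):  {unganged:.1f} GB/s")
```

The peak figure is the same in both modes; the difference lies in how requests are scheduled, since two independent 64 bit channels can serve two different accesses at once.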

AMD also improved integer division speed compared to the K8 (for example, IDIV r32 goes from 40 to 22 cycles), and memory read instructions are now handled "out of order". Finally, there is a unit devoted to stack management.
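The IDIV figures can be turned into a speedup and an absolute time with a few lines of arithmetic. This is only a sketch: it takes the two cycle counts quoted above at face value and assumes the Phenom 9600's stock 2.3 GHz clock.

```python
K8_IDIV_R32_CYCLES = 40   # figure quoted in the article
K10_IDIV_R32_CYCLES = 22  # figure quoted in the article
PHENOM_9600_GHZ = 2.3     # stock clock of the Phenom 9600

def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a cycle count to nanoseconds at a given clock frequency."""
    return cycles / clock_ghz

# Clock-for-clock speedup of the division itself.
idiv_speedup = K8_IDIV_R32_CYCLES / K10_IDIV_R32_CYCLES
k10_idiv_ns = cycles_to_ns(K10_IDIV_R32_CYCLES, PHENOM_9600_GHZ)

print(f"IDIV r32 speedup, clock for clock: {idiv_speedup:.2f}x")
print(f"IDIV r32 on a 2.3 GHz K10: {k10_idiv_ns:.2f} ns")
```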

As for power use, on the AM2+ platform a Phenom controls the power of the cores and of the memory controller separately, allowing finer energy management. In addition, the frequency of the 4 cores can be set independently; however, power levels remain identical and equal to what is required by the most heavily used core. We were able to confirm this desynchronisation of the cores in our tests by adjusting their multipliers in the AMD Overdrive software.

2.3 GHz to start with
To start with, AMD is launching two Phenoms, the 9600 and 9500. Clocked at 2.3 and 2.2 GHz, respectively, they were announced at around $283 and $251, or 240 € and 215 €. As you can see, AMD is aggressive on prices and Intel simply does not offer a quad core in the 9500’s price range (the Q6600 being at around 240 €). Of course, beyond prices there is the question of performance, something we will look at further on in this article.


While we are on the subject, it’s unfortunate that AMD hasn’t managed to go beyond 2.3 GHz. Up until a week ago, a 2.4 GHz model, the Phenom 9700, was also to be launched at the same time. At the release event for European tech journalists, AMD had even arranged for this CPU to be benchmarked in a controlled environment (with, among other things, no possibility of installing benchmarks other than those provided). Unfortunately, this version didn’t prove stable enough and AMD justifiably decided not to launch it for the moment. It goes without saying that the limited frequency headroom could be problematic if not resolved rapidly, especially given AMD’s intention to reach 2.6 GHz in the first quarter of 2008.
The Spider platform
AMD took advantage of the release of the Phenom to launch not just a processor but a complete platform called the Spider. It is composed of three elements:

- An AMD Phenom processor
- An AMD 790FX chipset motherboard
- A Radeon 38x0 graphics card


While Intel already tried to bring its Centrino platform concept to the desktop with ViiV, AMD is better equipped to do so, given that its acquisition of ATI gives it the entire foundation for a desktop PC aimed at gaming. At the moment, Intel is in fact absent from the GPU market while NVIDIA isn't currently offering any CPUs.

The 790FX sets itself apart with expanded PCI-Express support: it can drive 4 PCI Express x16 ports, either two wired with 16 PCI-E lanes each or all four with 8 lanes each. CrossFire X, which will allow running 3 or even 4 graphics cards at the same time (did someone say Crysis?), will need a motherboard of this type, but it will not arrive before the end of 2008. On the other hand, the southbridge is rather standard: the number of SATA ports is limited to 4 and networking has to be handled by additional chips.

As for the GPU, a short while ago we published a complete test of the Radeon 38x0 if you want to brush up on the details.

To underline that it wasn’t just a processor that was launched but rather an entire platform, we obtained a complete « Spider » configuration that will be on sale starting Monday. Of course, for our test we only used the motherboard and processor, in order to fit everything into our CPU test protocol.


Page 3
In practice

The Phenom 9600

Physically, and apart from its label, nothing distinguishes a Phenom from a classic Socket AM2 processor. As you may recall, the Phenom and all AM2+ processors are supposed to work on all existing AM2 motherboards. This hasn’t necessarily been borne out in practice, as we show later on in this article. On an AM2 board, the HyperTransport bus, which runs at 1.8 GHz on AM2+, is downgraded to 1 GHz, and the frequency of the memory controller is also reduced, as AM2 does not allow the power of the cores and of the memory controller to be managed separately.
The Gigabyte GA-MA790FX-DQ6

The Phenom tests were carried out on a motherboard based on the latest AMD chipset, the 790FX. Developed by Gigabyte, the GA-MA790FX-DQ6 has no fewer than 4 PCI Express x16 ports and, in addition to the functions inherent to the chipset, integrates two Realtek Gigabit Ethernet PCI Express ports, a controller supporting two additional SATA ports, and a FireWire chip.
Power consumption
We measure the power consumption of the processor’s power supply stage with a clamp ammeter on the ATX12V line, which it uses exclusively. This allows us to better isolate CPU power use instead of obtaining overall consumption. The only thing we should keep in mind is that the CPU power supply stage has an efficiency of between 80 and 90%.
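The measurement method described above amounts to a short calculation, sketched below. The nominal 12 V rail voltage and the example current are assumptions for illustration only (the current is not a measured value from this test), and the function name is hypothetical.

```python
from typing import Tuple

ATX12V_VOLTS = 12.0                  # ASSUMPTION: nominal rail voltage
VRM_EFFICIENCY_RANGE = (0.80, 0.90)  # efficiency range quoted in the article

def cpu_power_range_w(clamp_current_a: float) -> Tuple[float, float]:
    """Estimate the power actually reaching the CPU from the ATX12V current.

    The clamp reads the current entering the CPU power supply stage,
    so the CPU itself receives that input power times the stage efficiency.
    """
    input_power_w = ATX12V_VOLTS * clamp_current_a
    low_eff, high_eff = VRM_EFFICIENCY_RANGE
    return input_power_w * low_eff, input_power_w * high_eff

# HYPOTHETICAL reading, not a figure from this test.
EXAMPLE_CURRENT_A = 8.0
low_w, high_w = cpu_power_range_w(EXAMPLE_CURRENT_A)

print(f"Input to the power stage: {ATX12V_VOLTS * EXAMPLE_CURRENT_A:.1f} W")
print(f"Estimated CPU power: {low_w:.1f} W to {high_w:.1f} W")
```

In other words, the figures reported for the ATX12V line slightly overstate what the processor itself dissipates, by the 10 to 20% lost in the power supply stage.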


Note that Cool’n’Quiet could not be activated on the motherboard whether it was for the A64 or Phenom. Also, we couldn’t include the QX9650’s power consumption measurements with 1 or 2 sessions of Prime 95 because we no longer had the CPU at the time of this test.


As you can see, the Phenom 9600 consumes as much as an Athlon 64 X2 6400+ under load as well as at idle. Compared to an Athlon 64 X2 4400+, it’s almost double. While the QX6850 draws more power than a Phenom, the QX9650 draws less thanks to its 45nm process, while at the same time offering an entirely different level of performance.
Overclocking

Even if AMD has not clocked the Phenom above 2.3 GHz at this time, curiosity led us to test the overclocking limits of the 9600. We were able to take the processor from 11.5x200 to 11.5x225 MHz, or 2.58 GHz, with a voltage of 1.25V. 2.64 GHz was reached with a voltage of 1.3V, and these results were validated with 4 sessions of Prime95 running for 15 minutes.
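The overclocking figures follow from the usual relation between multiplier and reference clock, sketched below. The reference clock for the 2.64 GHz result is not given in the text; the value computed here assumes the multiplier stayed at 11.5, which is only a guess.

```python
def core_clock_mhz(multiplier: float, reference_mhz: float) -> float:
    """Core frequency is the multiplier times the reference clock."""
    return multiplier * reference_mhz

MULTIPLIER = 11.5  # Phenom 9600 multiplier, as quoted in the article

stock_mhz = core_clock_mhz(MULTIPLIER, 200.0)
oc_mhz = core_clock_mhz(MULTIPLIER, 225.0)
overclock_gain = oc_mhz / stock_mhz - 1.0

# ASSUMPTION: the 2.64 GHz result kept the 11.5x multiplier.
implied_reference_mhz = 2640.0 / MULTIPLIER

print(f"Stock: {stock_mhz:.1f} MHz, overclocked: {oc_mhz:.1f} MHz")
print(f"Gain over stock: {overclock_gain:.1%}")
print(f"Reference clock implied by 2.64 GHz: {implied_reference_mhz:.1f} MHz")
```

The 11.5x225 setting works out to 2587.5 MHz, which the text truncates to 2.58 GHz.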
AM2 (in)compatibility ?

Due to a lack of time, we were only able to test AM2 compatibility with one motherboard, the M2N32-SLI Deluxe from ASUSTeK. Unfortunately, once the Phenom was installed the system did not boot and the motherboard shut down after several seconds. Installing the latest beta BIOS, version 1402 from the end of October, didn’t change things. This is something to monitor, and we hope that a BIOS update will add this much awaited backward compatibility.
Cache speed
Tests carried out with RightMark Memory Analyzer show a significant improvement in cache performance with the Phenom compared to the 65nm Athlon 64 X2. The L1, L2 and L3 caches display respective latencies of 3, 9 and 20 cycles versus 3 and 22 for the L1 and L2 caches of the Athlon 64.


Bandwidth was also improved because L1 is almost four times as fast in reading and twice as fast in writing. L2 is 2.5 times faster in reading and 1.6 times faster in writing.
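To put the Phenom's cache latencies in absolute terms, the sketch below converts the cycle counts quoted above into nanoseconds. It assumes the cycles are core clock cycles and that the Phenom 9600 ran at its stock 2.3 GHz during the measurement, neither of which is stated explicitly.

```python
PHENOM_9600_GHZ = 2.3  # ASSUMPTION: measured at the stock clock

# Latencies in cycles, as quoted in the article.
phenom_latency_cycles = {"L1": 3, "L2": 9, "L3": 20}

# One cycle at f GHz lasts 1/f nanoseconds.
latency_ns = {
    level: cycles / PHENOM_9600_GHZ
    for level, cycles in phenom_latency_cycles.items()
}

for level, ns in latency_ns.items():
    print(f"{level}: {phenom_latency_cycles[level]} cycles = {ns:.2f} ns")
```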
The tests
There was really only one problem encountered in testing: with the Phenom it was impossible to stabilize the Spider platform with the memory controller’s Command Rate at 1T, despite attempts with various 2x1 GB memory kits. With the motherboard’s original Gigabyte BIOS, this setting caused instability, while with the latest BIOS the Command Rate remained at 2T despite what the BIOS indicated! Not a very elegant way to sidestep the problem...

We now move on to a comparison of these processors with other dual and quad cores in our usual test suite. Here is the rest of the test configuration:

- GeForce 8800 GTX / ForceWare 169.01
- 2 x 1024 MB DDR2-800 4-4-4
- 2 x Raptor 74 GB
- Windows XP SP2 French
- Socket 775 : ASUSTeK P5K Deluxe
- Socket AM2 : ASUSTeK M2N32-SLI Deluxe (A64 X2 6400+)
- Socket AM2+ : Gigabyte GA-MA790FX-DQ6 (A64 X2 4400+ & Phenom)

In addition to the Phenom 9600 in its standard configuration, we also tested it with the activation of only 2 and then 3 cores. With 2 cores we can better see performance improvements related to architectural modifications.


Page 4
3ds Max 9 and Maya 8

3ds Max 9 and Maya 8
For this test, we use two test scenes for Maya and 3ds Max developed by Yann Dupont at 3DVF (whom we thank), both rendered with the MentalRay engine. This choice wasn’t arbitrary, since this engine is now available for both applications and is the one most commonly used in production.

- The 3ds Max scene is very heavy in terms of polygons and number of objects. The objective is to test the processor’s capacity to handle a heavy flow of data.

- Maya's scene is much lighter, but uses MentalRay's advanced lighting algorithms and employs the processors’ raw power in terms of mathematical calculations.


The gains in going from the K8 to the K10 architecture are rather variable here. With the same number of cores, we measured a gain of 14.6% in 3ds Max and 6.8% in Maya. This is not enough for the Phenom to catch up with the Core 2, as it reaches 90% and 87%, respectively, of the Q6600’s performance in these benchmarks.


Page 5
Mathematica 6 and WinRAR 3.71

Mathematica 6
In the domain of scientific calculation, we use the new version 6 of Mathematica from Wolfram Research and its integrated benchmark, MathematicaMark2006.


In Mathematica, the K10 architecture brings a big performance gain of more than 32%. However, the gap with the competition is still large, with the Phenom 9600 only attaining 71% of the Q6600’s performance.

The tests integrated into MathematicaMark2006 are the following: Data Fitting, Digits of Pi, Discrete Fourier Transform, Eigenvalues of a Matrix, Elementary Functions, Gamma Function, Large Integer Multiplication, Matrix Exponential, Matrix Multiplication, Matrix Transpose, Numerical Integration, Polynomial Expansion, Random Number Sort, Singular Value Decomposition, and Solving a Linear System.
WinRAR 3.71
Since version 3.6, WinRAR has included multithreading optimizations. We compress in RAR at the highest level a total of 588 MB of files made up of 493 Word & Excel files (69 MB), 22 Eudora e-mail files (251 MB) and a single wav audio file (268 MB).


With a gain of slightly more than 13%, K10 considerably improves performance compared to K8; however, once again, this is not enough to surpass the Core 2 Q6600. WinRAR does not really take advantage of 4 cores, and for the first time we see the Phenom being slower than the Athlon 64 X2 6400+.


Page 6
TMPGEnc 4.0 & DiVX 6.7

TMPGEnc 4.0 XPress

The fourth version of this MPEG-2 encoder integrates several optimizations for the Core 2 and improves performance by approximately 5%. For this test, we encode a 10 minute 16 second DV file to MPEG-2 format in 720x576 with an average bitrate of 4500 Kbits/s in two passes. The video preview display is activated during this test and the DV file is decoded via a Mainconcept codec, which is faster than decoding in TMPGEnc.


Improvements to SSE units bear their fruit here with no less than a 35.7% gain between K8 and K10 with the same number of cores. For this reason, here the Phenom 9600 is relatively close to the Q6600 though just slightly behind.
VirtualDub & DiVX 6.7
We now use version 1.7.6 of VirtualDub and version 6.7 of DiVX, which has SSE4 optimizations. We encode the same video source as with TMPGEnc in Fast recompress mode and with the DiVX 6.7 codec in one pass with an average bitrate of 1500 Kbits/s, highest quality encoding performance, and Experimental SSE4 full search activated in SSE4 or SSE2 mode. The video preview mode is activated during this test.


In DiVX encoding, the Phenom 9600 and Q6600 are even closer, with AMD’s quad core at 98.2% of the performance of Intel’s quad core. There is a 26.5% gain between K8 and K10.


Page 7
Nuendo 3 & After Effects CS3

Nuendo 3

Something new in our test suite is Nuendo in version 3. This is a solution devoted to audio and post-production. The test consists of exporting a relatively heavy project to an audio file (thanks to DraCuLaX for the file).


K10 architecture does about 10% better than K8 in this test while the Phenom 9600 is at 92.3% of a Q6600.
Adobe After Effects CS3
We now move on to Adobe’s After Effects in its CS3 version. Here we apply various effects and filters in video editing and then compile the movie. It’s the compiling time that is measured. In this test, we obtained the lowest gain with K10 architecture at only 3.5% better. For this reason, the Phenom is very far behind Intel processors. The quad core adds little benefit here and is even behind the A64 X2 6400+.




Page 8
Crysis, World In Conflict, Flight Sim X

Crysis, World In Conflict, Flight Sim X
Crysis, the latest very popular FPS, whose demo was just released, is used for the processor test integrated into the game. In Crysis, with two active cores, K10 increases performance by roughly 15% compared to K8. This is good but only barely enough to surpass the A64 X2 6400+. In the end, Core 2s are significantly faster.


World In Conflict is a recent real time strategy game with a rather resource heavy integrated benchmark. The gap with Intel processors widens in World In Conflict despite a 13.3% gain due to the new architecture. The Phenom 9600 only reaches 79% of the performance of a Q6600.


We finish in Flight Simulator X with the framerate obtained when flying over New York after taking off from JFK in high quality. K10 does 19% better but this is not enough to surpass an A64 X2 6400+ and is even further away from Core 2s.




Page 9
Conclusion

Conclusion
With an average performance gain of 17.4% with the same number of cores, the Phenom’s K10 definitely offers greater efficiency than the Athlon 64 X2 and its K8 architecture. Unfortunately, as noteworthy as these gains are, they are not enough for AMD to take the lead in terms of performance. In fact, the Phenom 9600 is still quite far behind and is on average equivalent to 85.8% of a Q6600.
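As a sanity check on the average gain, the sketch below takes the K8 to K10 gains quoted on the individual benchmark pages and computes their arithmetic mean. Several of those figures are only given approximately in the text ("more than 32%", "about 10%", "roughly 15%"), and whether the average quoted above is a plain arithmetic mean is an assumption.

```python
# K8 -> K10 gains in percent, as quoted on the benchmark pages.
# Values marked "approx." are rounded in the text itself.
k10_gains_percent = {
    "3ds Max 9": 14.6,
    "Maya 8": 6.8,
    "Mathematica 6": 32.0,       # approx.: "more than 32%"
    "WinRAR 3.71": 13.0,         # approx.: "slightly more than 13%"
    "TMPGEnc 4.0": 35.7,
    "DiVX 6.7": 26.5,
    "Nuendo 3": 10.0,            # approx.: "about 10%"
    "After Effects CS3": 3.5,
    "Crysis": 15.0,              # approx.: "roughly 15%"
    "World In Conflict": 13.3,
    "Flight Simulator X": 19.0,
}

mean_gain = sum(k10_gains_percent.values()) / len(k10_gains_percent)
print(f"Arithmetic mean over {len(k10_gains_percent)} tests: {mean_gain:.1f}%")
```

The result lands within a few tenths of a point of the 17.4% quoted above, a difference that the rounded inputs could plausibly explain.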

Intel’s Core architecture thus has the luxury of being slightly more efficient at equivalent frequencies while also scaling easily in frequency, whereas the AMD Phenom is currently limited to 2.3 GHz. This combination means that Intel is without competition at the high end and should be able to enjoy this situation for some time. Intel’s architectural evolution combined with advanced fabrication processes (the 45nm versions of the Core processors being both less expensive and lower in power) means AMD is facing a tough challenge.

The creator of the Athlon has decided to attack this problem with two distinct approaches. The first is to offer a complete platform based on a CPU-GPU-chipset trio, something only AMD is able to do. However, for the moment, and while waiting for CrossFire X, the Spider platform does not offer any real advantage over the competition. The second AMD strategy is the Phenom’s aggressive pricing. The 9600 and 9500 were $283 and $251 upon their release, while Intel doesn’t offer a quad core for less than $266. This is good, but given that the 9600 does not match the Q6600’s performance, AMD will need to make a little more effort in this area.
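The pricing argument can be made concrete with a short performance per dollar calculation. This sketch assumes that the $266 Intel quad core mentioned above is the Q6600, which the text implies but does not state, and it uses the 85.8% average relative performance quoted in this conclusion.

```python
PHENOM_9600_PRICE_USD = 283.0
Q6600_PRICE_USD = 266.0        # ASSUMPTION: the $266 Intel quad core is the Q6600
PHENOM_RELATIVE_PERF = 0.858   # average performance relative to a Q6600 (= 1.0)

phenom_perf_per_dollar = PHENOM_RELATIVE_PERF / PHENOM_9600_PRICE_USD
q6600_perf_per_dollar = 1.0 / Q6600_PRICE_USD
value_ratio = phenom_perf_per_dollar / q6600_perf_per_dollar

# Price at which the 9600 would match the Q6600 in performance per dollar.
break_even_price_usd = PHENOM_RELATIVE_PERF * Q6600_PRICE_USD

print(f"Phenom 9600 value relative to the Q6600: {value_ratio:.1%}")
print(f"Break-even price for the 9600: ${break_even_price_usd:.0f}")
```

On these assumptions the 9600 would have to sell for noticeably less than its launch price before it matched the Q6600 on performance per dollar alone.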

In the absence of a battle on performance, AMD will therefore have to compete with a better performance / price ratio despite a theoretically higher production cost. This could be risky but is the only option that will allow it to sell Phenoms in volume. The other trump card for AMD could be the tri-core, which will be even less expensive while offering attractive performance. This is something we could see in these tests when it was compared to dual cores in applications which make good use of multithreading…

Of course, beyond the price there are other factors. The potential for evolution, for example: we do not yet know to what extent AMD will be able to increase the Phenom’s performance. By contrast, we know more or less where Intel is going with its 45nm Core 2. For the moment, AMD is planning 2.4 and 2.6 GHz versions in the first quarter of 2008. While we are on the subject of evolution, it’s unfortunate that AM2 compatibility isn’t fully functional at this time. We hope this is only down to the lack of an adequate BIOS for the motherboard in question.

Disappointing for performance fanatics, the Phenom will therefore have to hope its aggressive prices have an impact. We can only hope for AMD that the arrival of new steppings in the coming months and the transition to 45nm planned for the end of 2008 will allow K10 to really take off in terms of frequency. Otherwise, the Phenom will find itself in a perilous and unexpected position.


Copyright © 1997-2009 BeHardware. All rights reserved.