Intel Pentium 4 660 and EE 3.73 GHz - BeHardware
Written by Marc Prieur
Published on February 21, 2005
URL: http://www.behardware.com/art/lire/551/
Page 1
Introduction, Prescott v1.1
Prescott v1.1
The new Pentium 4s are based, like the previous versions, on the Prescott core, which was introduced almost a year ago. This controversial processor is built on a 90 nm fabrication process, and although it has twice as many transistors as the previous version, it isn’t faster in practice.
In fact, most of these transistors were used to compensate for performance losses caused by architectural changes, such as the longer pipeline and the increase in cache latency, which were supposed to allow the NetBurst architecture to reach higher frequencies. The Prescott’s power consumption and heat dissipation were higher than the previous version’s, and it still didn’t provide any significant performance improvement.
Intel is now back with an evolution of the Prescott core. Three main innovations have been introduced compared to the Pentium 4 5xxJ, which was released in November (see this test):
- The L2 cache has been increased from 1 MB to 2 MB.
- Support for Enhanced Intel SpeedStep Technology (EIST).
- Support for Extended Memory 64 Technology (EM64T).
The first innovation speaks for itself and means that the die size has clearly increased. The number of transistors has jumped from 125 to 169 million. It’s also important to measure the results of this new cache in practice. Here are the results obtained.
Cache latency is similar, but still higher than on the Northwood. The transfer rate is the bad surprise: the L2 cache, whose transfer rate had already gone down with the transition to the Prescott core, is slower once again. In fact, it’s as fast for the first 256 KB, but noticeably slower for the rest.
EIST is a power-saving function similar to the one found in laptops. Depending on the level of use, the Pentium 4 6xx multiplier can be reduced to as low as x14, and the voltage to 1.2 V. Since the J version, the Pentium 4 has already featured a power management function that reaches the same level, the Enhanced Halt State (C1E). With C1E, it was either x14 or the maximum multiplier, without intermediate levels. So at 0% use the processor ran at x14 / 1.2 V, and at 5% use the multiplier and voltage were set to the maximum.
With EIST, the multiplier and voltage vary with the level of use. It’s interesting to note that for the moment this function isn’t natively supported by Windows, because of a lack of adequate drivers. A utility like RM Clock can manage this function. You may notice at the bottom of the screen that the multiplier is no longer simply at the minimum or the maximum: unlike in the screen above, it varies according to use. The voltage is actually reduced to 1.2 V at idle. With C1E, however, and we can’t explain it, the value varies between 1.237 and 1.263 V.
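The difference between the two schemes can be sketched in a few lines of Python. This is only an illustration, not Intel's actual algorithm: the function names and the linear mapping from load to multiplier are assumptions of ours. Only the x14 floor, the two-state behaviour of C1E and the 200 MHz bus clock (FSB800 is quad-pumped) come from the hardware, and x18 is simply the multiplier of a 3.6 GHz Pentium 4 660.

```python
import math

FSB_MHZ = 200        # FSB800 is a quad-pumped 200 MHz bus clock
MIN_MULTIPLIER = 14  # lowest multiplier reachable with C1E or EIST


def c1e_multiplier(load: float, max_multiplier: int) -> int:
    """C1E: x14 at 0% load, the maximum multiplier as soon as there is any load."""
    return MIN_MULTIPLIER if load <= 0.0 else max_multiplier


def eist_multiplier(load: float, max_multiplier: int) -> int:
    """EIST: intermediate levels exist (the linear mapping is an assumption)."""
    span = max_multiplier - MIN_MULTIPLIER
    return MIN_MULTIPLIER + math.ceil(load * span)


def frequency_mhz(multiplier: int) -> int:
    """Core clock obtained from a multiplier on an FSB800 processor."""
    return multiplier * FSB_MHZ


if __name__ == "__main__":
    # Pentium 4 660: 3600 MHz / 200 MHz = x18
    for load in (0.0, 0.05, 0.5, 1.0):
        c1e = frequency_mhz(c1e_multiplier(load, 18))
        eist = frequency_mhz(eist_multiplier(load, 18))
        print(f"load {load:4.0%}: C1E {c1e} MHz, EIST {eist} MHz")
```

With this toy model, C1E jumps straight from 2800 MHz to 3600 MHz at 5% load, whereas EIST passes through the intermediate multipliers first.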
Page 2
EM64T - the x86-64 on P4!
Extended Memory 64 Technology
EM64T is, in fact, the equivalent of AMD’s AMD64 ISA. It’s a 64-bit extension of the x86 instruction set. The general-purpose registers (GPRs), small storage areas that temporarily hold memory addresses and integers, are thus widened from 32 to 64 bits.
Processing 64-bit data isn’t an innovation in and of itself. Since its introduction, the x87 unit, which is in charge of floating-point calculations, has been able to work with up to 80 bits internally. Furthermore, certain MMX/SSE/SSE2 instructions also work with 64-bit integers. The use of this type of data, now generalised to all data stored in the GPRs, has two advantages:
- Faster integer calculations. For applications that require very large integers (the limit is still 4.29e9 in 32 bits, and reaches 1.84e19 in 64 bits), coding integers on 64 bits lets the processor manipulate such numbers without doubling the number of registers and clock cycles required for calculations. This only concerns specific applications such as data encoding or scientific calculations.
- Storing addresses in 64 bits makes it possible to exceed the 4 GB limit imposed by 32-bit binary coding. The limit rises to 256 terabytes, because of the “restriction” to 48-bit virtual addresses. Intel had already been able to go beyond the 4 GB limit with the Xeon and reach 64 GB, even if this technique has some restrictions. Here again, it won’t really be that useful for most users.
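The figures quoted in these two points are all powers of two, which a short Python check makes explicit. The helper names are ours, and the 36-bit case is our assumption about how the 64 GB Xeon limit is obtained (36 physical address bits).

```python
GIB = 2 ** 30  # bytes in a gigabyte, binary sense
TIB = 2 ** 40  # bytes in a terabyte, binary sense


def unsigned_values(bits: int) -> int:
    """Number of distinct values an unsigned integer of this width can hold."""
    return 2 ** bits


def addressable_bytes(address_bits: int) -> int:
    """Bytes that can be addressed with the given number of address bits."""
    return 2 ** address_bits


if __name__ == "__main__":
    print(f"32-bit integers: {unsigned_values(32):.2e} values")
    print(f"64-bit integers: {unsigned_values(64):.2e} values")
    print(f"32-bit addresses: {addressable_bytes(32) // GIB} GB")
    print(f"36-bit addresses: {addressable_bytes(36) // GIB} GB")  # assumed Xeon case
    print(f"48-bit addresses: {addressable_bytes(48) // TIB} TB")
```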

In fact, the main interest of EM64T, like AMD64, is the number of registers. In x86 mode, processors have eight 80-bit x87 registers, eight 32-bit general-purpose registers and eight 128-bit SSE registers. With AMD64 and EM64T, the number of 80-bit x87 registers is still eight, but the number of 64-bit general-purpose registers and 128-bit SSE registers is increased to 16. The higher number of available registers reduces the number of instructions needed to free up registers by copying their contents to memory. Consequently, this increases performance.
Finally, the release of EM64T and AMD64 permits a break with the sacrosanct x86 compatibility. Many executables are still compiled to remain compatible with the x86 instruction set as it existed on the 386. The instruction set has since been improved, but these improvements aren’t necessarily used by developers when compiling. Now there is no confusion: EM64T or AMD64 means an Athlon 64 or a Pentium 4.
In practice, the performance gains of this new function are variable, all the more because they rely on compilers that aren’t necessarily perfect. In order to evaluate its performance we used the latest version of testzlib by Gilles Vollant (to whom we are grateful), which measures the compression performance of the Windows port of the zlib library. Compiled with Visual C++ 2005 beta, it’s available in four versions:
- x86 32 bits, C only
- x86 32 bits, C + assembler optimisations
- x86 64 bits, C only
- x86 64 bits, C + assembler optimisations
Here are the results obtained, expressed in MB/s, under the latest Windows XP Professional x64 Edition build, v1433 (Release Candidate 2); the final version is expected this spring:
With the Athlon 64, the performance gain is only 1% for the compiler-only versions, i.e. those not optimised with assembler routines. With the Pentium 4 660, performance is even reduced! Of course, this result isn’t normal and the origin is most likely the compiler.
The assembler version uses a routine initially developed for the Pentium Pro and adapted to x86-64. Results are much more logical, even if in the end the executable is 1.3% faster than the 32-bit version. The 64-bit assembler version is however 7.8% faster than the 32-bit one, and this result is more than satisfactory. For the moment it’s impossible to draw any conclusion from the 64-bit results.
Performance gains that are too large generally come from poorly optimised 32-bit executables. When performance is reduced, the problem at this time clearly comes from the compiler. The new Pentium 4 supports EM64T, but for now it’s hard to tell how beneficial it will be.
Page 3
CPUs, in use, tests
5 new Pentium 4
Intel has released 5 new processors:
- Pentium 4 630 (3.0 GHz), FSB800, $224
- Pentium 4 640 (3.2 GHz), FSB800, $273
- Pentium 4 650 (3.4 GHz), FSB800, $401
- Pentium 4 660 (3.6 GHz), FSB800, $605
- P4 Extreme Edition 3.73 GHz, FSB1066, $999
Four are P4 6xx models and the last one, the Extreme (and Extremely Expensive) Edition, is based on the same core but uses the FSB1066 and requires a motherboard based on the i925XE. The other processors support the FSB800 and work with current Socket 775 motherboards. Now let’s compare these processors’ prices to the current lineup:
- Pentium 4 520J (2.8 GHz), FSB800, $163
- Pentium 4 530J (3.0 GHz), FSB800, $178
- Pentium 4 540J (3.2 GHz), FSB800, $218
- Pentium 4 550J (3.4 GHz), FSB800, $278
- Pentium 4 560J (3.6 GHz), FSB800, $417
- Pentium 4 570J (3.8 GHz), FSB800, $637
- P4 Extreme Edition 3.46 GHz, FSB1066, $999
Intel has decided to follow a very simple pricing policy. For the price of a P4 6xx you can also get a P4 5xx that is clocked 200 MHz higher. Frequency isn’t the only factor that affects performance, however. The P4 6xx also has EIST, EM64T, and a slower but bigger L2 cache. The P4 5xx should get EM64T in a couple of months. The names of these processors will end in 1 instead of 0.
In addition, as EIST can’t reach a multiplier below x14, and as C1E already uses this value at idle, it’s not of much interest for the 630, or even for the 640, which use x15 and x16 multipliers respectively. The Extreme Edition doesn’t feature EIST, as it already runs at 14x266.
Otherwise, nothing distinguishes the P4 6xx from the P4 5xx, except for a default voltage of 1.4 V instead of 1.3625 V for the P4 5xx.
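To see why the usefulness of EIST depends on the model, the multipliers can be derived from the clock speeds in the list of new processors. The sketch below is ours and rests on two assumptions: that every integer multiplier between x14 and the maximum is available, and that the bus clocks are 200 MHz for FSB800 and 266.67 MHz for FSB1066 (both buses being quad-pumped).

```python
MIN_MULTIPLIER = 14  # floor shared by C1E and EIST

# (model, core clock in MHz, bus clock in MHz)
PROCESSORS = [
    ("Pentium 4 630", 3000, 200.0),
    ("Pentium 4 640", 3200, 200.0),
    ("Pentium 4 650", 3400, 200.0),
    ("Pentium 4 660", 3600, 200.0),
    ("P4 Extreme Edition 3.73 GHz", 3733, 266.67),
]


def max_multiplier(core_mhz: float, bus_mhz: float) -> int:
    """Default multiplier of a processor, rounded to the nearest integer."""
    return round(core_mhz / bus_mhz)


def eist_levels(core_mhz: float, bus_mhz: float) -> list[int]:
    """Multipliers from the x14 floor up to the default multiplier (assumed contiguous)."""
    return list(range(MIN_MULTIPLIER, max_multiplier(core_mhz, bus_mhz) + 1))


if __name__ == "__main__":
    for model, core_mhz, bus_mhz in PROCESSORS:
        levels = ", ".join(f"x{m}" for m in eist_levels(core_mhz, bus_mhz))
        print(f"{model}: {levels}")
```

Under these assumptions the 630 has a single step below its default multiplier, the 660 has four, and the Extreme Edition has none, since x14 is already its default.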

Power consumption and temperature
What about the processors’ power consumption? To get a better idea, we measured the overall power consumption of our test configuration at idle on the Windows desktop, with and without the C1E function, and under load with two instances of Prime95. The temperature measured after 15 minutes of Prime95 is also included. It was obtained with the new Intel boxed cooler for Socket 775.

Good news! Despite a greater number of transistors, the new Pentium 4’s heat dissipation and power consumption are lower. This reduction is all the more welcome because Intel had already succeeded in shaving off a couple of watts and degrees with the Pentium 4 “J”. The C1E mode reduces power consumption by 30 watts at idle, a noticeable difference. Even with this function, however, the P4 EE 3.46 GHz, which is built on a 130 nm fabrication process, still consumes less at idle.
Overclocking
Here are the results obtained for overclocking, stabilised with two instances of Prime95 running in parallel for 15 minutes. As a reminder, we reached 4.2 GHz with our Pentium 4 570J 3.8 GHz at its default voltage of 1.3625 V, and 4.3 GHz at 1.4 V. A further increase in voltage didn’t allow us to reach higher frequencies.
We were disappointed by the P4 660 3.6 GHz. At its default voltage of 1.4 V we reached 3.95 GHz, and then 4.05 GHz at 1.45 V. We were unable to stabilise the CPU at 4.1 GHz with a voltage of 1.5 V. Based on the same core, the Extreme Edition 3.73 GHz provided much better results. We reached 4.25 GHz at 1.4V and 4.3 GHz at 1.4V.
It is difficult to draw a conclusion as we reached two different results with two processors based on the same core.
The test
Tests were run on the following platforms:
- Socket 775: ASUSTeK P5AD2-E Premium
- Socket 939: MSI K8N Neo2 (NVIDIA nForce3 250 Ultra)
The rest of the configuration includes:
- 2x512 MB Corsair DDR2-533 at 3-3-3-8 (Socket 775)
- 2x512 MB Corsair DDR-400 at 2-2-2-8 (Socket 939)
- NVIDIA GeForce 6800 GT PCI-E (Socket 775, ForceWare 66.81)
- NVIDIA GeForce 6800 GT AGP (Socket 939, ForceWare 66.81)
- Western Digital WD800BB
- Western Digital Raptor WD740GD
- Windows XP SP1 French
Page 4
3d Studio Max 7, Maya 6
3d Studio Max 7
We start our test series with 3d Studio Max 7, the 3D rendering software. The first of two scenes uses a classic raytracing render based on the “Architecture” scene of SPECapc.

Whether clocked at 3 GHz or 3.6 GHz, the cache contribution isn’t significant enough to be measured. It’s impossible to get a reliable figure at these frequencies, as there can be a 1 second margin of error for this benchmark. The P4 6xx doesn’t improve performance, and at a similar price it performs worse than the P4 5xx. Despite being clocked at 3.73 GHz, the new Pentium 4 EE doesn’t perform better than the previous Northwood version, which is much more comfortable with this type of application.
Developed by Studio PC, the second scene mainly uses radiosity, a technique that gives more realistic lighting but is slower to render. 85% of this scene is based on this type of effect.
This time there is a performance difference between the P4 6xx and the P4 5xx, although it is quite small: +1.2% at 3 GHz and +0.5% at 3.6 GHz. The new Pentium 4 Extreme Edition is 5.7% faster than the previous version.
Maya 6
With Maya 6 we also used two scenes, provided by Yann Dupont of 3DVF, whom we thank. The first one uses the Maya Software renderer and the second, Mental Ray.
There is a slight improvement in this benchmark: +0.7% at 3 GHz and +0.5% at 3.6 GHz. Here, the Northwood architecture used by the Pentium 4 EE 3.46 GHz provides the best results. It is 7.1% faster than the new and very expensive Extreme Edition.
This time it’s the opposite. The new EE improves performance by 10%. The 6xx’s performance is comparable to the 5xx’s. For an equivalent price the 6xx once again performs worse.
Page 5
Mathematica, WinRAR
Mathematica 5
The following tests use scientific calculation programs, starting with Mathematica 5 from Wolfram Research. Here, we used the test suite developed by Stefan Steinhaus.
Mathematica reports a very small performance gain of +0.3% for the P4 6xx over the P4 5xx, whether at 3 or 3.6 GHz. The Extreme Edition is 1.5% slower than the previous version even though it is clocked 7.8% higher.
WinRAR 3.4
This test is the compression of a 535 MB file to RAR format in "best rate" mode via WinRAR 3.3.
The WinRAR test is the first to really show the benefit of the new Pentium 4. Indeed, at 3 GHz the performance gain is 3.6%, and it increases to 4.9% at 3.6 GHz. The 660 is overall just ahead of the 570 (at a similar price), and the 630 is still behind the 540. The new P4 EE is 5.7% faster than the previous version.
Page 6
TMPGEnc, DiVX
TMPGEnc Xpress 3
Video compression is one of the Pentium 4’s best areas if the application is optimised for its architecture. With version 3.0, TMPGEnc is optimised especially for the NetBurst architecture and also includes SSE3 optimisations. The following results come from the encoding of a 3600-frame DV video to MPEG-2 at a 4000 Kbit/s bitrate, in two passes:
TMPGEnc doesn’t really use the new L2 cache: compared with the P4 5xx, the P4 6xx is even slower at 3 GHz and 0.9% faster at 3.6 GHz. SSE3 and frequency, however, help the P4 EE 3.73 GHz, which is in the end 19% faster than the 3.46 GHz! It’s important to put this result in perspective, however, because the 3.46 GHz processor’s performance in this test wasn’t “extreme” compared to the other standard Pentium 4s.
DiVX 5.21 / VirtualDub
DiVX encoding via VirtualDubMod is our next test. We used a 1500-frame MPEG-2 file compressed with DiVX 5.11 (with B-frames and a 1500 Kbit bitrate), with VdubMod in “Fast Recompress” mode.
+1.6% and +1.9% are the “outstanding” performance gains provided by the new P4 at equivalent frequencies. This result isn’t extraordinary, and the new P4s aren’t faster than the previous versions at an equivalent price. The P4 EE 3.73 GHz is, however, 10% faster than the previous version.
Page 7
UT2004, Far Cry, Pacific Fighters
Unreal Tournament 2004
UT2004 seems to like the additional cache. The P4 630 is 3.3% faster than the 530 and the 660 is 3.6% faster than the 560. The 540 is still faster than the 630, and the 570 provides almost the same results as the 660.
Far Cry
With Far Cry the performance gain is 1.7% at 3 GHz and 2.1% at 3.6 GHz. Nothing really exciting… The new EE is 1.6% slower than the previous version.
Pacific Fighters
IL-2 Pacific Fighters really likes the new cache. Performance improves by 3.7% and 4% at 3 and 3.6 GHz respectively. The P4 5xx are, however, still faster at equivalent prices. The EE’s performance gain of 6.3% is also notable.
Page 8
Conclusion
We were expecting a strong response from Intel to the recent rise of AMD. Apparently it won’t come from single-core processors. Of course, the Pentium 4 6xx lineup brings its share of innovations: heat dissipation and power consumption are lower, and the processors benefit from EM64T and SpeedStep technology. At equivalent frequencies they also provide slightly better performance. Gains vary between 0 and 5%.
It is, however, important to keep in mind that with this new processor line Intel sells you a 6xx for approximately the same price as a 5xx clocked 200 MHz higher. At equal frequency, the price gap is therefore +25.8% at 3 GHz, +25.2% at 3.2 GHz, +44.2% at 3.4 GHz and +45.1% at 3.6 GHz! Unlike the 5xx, the 6xx still isn’t available in a 3.8 GHz version. The Pentium 4 570 is still today the fastest standard processor. So we ask: is this price difference justified?
Not really. Unfortunately, this is the law of the CPU market, whether at AMD or Intel. Intel sells us the additional 200 MHz (+5.9%) of the 560 for 50% more than the 550. It is the same for the 570 compared to the 560. The situation is identical at AMD with the additional 512 KB of cache of the 4000+ compared to the 3800+.
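For readers who want to check these percentages, here is a minimal Python sketch that recomputes them from the two price lists given earlier in this article. The dictionary and function names are ours; the prices are the list prices quoted above, keyed by clock speed in MHz.

```python
# List prices in US dollars, keyed by clock speed in MHz.
PRICES_6XX = {3000: 224, 3200: 273, 3400: 401, 3600: 605}
PRICES_5XX = {2800: 163, 3000: 178, 3200: 218, 3400: 278, 3600: 417, 3800: 637}


def percent_more(value: float, reference: float) -> float:
    """How much larger value is than reference, in percent."""
    return (value / reference - 1) * 100


def same_clock_premiums() -> dict[int, float]:
    """Price premium of each P4 6xx over the P4 5xx running at the same clock."""
    return {
        clock: percent_more(price, PRICES_5XX[clock])
        for clock, price in PRICES_6XX.items()
    }


if __name__ == "__main__":
    for clock, premium in same_clock_premiums().items():
        print(f"6xx vs 5xx at {clock / 1000:.1f} GHz: +{premium:.1f}%")
    # Cost of one 200 MHz step within the 5xx range: 550 -> 560
    clock_gain = percent_more(3600, 3400)
    price_gain = percent_more(PRICES_5XX[3600], PRICES_5XX[3400])
    print(f"560 vs 550: +{clock_gain:.1f}% clock for +{price_gain:.1f}% price")
```

The same `percent_more` helper applied to the 570 and the 560 shows a comparable jump in price for an even smaller relative gain in clock speed.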
We feel that, even if this pricing policy is common, it’s not right. Also, SpeedStep is limited and doesn’t go as far as AMD’s Cool’n’Quiet (it doesn’t go below 2.8 GHz). EM64T is as interesting as AMD64, but it doesn’t justify the additional cost as long as we are unable to measure the real-world performance gains. In addition, in a couple of months the P4 5xx (the 5x1) will also feature EM64T.
The Pentium 4 6xx processors aren’t so bad after all (they do perform better than the Pentium 4 5xx); we just hope that Intel will quickly change its pricing policy. Of course they now offer several features that used to be advantages of AMD’s CPUs, but the performance/price ratio is seriously reduced.
The Pentium 4 Extreme Edition 3.73 GHz performs better than the previous version except in three applications. At times there were great performance improvements, especially in video processing. Is this enough, though? Not really, because this processor also needs an FSB1066 platform, which like the EE is very expensive. In the end it is just a P4 6xx that runs at 14x266.
It is a lot of money, but it seems that this is the current leitmotif for all these new processors. What a pity…