Intel Core 2 Extreme QX6700 (Quad Core) - BeHardware

Written by Marc Prieur

Published on November 2, 2006

URL: http://www.behardware.com/art/lire/642/


Page 1

Introduction

In April 2005, Intel was the first to release a dual-core processor, just ahead of AMD. These processors have become more and more popular since the release of the Core 2 Duo and the price cuts on AMD’s side. Now the number of cores increases to four with the release of the Core 2 Extreme QX6700. The MHz race is over; welcome to the core era…

Two Core 2 Duos in one
To release a quad-core processor so quickly, Intel used the same method as for its first dual core. This isn’t a single die with four execution cores like AMD’s future K8L (available in mid-2007) but a pair of Core 2 Duo dies in the same package, using the same socket.


The method was already proven with the Pentium D. While the performance gains from a single Pentium 4 to a Pentium D weren’t that great, they were close to those from the Athlon 64 to the Athlon 64 X2. We now have a CPU with 582 million transistors split across two distinct dies.

Initially, only the Core 2 Extreme line gets a quad-core version. As a consequence, the QX6700 will sell for $999, like the X6800. As a reminder, the latter is based on one Core 2 at 2.93 GHz, whereas the QX6700 is based on two Core 2s at 2.67 GHz. The lower frequency is presumably due to the thermal envelope. In practice, however, the QX6700 can reach higher frequencies, as we will see later on.

If you want a "cheaper" quad, you will have to choose the Core 2 Quad Q6600, which will be out next January. This time the frequency is 2.4 GHz, like the E6600, and the price is $851, or 2.7 times that of an E6600. Of course, since there is no dual Socket 775 motherboard, the only other way to get four cores is to use Xeons. The additional cost this implies (motherboard, registered memory) means that an equivalent solution would probably end up a bit more expensive. Either way, we can’t be too happy with this pricing policy.

It is important to point out, however, that Intel currently has two problems with the quad. The first is, of course, production cost, as two Core 2 dies are very expensive to produce. Intel also has no incentive to sell a quad for less than twice the price of a Duo, since there is no competition from AMD in this segment. The second problem, for now, is that the improvements the quad brings aren’t obvious in all applications.
Four cores, what for?
Indeed, we have to keep in mind that to make use of such a CPU, you have to feed each of the cores with tasks to execute in parallel. There are two ways to do so: either you run multithreaded applications, meaning that they execute their main functions in parallel, or you run several single-threaded applications at the same time.

Apart from server applications, most multithreaded software is aimed at workstations. This is the case for most software used for computer-generated imagery, video editing, and video and audio encoding. However, 99% of games are single-threaded, and when they are multithreaded, the performance improvements aren’t always obvious.

Since the release of the first dual-core processors, there have been many promises about dual-core optimisations for games. A year and a half later, multithreaded games are still rare, and those with actual performance increases are rarer still. Of course, new games will be released in the coming months, and we hope that the situation will change, notably with Unreal Tournament 2007 or Supreme Commander.

For now, quad core is something you should consider only if you are running applications that already benefit heavily from dual cores. If not, this processor isn’t for you, unless you plan on running at least three applications simultaneously, each fully using a core (which is really rare!).


Page 2
In practice

A good point for Intel is that the Core 2 Extreme QX6700 is compatible with all P965 and 975X boards already compatible with the Core 2 Duo. For this test, we chose the ASUSTeK P5W DH. It served us well when testing the Core 2 Duo, and BIOS 1505 is required for proper management of the multiplier with this CPU.


Physically, the QX6700 looks like a standard Core 2 Duo, and only when it is opened (as seen in the photo provided by Intel) can we really take a look at the beast. Windows’ Task Manager or CPU-Z does show 4 cores, which we tested mercilessly in this article.


The raw power shows in the results of Julien Houiller’s ThreadTest for a 64-bit integer calculation:
- 1 thread: 55.55 million operations/s
- 2 threads: 111.1 million operations/s
- 4 threads: 222.2 million operations/s
With 4 threads, the calculation is exactly 4 times faster. With a Pentium Extreme Edition, which is based on 2 physical cores with HyperThreading, for a total of 4 logical cores, the ratio was 2.9.
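
The scaling quoted here can be checked with a few lines of arithmetic. The Python sketch below is only an illustration: the function names are ours, and the figures are the ThreadTest results and the Pentium Extreme Edition ratio given in the text.

```python
# ThreadTest throughput on the QX6700, in millions of operations per second,
# keyed by thread count (figures from the article).
QX6700_RATES = {1: 55.55, 2: 111.1, 4: 222.2}


def speedup(single_thread_rate, rate):
    """Return how many times faster `rate` is than the single-thread rate."""
    return rate / single_thread_rate


def efficiency(speedup_value, workers):
    """Return the fraction of ideal linear scaling reached with `workers` workers."""
    return speedup_value / workers


base = QX6700_RATES[1]
for threads, rate in QX6700_RATES.items():
    s = speedup(base, rate)
    print(f"QX6700, {threads} thread(s): x{s:.2f}, "
          f"{efficiency(s, threads):.0%} of ideal scaling")

# Pentium Extreme Edition: ratio of 2.9 with 4 logical cores on 2 physical cores.
print(f"Pentium EE: {efficiency(2.9, 4):.1%} of ideal scaling per logical core, "
      f"x{2.9 / 2:.2f} per physical core")
```

The second figure for the Pentium EE simply relates the same 2.9 ratio to its 2 physical cores, which is one way of reading what HyperThreading adds in this test.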

What about the power consumption of the Core 2 Extreme QX6700? We compared it to the X6800; the figures represent the entire configuration (motherboard, 2x1GB of DDR2 memory, X1950 Pro, 1 Raptor). The measurements were taken directly at the power outlet. The real power consumption of the configuration is lower, as the power supply efficiency is 70 to 80%.


As you can see, the quad significantly increases power consumption, even with only 1 or 2 instances of Prime 95 (a single-threaded application): the increases are 22 and 31 watts, respectively. This isn’t dramatic, thanks to the good results of the Core 2. At maximum load, with in theory twice the performance, the delta is 57 watts!
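
Because these figures are measured at the outlet, the extra power actually drawn by the components is lower. As a rough sketch, assuming the 70 to 80% power supply efficiency mentioned above stays constant across load levels (a simplification on our part), the deltas convert as follows. The function and variable names are purely illustrative.

```python
# Power supply efficiency range quoted in the article (70 to 80%).
PSU_EFFICIENCY_RANGE = (0.70, 0.80)

# Extra consumption of the QX6700 configuration compared to the X6800 one,
# measured at the outlet, in watts (figures from the article).
WALL_DELTAS_W = {"1 x Prime 95": 22, "2 x Prime 95": 31, "maximum load": 57}


def component_power_range(wall_watts, efficiency_range=PSU_EFFICIENCY_RANGE):
    """Estimate the power delivered to the components for a reading at the outlet.

    Assumes a constant efficiency over the load range, which is a
    simplification: real power supplies vary with load.
    """
    low, high = efficiency_range
    return wall_watts * low, wall_watts * high


for load, delta in WALL_DELTAS_W.items():
    low, high = component_power_range(delta)
    print(f"{load}: +{delta} W at the outlet, "
          f"roughly +{low:.0f} to +{high:.0f} W for the components")
```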

What about overclocking this quad core? As it is an "Extreme Edition", the multiplier is unlocked and can be pushed up. We chose this method of overclocking and validated each setting with four instances of Prime 95 running in parallel for 15 minutes. Here are our results:
- 11x266 = 2933 MHz: stable
- 12x266 = 3200 MHz: stable at 1.400V (275 watts)
- 13x266 = 3466 MHz: stable at 1.475V (319 watts)
It is interesting to note that the processor could boot at a much higher frequency (14x266), but we never succeeded in stabilizing it at a "reasonable" voltage. At 3.46 GHz, power consumption is already 71 watts higher than at the stock frequency.
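
The frequencies in this list are simply the multiplier times the base clock. The sketch below reproduces them in Python; it assumes the "266" in the list stands for the 266.67 MHz base clock of a quad-pumped FSB1066 and takes 10x as the stock multiplier (2.67 GHz). The stock power figure is not quoted directly in the text; it is derived here from the 71 watt difference stated for 3.46 GHz.

```python
from fractions import Fraction

# FSB1066 is quad-pumped, so the base clock is 1066.67 / 4 = 266.67 MHz.
# The article rounds it to "266"; an exact fraction avoids rounding drift.
BASE_CLOCK_MHZ = Fraction(800, 3)
STOCK_MULTIPLIER = 10  # 10 x 266 = 2.67 GHz, the stock setting


def core_clock_mhz(multiplier, base_clock_mhz=BASE_CLOCK_MHZ):
    """Return the core frequency in MHz for a given multiplier."""
    return multiplier * base_clock_mhz


# Consumption at the outlet under four Prime 95 instances, in watts.
# 12x and 13x come from the article; the stock value is derived from the
# 71 W difference stated for 13x (319 - 71 = 248 W).
LOAD_POWER_W = {10: 319 - 71, 12: 275, 13: 319}

for multiplier in (10, 11, 12, 13, 14):
    clock = core_clock_mhz(multiplier)
    gain = float(clock / core_clock_mhz(STOCK_MULTIPLIER) - 1)
    line = f"{multiplier}x: {int(clock)} MHz ({gain:+.0%} vs stock)"
    if multiplier in LOAD_POWER_W:
        extra = LOAD_POWER_W[multiplier] / LOAD_POWER_W[STOCK_MULTIPLIER] - 1
        line += f", {LOAD_POWER_W[multiplier]} W ({extra:+.0%})"
    print(line)
```

Truncating rather than rounding the result matches the way the list reports 2933 and 3466 MHz.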
The tests
The configurations and test software have changed since our article on the Core 2 Duo. From simple upgrades to the addition of new tests and the modification of test scenes, there are various changes, so results are no longer comparable to previous ones.

Here are the configurations:

- ATI Radeon X1950 Pro / Catalyst 6.9
- 2 x 1024 MB DDR2-800 4-4-4
- 2 x Raptor 74 GB
- Windows XP SP2 French
- Socket 775: ASUSTeK P5W DH (i975X)
- Socket AM2: ASUSTeK M2N32-SLI Deluxe


Page 3
3ds Max 9 and Maya 8

For 3D rendering, the first changes are upgrades: we now use 3ds Max 9 and Maya 8.

Also, the two test scenes for Maya and 3ds Max, developed by Yann Dupont of 3DVF (whom we thank), use the MentalRay rendering engine. This choice wasn’t arbitrary, since this engine is now available for both packages and is the one most commonly used in production.

- The 3ds Max scene is very heavy in terms of polygons and number of objects. The objective is to test the processor’s capacity to handle a heavy flow of data.

- The Maya scene is much lighter, but uses MentalRay’s advanced lighting algorithms and relies on the processor’s raw power for mathematical calculations.


With 3ds Max and Maya, rendering performance improves considerably with the QX6700: by 90 and 96%, respectively, thanks to the 2 additional cores.


Page 4
Mathematica 5.2 & WinRAR 3.61

Mathematica 5.2
For scientific calculations, we stuck with Wolfram Research’s Mathematica 5.2. This time, however, we succeeded in activating the software’s multithreading optimisations, and we run a test in which some calculations benefit from them: MathematicaMark2004 (integrated into the software).


This time, the performance improvement is smaller but still there. Some might be surprised by the results obtained here: the Pentium EE is clearly less disadvantaged than in the past. It seems that the workload of this test is quite different from the one used previously, and the resulting performance does not favour AMD.

This test now uses multithreading, which gives an advantage to the EE and its 4 logical cores. It also partly relies on the vectorization optimisations in Mathematica 5.2, which work particularly well with the Netburst architecture.

The MathematicaMark2004 tests are Data Fitting, Digits of Pi, Discrete Fourier Transform, Matrix Eigenvalues, Elementary Functions, Gamma Function, Large Integer Multiplication, Matrix Exponential, Matrix Multiplication, Matrix Transpose, Numerical Integration, Polynomial Expansion, Random Number Sort, Singular Value Decomposition, and Solving a Linear System.
WinRAR 3.61
Version 3.51 is no more; welcome to WinRAR 3.61. This new version includes multithreading optimisations. Apart from this, the test procedure is the same: a total of 588 MB, made up of 493 Word and Excel files (69 MB), 22 Eudora mailboxes (251 MB) and one WAV audio file (268 MB), is compressed using the most advanced option.


The multithreading optimisations in the latest version of WinRAR do not do much for the quad core, whose performance increases by only 4% compared to the E6700. In the end, the QX6700 ends up behind the X6800.


Page 5
TMPGEnc 4.0 & VDub + DiVX 6.4

TMPGEnc 4.0 XPress

We are now using version 4 of this MPEG-2 encoder. Compared to version 3, it integrates a couple of optimisations for the Core 2 and improves performance by approximately 5%. For this test, we continue to encode a 10 minute 16 second DV file to MPEG-2 in 720x576, with an average bitrate of 4500 kbit/s, in two passes. The video preview display is activated during this test, and the DV file is decoded via a Mainconcept codec, which is faster than the decoder in TMPGEnc.


TMPGEnc really benefits from the quad core, even if the performance improvement is closer to what a "tri-core" would deliver.
VirtualDub & DiVX 6.4
We upgraded VirtualDub from version 1.6.11 to 1.6.16. The upgrade mainly fixes several bugs. We are now also using DiVX 6.4. Performance improves with it, since we measured an 8% gain with the A64 X2, 10% with the Pentium EE and 14% with the Core 2. We encode the same video as in TMPGEnc in Fast recompress mode, via the DiVX codec, in one pass, with an average bitrate of 1500 kbit/s, B-frames and the best-quality encoding setting. The video preview display is activated during this test.


With the quality setting selected, the performance improvement for encoding with DiVX 6.4 is only approximately 20%.


Page 6
Far Cry & Pacific Fighters

Far Cry
There are no real changes to this test. Far Cry is still one of the most demanding FPS games on the market, because of the outdoor scenes in the training map, which is the one we use.


Performance improves slightly in Far Cry, and the X6800 remains the fastest.
Pacific Fighters
We removed 20 seconds from our scene, which used to last 2 minutes. This passage required fewer CPU resources and had a significant impact on average results. Its impact on the Core 2 results was greater than on those of the Athlon 64 FX.


Pacific Fighters doesn’t benefit at all from either the dual core or the quad.


Page 7
Flight Simulator X

Here is something new! While Far Cry and Pacific Fighters required a lot of processor resources compared to the average game, the release of the Core 2 Duo redefined CPU performance in games. We had to include a new game to challenge modern processors.

We considered Oblivion but felt that, despite its reputation, it didn’t fit our requirements. With an X1950 Pro, which is a fast card, a high level of detail and a fast CPU, we ended up limited by the graphics card. The limit wasn’t fillrate (which could easily be solved by reducing the resolution) but rather geometry processing power: for example, moving from a Core 2 Duo E6600 to an X6800 didn’t improve performance.

After taking a quick look at Supreme Commander, which unfortunately is only available in beta and doesn’t use dual or quad cores, we finally settled on Flight Simulator X. The scene chosen is a flight over New York after taking off from JFK. The level of detail was set to "medium high", as higher settings were really too slow.


As you can see, the framerate is far from reaching peaks, and it perfectly illustrates the expression "CPU limited". It’s just unfortunate that the game doesn’t really take advantage of the available processor power through adequate multithreading. Didn’t we already mention that Netburst isn’t the best for games?
About tests in games
Of course, we run these tests at low resolutions (800*600, and 1024*768 for Flight Simulator) to measure absolute CPU performance and to reduce as much as possible the influence of external limits such as the graphics card.

Depending on the card, game, scene or resolution, the processor will have to wait for the graphics card, or the other way around.

The combinations are infinite, which is why we run our tests with the objective of finding which processor is the fastest, and how much faster it is, when the processor is what limits performance. Indeed, the limit imposed by a graphics card, which we might hit at the highest resolution, wouldn’t necessarily be representative, and other parameters would come into play.

Also, the figure we give you is an average framerate over a period of 1 to 2 minutes, to ensure a stable measurement. This smooths out the results and could mask gaps that would matter at a given moment when performance is limited by the processor (an explosion, for example).
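
To illustrate how an average can hide such a dip, here is a small Python sketch. The frame times are hypothetical values made up for the example, not measurements from our tests, and the function names are ours.

```python
def average_fps(frame_times_s):
    """Average framerate over a run: frames rendered divided by elapsed time."""
    return len(frame_times_s) / sum(frame_times_s)


def minimum_fps(frame_times_s):
    """Instantaneous framerate of the slowest frame in the run."""
    return 1.0 / max(frame_times_s)


# Hypothetical run: 95 frames at 20 ms each, then 5 frames at 100 ms each
# (for instance during an explosion that saturates the processor).
frame_times = [0.020] * 95 + [0.100] * 5

print(f"average: {average_fps(frame_times):.1f} fps")
print(f"slowest frame: {minimum_fps(frame_times):.1f} fps")
```

In this made-up run the average stays above 40 frames per second even though the slowest frames run at 10, which is the kind of gap an averaged figure cannot show.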

To find out whether your CPU or your graphics card limits performance in games, we recommend running a simple test: reduce the resolution to 800*600 or even 640*480. If the framerate improves, the graphics card is the limit; if not, you are CPU limited.
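
This rule of thumb can be written down as a tiny Python helper. It is only a sketch: the function name and the 10% threshold used to ignore measurement noise are our own assumptions, not values from the tests above.

```python
def limiting_component(fps_usual_res, fps_low_res, min_gain=0.10):
    """Guess what limits a game from framerates measured at two resolutions.

    `fps_usual_res` is the framerate at your usual resolution and
    `fps_low_res` the framerate at 800*600 or 640*480. `min_gain` is an
    arbitrary threshold (10% here) under which the difference is treated
    as measurement noise.
    """
    if fps_usual_res <= 0:
        raise ValueError("fps_usual_res must be positive")
    gain = fps_low_res / fps_usual_res - 1
    return "graphics card" if gain > min_gain else "CPU"


# Hypothetical readings, for illustration only.
print(limiting_component(40, 60))  # large gain at low resolution
print(limiting_component(40, 41))  # almost no gain at low resolution
```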


Page 8
1 vs 2 vs 4 Core, FSB1066 vs FSB1333

1 vs 2 vs 4 Core
Here, in a nutshell, are the results obtained in our tests with the Core 2 using one core (an E6700 with only one core activated), two cores (E6700) and four cores (QX6700). 100% corresponds to the performance with a single core:


Going from 1 to 2 cores improves performance with 3ds, Maya and TMPGEnc by 94, 97 and 85%, respectively. Going from two to four cores, however, is less interesting: while the improvements are still very good with 3ds and Maya (+90 and +96%), they are only 47.3% with TMPGEnc. Performance with Mathematica and DiVX 6.4 increases by approximately 20% with the quad core, by 4% with WinRAR and by 1.6% with Far Cry.
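
Chaining the two steps gives the overall gain from one to four cores, which is where the "tri-core" remark about TMPGEnc comes from. The Python sketch below does this multiplication and then, purely as an illustration, inverts Amdahl's law to estimate which share of each workload behaves as if it ran in parallel. Amdahl's law is a textbook model that we bring in here; it is not part of the measurements, and real applications need not follow it.

```python
# Gains in percent from the text: (1 -> 2 cores, 2 -> 4 cores).
GAINS_PCT = {"3ds Max": (94, 90), "Maya": (97, 96), "TMPGEnc": (85, 47.3)}


def cumulative_speedup(gain_1_to_2_pct, gain_2_to_4_pct):
    """Overall speedup from 1 to 4 cores, chaining the two measured steps."""
    return (1 + gain_1_to_2_pct / 100) * (1 + gain_2_to_4_pct / 100)


def amdahl_parallel_fraction(speedup, cores):
    """Invert Amdahl's law S = 1 / ((1 - p) + p / n) to get p from S and n."""
    return (1 - 1 / speedup) / (1 - 1 / cores)


for name, (first_step, second_step) in GAINS_PCT.items():
    s = cumulative_speedup(first_step, second_step)
    p = amdahl_parallel_fraction(s, 4)
    print(f"{name}: x{s:.2f} on 4 cores, implied parallel share {p:.0%}")
```

With these figures TMPGEnc ends up about 2.7 times faster on four cores than on one, against roughly 3.7 and 3.9 times for 3ds Max and Maya.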

Does the problem come from the processor or application?
FSB1066 vs FSB1333
Some believe that the performance of the quad core is restricted by the FSB, which would explain the smaller improvements. This is an interesting point, considering that the two dies share the same FSB and that communication between the dies, as well as cache management, also has to go through the FSB.


To find out, we simply tested the E6700 and QX6700 at 8x333 with DDR2-833 instead of 10x266 with DDR2-800, in 3ds Max and TMPGEnc. These applications really take advantage of multithreading, but do not require the highest bandwidth.


Moving to FSB1333, we didn’t measure any performance improvement with the E6700, and only small ones with the QX6700. So, even if the FSB imposes some restriction, this isn’t where the problem lies. That leaves two possibilities. It could be a memory bandwidth problem, in which case DDR3 will be the solution. More likely, however, software isn’t yet capable of fully using 4 cores, because its multithreading is still limited.
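
The two settings compared in this section keep the core frequency identical while raising the theoretical FSB bandwidth by a quarter, which a little arithmetic confirms. The Python sketch below assumes the usual characteristics of this bus (64 bits wide, four transfers per clock) and exact base clocks of 266.67 and 333.33 MHz for the "266" and "333" settings; the names are ours.

```python
from fractions import Fraction

FSB_WIDTH_BYTES = 8      # the FSB data bus is 64 bits wide
TRANSFERS_PER_CLOCK = 4  # the bus is quad-pumped

# (multiplier, base clock in MHz) for the two settings tested.
SETTINGS = {
    "FSB1066 (10x266)": (10, Fraction(800, 3)),
    "FSB1333 (8x333)": (8, Fraction(1000, 3)),
}


def core_clock_mhz(multiplier, base_clock_mhz):
    """Core frequency in MHz."""
    return multiplier * base_clock_mhz


def fsb_bandwidth_mb_s(base_clock_mhz):
    """Theoretical peak FSB bandwidth in MB/s."""
    return base_clock_mhz * TRANSFERS_PER_CLOCK * FSB_WIDTH_BYTES


for name, (multiplier, base_clock) in SETTINGS.items():
    print(f"{name}: core {float(core_clock_mhz(multiplier, base_clock)):.0f} MHz, "
          f"FSB peak {float(fsb_bandwidth_mb_s(base_clock)):.0f} MB/s")
```

These are peak figures only; the bandwidth actually available to each die also depends on the traffic exchanged between the two dies over the same bus.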


Page 9
Conclusion

Two years ago, we all thought AMD was in the best position in the dual-core race. Last year, however, Intel was faster and released the Pentium Extreme Edition one month before the Athlon 64 X2. Today, Intel succeeds in releasing a quad core a year ahead! AMD will soon release the 4x4 platform, but it will "only" feature two Opteron processors, renamed Athlon 64 FX, fitted into two distinct sockets.

Of course, the solution chosen by Intel, two dies in the same package, is far from the most elegant, and we much prefer the single-die solution that AMD will release in a year. In the meantime, Intel has a product on the market, even if we have to keep in mind that this 582-million-transistor monster is quite expensive to produce and that the company will only be able to make it widely available with the arrival of 45nm technology, expected in 2008 (unless it has to respond to aggressive pricing from AMD with the future K8L).


One question remains: beyond the technological demonstration, what is this processor really for? While graphics and multimedia professionals won’t need to look any further and will be pleased to have an alternative to Xeon workstations for 4 cores, the majority of users will find this CPU almost pointless.

Indeed, even if you use applications that require heavy processor resources, most software will only use two cores. The applications that need the most performance are still single-threaded (games), and "multithreaded" doesn’t necessarily mean capable of fully using four cores.

Intel released the quad core one year ahead of its direct competitor, but we wonder today whether it isn’t simply a year too early for desktops. Of course, it’s better to be early than late, but initially the Xeon line will benefit the most from this advance, whether in terms of density or licence costs, as per-socket licensing is becoming more and more common.


Copyright © 1997-2010 BeHardware. All rights reserved.