Intel Core-i7 3960X, X79 Express and LGA 2011 - BeHardware

Written by Marc Prieur

Published on November 21, 2011


Page 1

A cut down Sandy Bridge-E

The first Intel Core i7s, based on the Nehalem architecture, were introduced on a high-end platform, LGA 1366, in November 2008, before being rolled out to the mid-range in 2009 on LGA 1156.

With Sandy Bridge, Intel has done things the other way around: LGA 1155 became available ten months ago, and the high-end version is only now arriving on LGA 2011. Better late than never?
Core i7 LGA 2011: A cut down Sandy Bridge-E
The Core i7 LGA 2011s use a new chip, codenamed Sandy Bridge-E. Designed for the Xeons, this chip has no fewer than eight cores and 20 MB of L3 cache on a 2.27 billion transistor die. The die is large, at 435 mm², a little more than double the area of a Sandy Bridge including its IGP (which Sandy Bridge-E does not have).

In comparison to the Sandy Bridge LGA 1155, the Sandy Bridge-E LGA 2011 spec is as follows:

- 8 cores instead of 4
- 20 MB of L3 cache instead of 8 MB
- 40 PCI-Express lanes instead of 16
- 4 DDR3 channels instead of 2

While the prospect of 8 cores and 20 MB of L3 may be mouthwatering, you can wipe away your saliva: no Core i7 will actually be sold in this configuration. At best we'll get 6 cores and 15 MB of L3, with the 8-core / 20 MB versions reserved for the forthcoming Xeon E5s, which will be more expensive (up to $2000). Intel is thus recycling partially defective SNB-E dies, which should be numerous given the chip's size, as Core i7 LGA 2011s.

The other major difference is of course the more extensive PCI-Express support. Here 40 lanes are provided by the processor itself and can be split as follows:

- 2x16 and 1x8
- 1x16 and 3x8
- 1x16, 2x8 and 2x4
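Each of these splits can be written as a list of link widths to check that it uses exactly the 40 lanes the processor provides. A minimal Python sketch; the labels and function name are illustrative only:

```python
# The three CPU lane splits listed above, as lists of PCI-Express link widths.
CPU_PCIE_LANES = 40

SPLITS = {
    "2x16 + 1x8": [16, 16, 8],
    "1x16 + 3x8": [16, 8, 8, 8],
    "1x16 + 2x8 + 2x4": [16, 8, 8, 4, 4],
}


def lanes_used(widths: list[int]) -> int:
    """Total number of lanes consumed by a set of PCI-Express links."""
    return sum(widths)


for name, widths in SPLITS.items():
    print(f"{name}: {len(widths)} links, {lanes_used(widths)}/{CPU_PCIE_LANES} lanes")
```

The more links a split offers, the narrower each one is; the total never exceeds 40.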

With two graphics cards there isn't much of a gap with the LGA 1155 platform, as the difference between 2x16 and 2x8 is quite small in practice. For 3-way and 4-way multi-GPU systems, however, LGA 2011, like LGA 1366 and AM3+ with the 990FX chipset, has an advantage, even though LGA 1155 motherboards fitted with an NVIDIA or LucidLogix PCI-Express switch can compensate for most of that platform's inherent limitations.

Note that although the LGA 2011 processors and motherboards have been designed for PCI Express 3.0, Intel is for now only claiming PCI Express 2.0 compatibility, pending the necessary certification.

Finally, there's now quad-channel DDR3 support, which doubles the theoretical bandwidth available on LGA 1155 and increases it by 33% compared with LGA 1366. Officially the supported memory is DDR3-1600 (PC3-12800), but ratios for DDR3-1866, 2133 and 2400 are available.
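These bandwidth figures follow from simple arithmetic: each 64-bit DDR3 channel moves 8 bytes per transfer, so DDR3-1600 gives 12.8 GB/s per channel. A minimal Python sketch, assuming the same DDR3-1600 data rate on all three platforms:

```python
# Theoretical peak DDR3 bandwidth: data rate (MT/s) x 8 bytes per 64-bit channel x channels.
BYTES_PER_TRANSFER = 8  # each DDR3 channel is 64 bits wide


def peak_bandwidth_gbs(data_rate_mts: int, channels: int) -> float:
    """Theoretical peak bandwidth in GB/s (decimal gigabytes)."""
    return data_rate_mts * BYTES_PER_TRANSFER * channels / 1000


# Assumption: DDR3-1600 on every platform, so only the channel count differs.
platforms = {"LGA 1155": 2, "LGA 1366": 3, "LGA 2011": 4}
for name, channels in platforms.items():
    print(f"{name} ({channels} channels): {peak_bandwidth_gbs(1600, channels):.1f} GB/s")

lga2011 = peak_bandwidth_gbs(1600, 4)
print(f"LGA 2011 vs LGA 1155: x{lga2011 / peak_bandwidth_gbs(1600, 2):.2f}")
print(f"LGA 2011 vs LGA 1366: +{(lga2011 / peak_bandwidth_gbs(1600, 3) - 1) * 100:.0f}%")
```

These are theoretical peaks; measured throughput is always lower.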

Page 2
X79: cut down Patsburg, LGA 2011

X79 Express: A cut down Patsburg
On the chipset side, there's a new chip on LGA 2011, the X79 Express. Initially designed for the Xeon platform, the X79 Express is actually the chip codenamed Patsburg. In its X79 Express version, it has the same features as the P67 Express.

Note, however, that in addition to 4 SATA 3 Gb/s and 2 SATA 6 Gb/s ports, Patsburg also has an additional storage controller, which isn't available on X79. Depending on the version of the chip, more or fewer SCU (Storage Controller Unit) features are enabled:
- A: additional 4 SATA 6 Gb/s on the SCU
- B: SAS 6 Gb/s support on these 4 ports
- D: additional 8 SATA/SAS 6 Gb/s on the SCU, additional PCI-E Gen 3 link with the CPU
- T: RAID5 support on the 8 SCU ports (RAID 0/1/10 on the other versions)
Unfortunately, while the X79 Express chipset was initially supposed to come in configuration B, in the end it has no SCU at all. We don't know what prevented the SCU from being included. Was it a deliberate marketing choice, or simply a bug, with Intel sacrificing the SCU to make sure the 2011 launch date was met?

The unused space marked 'SATA 6-9' on the DX79SI motherboard (still a pre-production version) confirms that this change was made a few months ago.
A Socket LGA 2011 (not cut down)
In addition to the processor and the chipset, the LGA 2011 platform also introduces a change to the socket itself! As its name indicates, it now has 2011 contact points! This is 47% more than on LGA 1366, a result of support for a fourth memory channel and the inclusion of the PCI-Express controller on the processor.

LGA 2011, LGA 1366 and LGA 1155: it’s big!

The CPU retention system is also different, with two notches on either side of the processor. The other major change concerns cooler mounting: the cooler is screwed straight onto a metal frame that surrounds the socket, so the backplate behind the motherboard doesn't have to be changed (which can be a painful experience).

Page 3
The Core i7 LGA 2011 range, Intel Watercooling

The Core i7 LGA 2011 range

Initially, Intel is launching two LGA 2011 processors:

- Core i7-3960X: 3.3 GHz (Turbo 3.9 GHz), 15 MB of L3, $990
- Core i7-3930K: 3.2 GHz (Turbo 3.8 GHz), 12 MB of L3, $555

Both these CPUs have six cores with Hyperthreading and a TDP of 130 Watts.

A Core i7-3820 will follow in the first quarter of 2012. Clocked at 3.6 GHz with a Turbo of 3.9 GHz, it will 'only' have 10 MB of L3 and four cores with Hyperthreading. It will only be partially unlocked, like the non-K Core i5/i7 LGA 1155 models, and no pricing information has been released yet.

For this test, Intel supplied us with a Core i7-3960X, though it isn't necessarily the better value of the two. CPU-Z couldn't read the vCore on the Intel DX79SI motherboard: despite what the screenshot shows, the processor runs at 0.8V at idle and 1.25V under load.

Turbo Boost 2.0 is used more aggressively than on the LGA 1155 processors. First, 'Burst' mode, which allows the processor to exceed its TDP for a short time, has been extended from 1 to 10 seconds. During this time, the processor can draw up to 156 Watts.

Beyond this period, it has to stay within 130 Watts, but larger frequency increases are allowed than on LGA 1155:
- +300 MHz for 5 to 6 cores used
- +400 MHz for 3 to 4 cores used
- +600 MHz for 1 to 2 cores used
For comparison, a Core i7 LGA 1155 gets +100, +200, +300 and +400 MHz with 4, 3, 2 and 1 active cores respectively.
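Applied to each processor's base clock, these bins give the maximum Turbo frequency for a given number of active cores. A minimal Python sketch; the Core i7-2600K's 3.4 GHz base clock is not stated in this article and is an assumption here:

```python
# Turbo Boost bins from the text, in MHz, as (highest active-core count, bonus over base).
BASE_3960X = 3300
BASE_2600K = 3400  # assumption: 2600K base clock, not given in this article

BINS_3960X = [(2, 600), (4, 400), (6, 300)]
BINS_2600K = [(1, 400), (2, 300), (3, 200), (4, 100)]


def turbo_mhz(base: int, bins: list[tuple[int, int]], active_cores: int) -> int:
    """Maximum Turbo frequency for a given number of active cores."""
    for max_cores, bonus in bins:
        if active_cores <= max_cores:
            return base + bonus
    raise ValueError("more active cores than the CPU has")


for cores in range(1, 7):
    print(f"3960X, {cores} active core(s): {turbo_mhz(BASE_3960X, BINS_3960X, cores)} MHz")
for cores in range(1, 5):
    print(f"2600K, {cores} active core(s): {turbo_mhz(BASE_2600K, BINS_2600K, cores)} MHz")
```

For the 3960X this gives 3.9 GHz with one or two active cores and 3.6 GHz with all six, consistent with the 3.9 GHz maximum Turbo quoted above.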

Note that Turbo Boost doesn't necessarily improve the energy efficiency of the processor itself. Comparing performance with power draw at the ATX12V connector, which powers only the CPU's power supply circuitry, we obtained the following in Fritz Chess Benchmark when enabling Turbo Boost, with varying numbers of threads:
- 1 thread: performance +18.2%, energy consumption +33.3% (+12 Watts)
- 2 threads: performance +18.2%, energy consumption +39.5% (+18 Watts)
- 4 threads: performance +12.3%, energy consumption +24.6% (+16.8 Watts)
- 12 threads: performance +9.6%, energy consumption +17.5% (+18 Watts)
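Dividing performance by power shows that efficiency drops in every case when Turbo Boost is enabled, by roughly 7% to 15%. The Python sketch below simply reworks the four measurements above; the implied baseline power is approximate because the published percentages are rounded:

```python
# Perf-per-watt change when enabling Turbo Boost, from the Fritz Chess Benchmark figures above.
# (threads, performance gain %, ATX12V power gain %, extra watts)
RESULTS = [
    (1, 18.2, 33.3, 12.0),
    (2, 18.2, 39.5, 18.0),
    (4, 12.3, 24.6, 16.8),
    (12, 9.6, 17.5, 18.0),
]


def efficiency_change_pct(perf_gain_pct: float, power_gain_pct: float) -> float:
    """Change in performance per watt, in percent."""
    return ((1 + perf_gain_pct / 100) / (1 + power_gain_pct / 100) - 1) * 100


for threads, perf, power, extra_w in RESULTS:
    baseline_w = extra_w / (power / 100)  # implied ATX12V draw with Turbo off
    print(
        f"{threads:>2} thread(s): perf/W {efficiency_change_pct(perf, power):+.1f}%, "
        f"~{baseline_w:.0f} W without Turbo"
    )
```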
No box cooler but watercooling: the Intel RTS2011LC
A major difference with the LGA 2011 platform is that Intel has chosen not to bundle a cooler with these CPUs. Two cooling solutions are offered separately:

- Intel Air Cooling, a standard cooler costing under $20
- Intel Water Cooling, an all-in-one water cooling solution at $85-100

The Intel Air Cooling solution is comparable to a standard box cooler and isn't really suited to overclocking or to quiet operation under load. Intel Water Cooling (RTS2011LC) is based on an Asetek design, as used by Antec and Corsair in their cooling systems.

We were able to put this system through its paces briefly and compare it with a Noctua NH-U12P SE2 fitted with an LGA 2011 kit. To put them on an equal footing, we used two NF-P12 fans in a push-pull configuration in both cases, and the results were similar: 51°C above room temperature under load in Prime95 with the CPU overclocked to 4.6 GHz.

Given the size of the radiator, it was to be expected that this system wouldn't outperform a well-made air cooler. Note, however, that this open-bench test didn't allow us to factor in one advantage of such watercooling systems: as the radiator is fixed to the back of the case with the 120 mm fan, the heat given off by the CPU is expelled directly outside the case.

We were, however, unpleasantly surprised by the noise levels of this system. The Intel fan motor is audible even at low speed, but worse still, the pump makes a rather annoying noise, especially when mounted vertically, even after running for a long time and after we'd tapped it to evacuate any air bubbles. In a well-ventilated system the noise is masked by the fans, but if you're looking for silence when the CPU is idle you'll be disappointed, as you'll hear a noise reminiscent of a hard drive seeking, though not as loud.

Page 4
Intel DX79SI, G.Skill RipjawsZ, Test protocol

Intel DX79SI
For this test we used an Intel DX79SI motherboard built around the Intel X79 Express chipset. It has 8 DDR3 DIMM slots for up to 64 GB of memory and three PCI-Express x16 slots, which run at x16/x8/x8 with three graphics cards. Note that the switches that run the last two slots at x16/x0 or x8/x8 are PCI Express 3.0 compatible Pericom PI3PCIE3s.

Another PCI-Express lane from the LGA 2011 processor is used for a Renesas D720200AF1 USB 3.0 controller. Six PCI-Express lanes from the X79 Express chipset are used as follows:

- 2 lanes for 2 PCI-E x1 ports
- 2 lanes for 2 Gigabit (Intel 82579L and 82574L) network chips
- 1 lane for a second USB 3 controller (2 internal ports)
- 1 lane for an IEEE 1394 controller

Audio is handled by the Realtek ALC892 codec and in terms of storage we’re limited to the X79 Express spec, namely two SATA 6G ports and four SATA 3G ports.
G.Skill RipjawsZ
Quad-channel memory support should lead memory manufacturers to announce numerous quad-channel kits.

For this test G.Skill sent us a 16 GB RipjawsZ kit running at DDR3-1600 with timings of 9-9-9-24 at 1.5V.
The test
For this test, we used the protocol developed for our AMD FX review; see that article for more details.

Except where we specify otherwise, the tests were carried out on the following platforms:

- Intel DP55KG (LGA1156)
- Intel DP67BG (LGA1155)
- Intel DX58SO (LGA1366)
- Intel DX79SI (LGA2011)
- ASUS M5A99X EVO (AM3/AM3+)
- 2x4 GB DDR3-1066 7-7-7 (Q6600)
- 2x4 GB DDR3-1333 7-7-7 (Q9650)
- 2x4 GB DDR3-1600 9-9-9
- 3x4 GB DDR3-1600 9-9-9 (LGA 1366)
- 4x4 GB DDR3-1600 9-9-9 (LGA 2011)
- GeForce GTX 580 + GeForce 280.26
- SSD Intel X25-M 160 GB + SSD Intel 320 120 GB
- Corsair AX650 Gold power supply

Note that having 12 or 16 GB of memory didn't improve performance in our tests; the point was simply to populate all the DDR3 channels available on each platform.

Page 5
Cache & Memory, impact of quad-channel

Cache performance
Before looking at performance in applications, we wanted to check cache performance in Aida64, with all the CPUs running at 3.2 GHz in this test:

As expected, the Sandy Bridge-E L1 and L2 caches perform the same as on Sandy Bridge. Compared with the previous architectures, read speeds are much improved but L2 latency is slightly higher. The L3 cache is bigger but also slower than on Sandy Bridge, in terms of both bandwidth and latency. Compared with Gulftown, however, L3 speeds are up, though at the expense of latency.
Memory performance
Next we looked at memory controller performance. Still at 3.2 GHz, we tested various Intel architectures with DDR3-1600, altering the number of memory channels used. The results were obtained using Aida64 (single thread bandwidth, latency) and RMMT (multithread bandwidth).

Note that on LGA 1366, i.e. the Bloomfields (45nm Core i7) and Gulftowns (32nm Core i7), the memory controller doesn't run at the CPU clock but at a separate clock whose minimum is tied to the memory clock (3.2 GHz on Bloomfield and 2.4 GHz on Gulftown).

This difference in memory controller clock explains why performance is higher on Bloomfield than Gulftown. Moving from dual to triple channel results in a fairly small gain in bandwidth but makes for higher latency.

On Sandy Bridge-E, bandwidth jumps significantly with both three and four channels: multithreaded reads and writes gain 47.6% and 50% respectively going from two to three channels (almost perfect scaling), and 82.8% and 82.7% going from two to four channels! We got reads of no less than 42.5 GB/s!
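Ideal scaling would add 50% bandwidth going from two to three channels and 100% going to four. A short Python sketch comparing the measured gains above with that ideal, and the 42.5 GB/s read with the 51.2 GB/s theoretical peak of four DDR3-1600 channels:

```python
# Measured multithreaded bandwidth gains over dual channel, from the results above (%).
MEASURED_GAIN_PCT = {
    (3, "read"): 47.6,
    (3, "write"): 50.0,
    (4, "read"): 82.8,
    (4, "write"): 82.7,
}


def scaling_efficiency(channels: int, gain_pct: float, base_channels: int = 2) -> float:
    """Measured gain as a fraction of the ideal (linear in channel count) gain."""
    ideal_gain_pct = (channels / base_channels - 1) * 100
    return gain_pct / ideal_gain_pct


for (channels, op), gain in MEASURED_GAIN_PCT.items():
    print(f"{channels} channels, {op}: {scaling_efficiency(channels, gain):.0%} of ideal")

# Quad-channel read versus the theoretical peak (4 channels x 12.8 GB/s at DDR3-1600).
print(f"quad-channel read: {42.5 / (4 * 12.8):.0%} of theoretical peak")
```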

Compared with Sandy Bridge at the same number of channels, the results were strangely worse, in terms of both bandwidth and latency. Latency is the same in dual and quad channel modes but much higher in triple channel mode.
2, 3 and 4 channels in practice
Memory bandwidth is all well and good, but does it make much of a difference in practice? To find out for sure we tested the Core i7-3960X with 2, 3 and 4 channels at DDR3-1600 CL9 in our applications protocol. Here we gave an index of 100 to the results obtained with dual channel:

In the application tests, the biggest gains when moving from two to four channels came in Lightroom and 7-zip, at 13.7% and 9.6% respectively. The other gains were very limited. The same goes for games: although Rise Of Flight and Anno 1404 showed gains of 3.9% and 3.6%, the other titles gained between 0 and 2%.

Triple channel mode gives mixed results: some games and applications make good use of the extra bandwidth, but some games suffer slightly from the higher latency. The impact remains small, with a penalty of 0.6% at worst.

Populating four channels shouldn't be an end in itself, as faster DDR3 in dual channel mode can give better performance: DDR3-2133 9-11-10 on two channels is faster than DDR3-1600 9-9-9 on four, except in Lightroom:

Of course, DDR3-2133 on four channels would be faster still, but as we only had two DDR3-2133 modules at our disposal, we couldn't check whether the memory controller supports such speeds on four channels.
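Converting timings from clock cycles to nanoseconds shows the trade-off: DDR3 transfers twice per clock, so CL9 at DDR3-1600 (800 MHz) is 11.25 ns, whereas CL9 at DDR3-2133 is under 8.5 ns, at the cost of about a third less peak bandwidth on two channels. A minimal Python sketch, assuming the usual CL-tRCD-tRP order for the timings quoted:

```python
# First-word latency and peak bandwidth for the two memory setups compared above.
def cycles_to_ns(cycles: int, data_rate_mts: int) -> float:
    """DDR transfers twice per clock, so the memory clock is half the data rate."""
    clock_mhz = data_rate_mts / 2
    return cycles / clock_mhz * 1000


def peak_bandwidth_gbs(data_rate_mts: int, channels: int) -> float:
    """Theoretical peak in GB/s: 8 bytes per transfer per 64-bit channel."""
    return data_rate_mts * 8 * channels / 1000


# (label, data rate in MT/s, (CL, tRCD, tRP) in cycles, channels)
configs = [
    ("DDR3-1600 9-9-9, 4 channels", 1600, (9, 9, 9), 4),
    ("DDR3-2133 9-11-10, 2 channels", 2133, (9, 11, 10), 2),
]
for name, rate, (cl, trcd, trp), channels in configs:
    print(
        f"{name}: CL {cycles_to_ns(cl, rate):.2f} ns, tRCD {cycles_to_ns(trcd, rate):.2f} ns, "
        f"tRP {cycles_to_ns(trp, rate):.2f} ns, peak {peak_bandwidth_gbs(rate, channels):.1f} GB/s"
    )
```

The dual-channel DDR3-2133 setup wins on every latency figure while losing on peak bandwidth, which fits with Lightroom, the most bandwidth-hungry application in our tests, being the exception.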

Page 6
Energy consumption and efficiency

In our previous processor articles, we measured power consumption under load in Prime95. This stress test has the merit of pushing the various architectures to their limits in a fairly equitable way, but we couldn't use it to relate power consumption to performance, as Prime95's benchmark mode draws less power and isn't as balanced across processors.

We therefore decided to look for another application that would give us a level of performance and power consumption representative of what we obtained in the other applications of our test protocol. In the end we once again opted for Fritz Chess Benchmark, which also has the advantage of letting us easily set the number of threads used.

The power consumption readings therefore shouldn't be taken as absolute maximum values but rather as typical of a heavy load: applications specialised in processor stress, such as Prime95, can draw up to 20% more. All power-saving features, including motherboard ones such as the ASUS EPU, were enabled for this test, as long as they didn't have a negative impact on performance:

[ Charts: power consumption at the 220V wall socket / at the ATX12V connector ]

With the X58 Express chipset gone, the LGA 2011 platform is much more economical at idle than LGA 1366, though it doesn't match LGA 1155. In light load (1 thread) we're down on LGA 1366, but not far off. The same goes for maximum load (12 threads). Note that under load, power consumption at the socket rises by 5 Watts when going from 2x4 GB to 4x4 GB, which affects the LGA 1366 and 2011 platforms, equipped with three or four modules compared with two for the others.

The reading at the ATX12V allows us to isolate processor consumption. Unfortunately, the figures are not entirely comparable, as in some cases part of the CPU's power comes from the standard 24-pin ATX connector. For a fully accurate comparison, only processors tested on the same motherboard should be compared. Note that the Core i7-3960X is the Intel CPU that draws the most power at the ATX12V under load.

We then looked at the energy efficiency of the different processors. To estimate it, you have to divide the performance obtained in Fritz Chess Benchmark by CPU power consumption. The problem, however, is that it's impossible to get an exact reading of CPU consumption: the ATX12V readings aren't 100% comparable from one platform to another, and the reading at the socket doesn't allow us to isolate CPU consumption entirely.

We therefore decided to use two methods to isolate processor consumption:

- Energy consumption at the ATX12V
- 90% of the difference in energy consumption between load and idle at the socket

We used 90% so as to exclude power supply losses. Note that while the first method favours processors that draw some of their power from the standard 24-pin ATX connector, the second favours those with high power consumption at idle. Unfortunately, no method is perfect.
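The two estimates and the resulting efficiency fit in a few lines of Python. This is a sketch only: the readings in the example are hypothetical placeholders, not figures from our measurements:

```python
# Two ways of estimating CPU power, and performance per watt from each.
PSU_FACTOR = 0.9  # share of the wall-socket delta attributed to the CPU (PSU losses excluded)


def cpu_power_from_socket(load_w: float, idle_w: float, factor: float = PSU_FACTOR) -> float:
    """Method 2: 90% of the load-minus-idle difference measured at the wall socket."""
    return factor * (load_w - idle_w)


def efficiency(score: float, cpu_power_w: float) -> float:
    """Performance per watt: benchmark score divided by estimated CPU power."""
    return score / cpu_power_w


# Hypothetical readings for one processor (placeholders, not measured values)
score = 20_000          # Fritz Chess Benchmark score
atx12v_w = 120.0        # method 1: ATX12V reading
socket_load_w = 230.0   # wall socket under load
socket_idle_w = 90.0    # wall socket at idle

print(f"method 1: {efficiency(score, atx12v_w):.1f} points/W")
print(f"method 2: {efficiency(score, cpu_power_from_socket(socket_load_w, socket_idle_w)):.1f} points/W")
```

Because the two methods fail in different ways, a processor that ranks well under both can be considered efficient with more confidence.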

[ Charts: energy efficiency based on the 220V wall socket / on the ATX12V connector ]

The Core i7 LGA 2011 is far from matching the energy efficiency of the Core i7 LGA 1155. In fact it's comparable to the Core i7-990X at maximum load, but behind it in light load, mainly because Turbo Boost consumes more energy on the 3960X. In spite of these slightly disappointing results, the LGA 2011 is more efficient than any AMD CPU.

Page 7

Overclocking by the bus!
We started by overclocking the processor, which works somewhat differently from the LGA 1155 platform: overclocking by the bus is back!

The DMICLK, however, is still limited to around 105 MHz, as on LGA 1155. This clock serves as the basis for the other buses, but Intel allows a multiplier to be applied to the clock used by the processor. Two multipliers are available: x1.25 and x1.67.

In practice we were only able to get the x1.25 mode to work, with a maximum bus clock of 131.25 MHz (105x1.25). The x1.67 mode wouldn't work, even at 150.3 MHz (90x1.67).
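The bus and CPU clocks are straightforward products. A minimal Python sketch reproducing the figures above and the two 4.5 GHz settings compared below; the function names are illustrative:

```python
# Bus clock = DMICLK x strap multiplier; CPU clock = CPU multiplier x bus clock (all in MHz).
def bus_clock(dmiclk_mhz: float, strap: float) -> float:
    """Clock fed to the processor after the strap multiplier is applied."""
    return dmiclk_mhz * strap


def cpu_clock(cpu_multiplier: int, bus_mhz: float) -> float:
    """Core clock for a given CPU multiplier and bus clock."""
    return cpu_multiplier * bus_mhz


print(f"x1.25 at 105 MHz DMICLK: {bus_clock(105, 1.25):.2f} MHz")  # best x1.25 result
print(f"x1.67 at 90 MHz DMICLK: {bus_clock(90, 1.67):.1f} MHz")    # still wouldn't work
print(f"45 x 100 MHz = {cpu_clock(45, 100):.0f} MHz, 36 x 125 MHz = {cpu_clock(36, 125):.0f} MHz")
```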

Is this option worthwhile all the same? It will certainly allow the forthcoming Core i7-3820 to be overclocked, but the 3930K and 3960X are unlocked anyway, and the gain is virtually nil, as these differences measured at 4.5 GHz between a 45*100 setting and a 36*125 setting show:

- 3d Studio Max 2011: 0.23%
- Visual Studio 2010: 0.89%
- 7-zip: 0.82%
- Lightroom: 0%
- Fritz Chess Benchmark: 0.19%

In fact the small gain comes mostly from the memory setting: DDR3-1600 with the 100 MHz bus versus DDR3-1666 with the 125 MHz bus. The following base ratios are available:

- 100 MHz: DDR3-800, 1066, 1333, 1600, 1866, 2133, 2400
- 125 MHz: DDR3-1000, 1333, 1666, 2000, 2333, 2666, 3000
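Both lists follow from the same memory ratios applied to the bus clock. The sketch below assumes ratios of 8, 32/3, 40/3, 16, 56/3, 64/3 and 24 (inferred from the lists above, not taken from Intel documentation) and truncates the result the way DDR3 speeds are usually named:

```python
from fractions import Fraction

# Assumption: memory ratios inferred from the two lists above.
RATIOS = [Fraction(8), Fraction(32, 3), Fraction(40, 3), Fraction(16),
          Fraction(56, 3), Fraction(64, 3), Fraction(24)]


def ddr3_speeds(bus_mhz: int) -> list[int]:
    """DDR3 data rates (MT/s) at a given bus clock, fractional part dropped as in speed names."""
    return [int(bus_mhz * ratio) for ratio in RATIOS]


print("100 MHz:", ddr3_speeds(100))
print("125 MHz:", ddr3_speeds(125))
```

Exact fractions avoid floating-point rounding, so 100 x 32/3 = 1066.67 cleanly becomes DDR3-1066.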

As you can see, there's not really any point in overclocking by the bus when it comes to practical performance; any advantage only shows up in extreme benchmarking.
CPU overclocking
What about overclocking the processor itself? At stock settings in Prime95, our CPU already draws 136.8 Watts at the ATX12V, with the processor's internal reading showing 129 Watts in HWMonitor.

With Turbo, the processor runs at around 3.6 GHz in this test. To start with, we tried to find the minimum voltage possible at 3.6 GHz in Prime95. We managed to drop to 1.1V in the BIOS, though we can't tell you the actual voltage, as no application was able to read the vCore on the DX79SI. Power consumption at the ATX12V was 36W lower than with the default configuration.

Next we overclocked the CPU in 200 MHz steps and managed 3.8 GHz at 1.15V, 4 GHz at 1.25V, 4.2 GHz at 1.3V and 4.4 GHz at 1.35V. At 4.4 GHz, power consumption at the ATX12V rose to 201.6 Watts, 47.4% higher than with the initial configuration.

We then switched to 100 MHz steps, each stabilised by a 0.05V voltage increase, and stopped at 4.6 GHz and 1.45V. Here we measured 256.8 Watts at the ATX12V, 87.7% more than with the initial configuration. We couldn't stabilise the CPU at 4.7 GHz and 1.5V, where power consumption reached 290 Watts.
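Setting the power readings against the clock gains shows the cost of the last few hundred MHz: +27.8% clock for +87.7% power at 4.6 GHz. The sketch below also applies the textbook dynamic-power approximation P ∝ f·V² as a rough reference; this model is an assumption, it ignores leakage and VRM losses, and the BIOS voltages aren't necessarily the vCore actually delivered, which we couldn't read:

```python
# ATX12V power in Prime95 during the overclocking run, relative to stock settings.
STOCK_W = 136.8   # stock, Turbo at ~3.6 GHz
STOCK_GHZ = 3.6
STOCK_V = 1.25    # load vCore at stock, quoted earlier in this review

# (clock in GHz, vCore set in the BIOS in V, ATX12V power in W), from the text above
POINTS = [
    (3.6, 1.10, STOCK_W - 36),
    (4.4, 1.35, 201.6),
    (4.6, 1.45, 256.8),
]


def pct_change(new: float, old: float) -> float:
    """Relative change in percent."""
    return (new / old - 1) * 100


for ghz, volts, watts in POINTS:
    # Rough dynamic-power model P ~ f * V^2 (assumption): ignores leakage and VRM losses.
    model_ratio = (ghz / STOCK_GHZ) * (volts / STOCK_V) ** 2
    print(
        f"{ghz} GHz @ {volts} V: clock {pct_change(ghz, STOCK_GHZ):+.1f}%, "
        f"measured power {pct_change(watts, STOCK_W):+.1f}%, "
        f"f*V^2 model {pct_change(model_ratio, 1.0):+.1f}%"
    )
```

The measured increase outpaces the simple model at high voltage, which is plausibly due to leakage, though this test can't confirm it.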

While this was a decent overclock for our Core i7-3960X, it wasn't anything exceptional.

Page 8
3D rendering: Mental Ray and V-Ray

3d Studio Max 2011 - Mental Ray

We now move on to the practical tests, starting with a 3D rendering in 3d Studio Max 2011 using the Mental Ray engine on an Evermotion scene. We rendered at 600*375 so as to keep the test reasonably short.

The Core i7-3960X has a 16.1% advantage over the Core i7-990X in this first test. It was 41.2% faster than the Core i7-2600K.
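In timed tests such as this one, "X% faster" compares completion times, so it doesn't mean X% less time: being 41.2% faster than the 2600K means taking about 29% less time. A small Python sketch of both conversions, assuming speedups are computed as the ratio of completion times:

```python
# Assumption: "X% faster" means the reference takes (1 + X/100) times as long.
def pct_faster(ref_time_s: float, new_time_s: float) -> float:
    """How much faster the new CPU is, from two completion times."""
    return (ref_time_s / new_time_s - 1) * 100


def time_saved_pct(faster_pct: float) -> float:
    """Convert an 'X% faster' figure into the share of rendering time saved."""
    return (1 - 1 / (1 + faster_pct / 100)) * 100


print(f"vs 2600K: {time_saved_pct(41.2):.1f}% less time")
print(f"vs 990X: {time_saved_pct(16.1):.1f}% less time")
```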
3d Studio Max 2011 - V-Ray 2.0

Still in 3d Studio Max 2011, we switched to the more popular third-party engine, V-Ray 2. We used another version of the same scene, prepared by Evermotion for this engine, still rendered at 600*375. Rendering is a good deal faster, but of course we're not comparing the engines themselves or the quality of the final images.

While the 3960X has just a 14.3% gain over the 990X here, it is 41.3% faster than the 2600K thanks to its six cores.

Page 9
Compilation: Visual Studio and MinGW/GCC

Visual Studio 2010 SP1

We compiled the source code of the Ogre 3D engine in Visual Studio 2010 SP1.

In Visual Studio the 3960X is 11.8% faster than the 990X, and 41.9% faster than the 2600K.
MinGW / GCC 4.5.2

The same source code was compiled in MinGW / GCC 4.5.2.

The difference is smaller this time, with a gain of 9.5% over the 990X and 34.8% over the 2600K.

Page 10
Compression: 7-zip and WinRAR

7-zip 9.2

7-zip has been added to our test protocol. In contrast to WinRAR, this application is highly multithreaded if its highest performance algorithm, LZMA2, is used. We measured the time required to compress a large volume of files.

The large L3 cache combined with quad-channel memory allows the 3960X to open up a record lead of 62.4% over the 2600K!
The gain over the 990X was, however, only 7.4%.
WinRAR 4.01

The same files were compressed in WinRAR using the most demanding RAR algorithm ("Best").

Unfortunately WinRAR doesn’t really use more than two cores and the 3960X therefore only has a 12.7% lead over the 2600K.
The 990X is however further back, with the gain for the LGA 2011 platform rising to 22.7%.

Page 11
Encoding: x264 and MainConcept H.264

StaxRip - x264 build 2085

For video encoding we kept the popular x264, here in build 2085. We used the StaxRip interface to transcode a 1080p file taken from the Avatar Blu-ray, using two passes in fast mode at a bitrate of 10 Mbit/s. We've posted the times for both passes: the first is less multithreaded than the second and only really exploits three or four cores.

[ Charts: total / 1st pass / 2nd pass ]

With an 8.3% gain over the Core i7-990X, the i7-3960X's advantage remains limited. The lead rises to 22.8% over the 2600K, combining a 38.5% gain on the second pass with just 0.8% on the first, which is of course less multithreaded.
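Because the two passes run back to back, per-pass gains combine according to each pass's share of the total time, not as a simple average. Working backwards from the three rounded figures above, the first pass accounts for roughly a third of the 2600K's encoding time. A Python sketch of that calculation (function names are illustrative):

```python
# How per-pass speedups combine into the total when two passes run back to back.
def combined_gain(first_pass_share: float, gain1: float, gain2: float) -> float:
    """Total speedup from each pass's speedup and the first pass's share of the reference time."""
    new_time = first_pass_share / (1 + gain1) + (1 - first_pass_share) / (1 + gain2)
    return 1 / new_time - 1


def first_pass_share(total_gain: float, gain1: float, gain2: float) -> float:
    """Solve combined_gain(share, gain1, gain2) == total_gain for the share."""
    a, b, t = 1 / (1 + gain1), 1 / (1 + gain2), 1 / (1 + total_gain)
    return (t - b) / (a - b)


share = first_pass_share(0.228, 0.008, 0.385)
print(f"1st pass: ~{share:.0%} of the 2600K's total encoding time")
print(f"check: {combined_gain(share, 0.008, 0.385):.1%} total gain")
```

Since the inputs are rounded percentages, the share is only an estimate.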
MainConcept Reference 2.2 H264 Pro

We then moved on to another H.264 codec from MainConcept. We used the MainConcept Reference H.264 interface to carry out the same type of transcoding as in x264. Note that the first pass is more multithreaded here and we have only given the overall score.

The 3960X does just 5% better than the i7-990X but is 27.1% faster than the 2600K.

Page 12
Photo processing: Lightroom and Bibble

Adobe Lightroom 3.4

We have now added batch photo processing to our protocol. We started by exporting a batch of 96 RAW photos from a Canon 5D Mark II as JPEGs in Lightroom, applying various effects such as colour and lens correction and noise reduction.

As we saw previously, Lightroom is fully able to take advantage of the extra bandwidth on offer with LGA 2011. The 3960X is therefore 25.6% faster than the i7-990X, and 35.6% faster than the 2600K.
Bibble 5.2.2

In Bibble we processed a batch of 48 RAW photos. Note that Bibble is slower than Lightroom but, as with the rendering engines, this test isn't meant to compare the applications with each other: that would mean comparing the quality of the results, as a slower export may also be of higher quality.

In Bibble the gap between LGA 1366 and LGA 2011 is smaller, with an advantage of just 10.7% for the new arrival. All six cores are put to work, however, and photos are processed 53.4% faster than with the 2600K.

Page 13
Chess AI: Houdini and Fritz

Houdini 2.0 Pro

We finished our tour of applications with a rather unusual choice: artificial intelligence engines designed for chess. We started with Houdini Pro 2, via the Arena 3 interface. Version 1.5 dominated the top of the chess engine rankings and version 2 seems destined to do the same. We left the engine running until the 24th move at the beginning of a game and noted the speed in kilo nodes per second.

Houdini doesn't benefit much from the Sandy Bridge architecture, and there's even a slight dip in performance (-0.7%) compared with the Core i7-990X, a result that is fortunately an exception. All six cores are fully used, however, and there's a 44.3% gain over the 2600K.
Fritz Chess Benchmark 4.3

We then moved on to Fritz Chess Benchmark from ChessBase. Once again, the results are given in kilo nodes per second.

This chess engine doesn't benefit greatly from the Sandy Bridge architecture either, but here we did record a small gain of 2.4% over the 990X. Compared with the 2600K, performance was up by 47%.

Page 14
3D gaming: Crysis 2 and Arma II: OA

Crysis 2 v1.9

The 3D gaming part of this comparison begins with Crysis 2. We used the latest version, 1.9, in DirectX 11 and measured the framerate at 1920*1080 Ultra at a precise point in the game during a shoot-out.

As in almost all games, Crysis 2 barely uses four cores, and it seemed to be limited by the GPU to around 50 fps in our test scene. The highest-end Intel processors are all much of a muchness here.
Arma II: Operation Arrowhead v1.59

In Arma II: Operation Arrowhead we measured the framerate while crossing a village in the first single-player mission, still at 1920*1080 and with all options pushed to the maximum, including visibility.

In Arma II we weren’t limited by the GPU and the Core i7-3960X was 4.5% faster than the 990X. It’s still slightly slower than the Core i7-2600K, but the difference isn’t very significant.

Page 15
3D gaming: Rise of Flight and F1 2011

Rise Of Flight v1.021b

We used Rise Of Flight, a First World War fighter plane simulator, at 1920*1080 with high graphics settings. We launched a customised mission with a 32 vs 32 dogfight and measured the framerate with a rear view of our 31 wingmen.

With a 15.1% advantage over the 990X, the i7-3960X does much better in this game. It even beats the 2600K by 6.7%.
F1 2011

We ran the brand new F1 2011 at 1920*1080 with settings pushed to a maximum. We measured the framerate at the start of the Monaco GP.

In F1 2011, the LGA 2011 platform also has the advantage, with a 7.3% gain over both the 990X and the 2600K.

Page 16
3D gaming: Total War Shogun 2, Starcraft II and Anno 1404

Total War: Shogun 2

For Total War: Shogun 2 we used the huge battle of the 'DX9 CPU' test modified for DX11 at 1920*1080 and with high graphics settings.

In Shogun 2 the i7-3960X is 17.9% faster than the i7-990X and 3.6% faster than the 2600K.

Unfortunately we can't give you a score for the AMD FXs in this game. On these processors the game crashes on start-up, a bug that also affects other games using Steam's CEG protection (Deus Ex: Human Revolution). We've reported it to AMD, which is working on a fix, but it isn't available yet.
Starcraft II v1.3.6

For Starcraft II, some of our French forum users generously provided a replay of a major attack (thanks!). This replay contains a very (very) full-on attack, and we measured the framerate during it at 1920*1080 with all graphics settings pushed to the max.

All the processors were brought to their knees in this test, which is in practice even more extreme than the one used in Shogun 2. The Core i7-3960X has a lead of 16.5% over the 990X here but is down very slightly on the 2600K.
Anno 1404 v1.3

Lastly, in Anno 1404 we loaded a saved game with a city of 46,600 inhabitants, part of which we view from a distance. The resolution was 1920*1080 and all graphics settings were pushed to the maximum.

Here the 3960X gives a 13.7% gain over the 990X, with performance up by 5.7% on the 2600K.

Page 17
Performance averages

Although individual app results are worth looking at, we also calculated a performance index based on all the tests, giving each test the same weight. For the first time we've included two averages: one covering all the tests except 3D games, and the other covering 3D games only (excluding Shogun 2, as there is no score for the AMD FXs).

[ Charts: standard order / sorted by performance ]

In the application tests, the Core i7-3960X simply extends the lead the 990X already had, as almost all of them make full use of the CPU's six Hyperthreaded cores. Neither processor really has any competition here, but the new platform's gain is limited to just 10.4%.

As the 990X is around 3% faster than the 980X, which came out in March 2010, this performance gain is rather disappointing given that a year and a half has gone by and the platform has changed. It should be said that, unlike the move from the Core i7 LGA 1156 to the Core i7 LGA 1155, the same manufacturing process (32nm) is still used here, which inherently limits any gains within the same thermal envelope.
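As a rough cross-check, the per-test gains over the 990X quoted on the application pages can be averaged with equal weight. This Python sketch is only an illustration: the index above isn't necessarily computed this way (its reference point and normalisation aren't detailed here), so the result lands near, not exactly on, 10.4%:

```python
import math

# Core i7-3960X gains over the Core i7-990X in the application tests of this review (%).
GAINS_VS_990X = {
    "3d Studio Max / Mental Ray": 16.1,
    "3d Studio Max / V-Ray": 14.3,
    "Visual Studio 2010": 11.8,
    "MinGW / GCC": 9.5,
    "7-zip": 7.4,
    "WinRAR": 22.7,
    "x264": 8.3,
    "MainConcept": 5.0,
    "Lightroom": 25.6,
    "Bibble": 10.7,
    "Houdini": -0.7,
    "Fritz": 2.4,
}

ratios = [1 + gain / 100 for gain in GAINS_VS_990X.values()]
arithmetic = (sum(ratios) / len(ratios) - 1) * 100
# The geometric mean treats a +10% and a -10% result symmetrically in ratio terms.
geometric = (math.exp(sum(math.log(r) for r in ratios) / len(ratios)) - 1) * 100

print(f"equal-weight arithmetic mean: +{arithmetic:.1f}%")
print(f"equal-weight geometric mean:  +{geometric:.1f}%")
```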

[ Charts: standard order / sorted by performance ]

Games don't really put all six cores to use. Although the i7-3960X gives a 9.9% gain over the 990X, it's only 2.9% faster than the i7-2600K and is probably more or less on a par with the i7-2700K. Although the 3960X's Turbo clock is slightly higher than the 2600K's, the gains from four memory channels and a larger L3 cache seem to be entirely neutralised by the slightly higher latency of memory and L3 accesses, as well as the lower single-thread memory bandwidth.

Page 18

Intel has driven home its advantage with the Core i7-3960X, which is now the fastest desktop processor on the market. As long as an application makes proper use of the six cores at its disposal, this processor has no competition, and it offers excellent DDR3 support, with up to 64 GB on four channels delivering unprecedented throughput. Of course, it's also exorbitantly priced at $990, but the 3930K, which is only 3-4% slower, costs $555, which is a good deal more reasonable.

Of course, not all applications are able to put all the real estate to good use. As usual, creative applications are most able to do so, while games generally struggle to use four cores. For purely gaming use then, the Core i7 LGA 2011s don’t really offer much of an advantage over the Core i7 and i5 LGA 1155s, except for the fact that the platform is better adapted to multi-GPU configurations, particularly 3-way SLI / CrossFireX. This is a pretty modest advantage overall though as you can get LGA 1155 motherboards that use NVIDIA or LucidLogix PCI-Express switches which compensate for a good proportion of the limitations inherent to the platform.

In spite of the impressive level of performance on LGA 2011, there were also a few disappointments. The first is that the overall gain is limited (around 15%) compared with the hexacore Core i7s, the Gulftowns, which were released eighteen months ago. It has to be said that as both CPUs use a 32nm manufacturing process, Intel could only bring architectural (Sandy Bridge) or platform (quad-channel DDR3) advances to bear within an equivalent thermal envelope. With respect to the platform, note the integration of the northbridge into the CPU here, a simplification compared with LGA 1366 that has a significant impact on power consumption at idle.

The second disappointment is linked to the thermal envelope, as Intel has really only done the minimum here. Sandy Bridge-E is an 8-core chip, but only six cores are enabled on the Core i7s so as to stay within a TDP of 130 Watts at decent clocks, with all eight cores reserved for the more expensive Xeon E5s. At $990, we might be forgiven for expecting a truly Extreme version, i.e. with all eight cores enabled, even if the TDP had to go up to 165 Watts!

The X79 Express platform also suffers from the same sort of restraint, though it seems that this may have been linked to a last minute bug rather than yet more Intel segmentation. Although the X79 Express was initially designed to support 4 SATA 3G, 2 SATA 6G and 4 SAS/SATA 6G ports, the SAS/SATA 6G capability has been removed and the X79 Express is now on a par with an H67/P67/Z68 Express LGA 1155, namely with 4 SATA 3G and 2 SATA 6G. Of course this is enough for the vast majority of users but given the very high-end positioning of the platform, with motherboards starting at €200 and going up to €350, Intel has missed out on the opportunity to extend the advantage over LGA 1155.

As you can see, this new platform has left us wanting more. With no real competition from AMD at this level, Intel is far from having pushed the LGA 2011 platform to its limits. It does, however, offer by far the highest performance currently available, as long as your applications are highly multithreaded or you do intensive multitasking. Otherwise, LGA 1155 isn't far behind and offers a much better price/performance ratio and better energy efficiency.

There remains one final unknown with respect to LGA 2011: how long the platform will last. Will it be around for some time like LGA 1366, or is it just another short-lived socket like the now infamous LGA 1156? While current LGA 1155 motherboards will accept the forthcoming 22nm Ivy Bridge quad cores, there is as yet no certainty that any future 22nm Ivy Bridge-E processors will be compatible with LGA 2011 motherboards. If they are, the reduction in power consumption brought by the 22nm process could well allow Intel to launch its first 8-core CPU in the Core i7 range!

Copyright © 1997-2015 BeHardware. All rights reserved.