Intel Core i7 and Core i5 LGA 1155 Sandy Bridge - BeHardware

Written by Damien Triolet, Franck Delattre, Guillaume Louel and Marc Prieur

Published on January 3, 2011

URL: http://www.behardware.com/art/lire/815/


Page 1

The Sandy Bridge architecture



In November 2008, Intel introduced the Core i7 LGA 1366 processors, the first CPUs based on its new Nehalem architecture. Intel rolled the architecture out on Socket LGA 1156 in September 2009, then released it on 32nm dual-core and six-core CPUs in January and March 2010. After more than two years of loyal service, Nehalem is now being replaced by Sandy Bridge.

Sandy Bridge is the code name given by Intel to its new architecture and the processors built around it. This new generation is a 'tock' in Intel parlance, signifying a new architecture on a fabrication process that is already in use. Sandy Bridge CPUs are therefore manufactured on the 32nm process, like the Westmeres, the 32nm versions of Nehalem. The quad-core version is based on a 216mm² die with 995 million transistors, against 296mm² and 774 million transistors for a 45nm Lynnfield LGA1156 quad core.


Among the innovations introduced by Sandy Bridge CPUs, the most notable are:

- A new Socket LGA 1155
- An integrated IGP sharing the L3 cache, now known as LLC (Last Level Cache)
- An improvement to the IPC and performance per watt
- A new AVX (Advanced Vector Extension) instruction set
- A new version of Turbo Boost

Beyond these aspects of the spec on which Sandy Bridge differs from its predecessor, the technological choices introduced or reintroduced with Nehalem have been retained, namely:

- The integrated DDR3 memory controller (dual channel and DDR3-1333)
- The built-in PCI-Express controller (2 x 8 PCI-E 2.0)
- HyperThreading
- A three-level cache architecture

The first Sandy Bridge models will be on the market this quarter. There will be no fewer than 29 models, desktop and mobile together, either dual or quad core. Only the quad core models will be available as of January 9 and we'll have to wait until February 20 to see the dual core versions. The release of these processors is accompanied by ten new chipset models to support them, including the P67 and H67, code-named "Cougar Point".


Page 2
Improvements to the core

Improvements to the Sandy Bridge core
Sandy Bridge is mostly based on the Nehalem architecture but brings its share of improvements, some of which have been taken from the Pentium 4 Netburst micro-architecture. Because Netburst was not based on an existing architecture, it could bring in a multitude of concepts previously unheard of in the x86 universe. HyperThreading, which reappeared with Nehalem, is one of them, and the Sandy Bridge micro-architecture has updated various other Netburst 'innovations'.
An "L0" cache
In our study of the Nehalem architecture, we saw that the processor has a mechanism optimised for loops: a buffer holds micro-operations (that is, already decoded instructions), which saves resources because the same code in a correctly predicted loop is not decoded several times. We also mentioned the resemblance to the Pentium 4 trace cache principle.


Sandy Bridge goes a bit further by introducing a 1.5 KB micro-operations cache (uop cache), which receives its data from the instruction decode units. Once a new branch is decoded, the branching unit checks whether it is already in the uop cache. If it is, the majority of front-end processing (from instruction fetch through to decoding) becomes unnecessary. This results in reduced use of front-end units and an overall improvement in performance per watt. Note that the Sandy Bridge uop cache isn't really comparable to the Pentium 4 trace cache: Sandy Bridge still has its L1 instruction cache and both caches work together, in contrast to the Pentium 4, on which the trace cache replaces the L1I entirely and therefore represents a more complex implementation.
Register set-up inspired by Pentium 4
For Sandy Bridge, the Intel engineers chose to use a Physical Register File (PRF), as with the Pentium 4. To understand what this consists of exactly and the reasons for this choice, we need to go back to a few of the concepts inherent in the x86 processor register set-up.

x86 processors are characterised by their small number of registers: 8 in 32-bit mode and 16 in 64-bit mode (for comparison, an IA64 processor such as the Itanium has 128). These are the eax, ebx ... familiar to those who've seen x86 assembly code, and they constitute the register file.

Yet modern CPUs use out-of-order (OoO) execution engines, that is to say engines which can process instructions in a different order from that of the assembly code written by the programmer or generated by a compiler. To facilitate the work of the OoO engine, CPUs have many more than 8 internal registers and therefore resort to register renaming to maintain coherence between the CPU's physical registers (internal and not visible to programmers) and the architectural registers (those the programmer can see: eax, ebx ...) to which they refer.


In practice, the processor uses a Reorder Buffer (ROB), the role of which is to restore the instruction order to the way it appears in the programme after instructions have been executed, perhaps in a different order. In P6 derived architectures (Core 2, Nehalem and Westmere), the ROB contains the results of each micro-operation underway, accompanied by a register index that allows the re-establishment of correspondence between the physical and architectural registers. These results are copied to a Retirement Register File (RRF), which corresponds to all the architectural registers after processing.

In its time, the Netburst architecture modified this schema by using a file containing the CPU's internal physical registers (Physical Register File, or PRF). The ROB then no longer contains any data from micro-operations underway, but only pointers to the PRF. The advantage is obviously that each entry in the ROB takes up less space and the ROB can then contain more entries for the same capacity (the Sandy Bridge ROB contains 168, Nehalem / Westmere 128). There's no longer an RRF and coherence with architectural registers takes place via the Register Alias Table (RAT), the entries for which also point towards the PRF. These references towards the PRF mean that the data copying stages of the ROB + RRF system aren't required. This gives the Sandy Bridge system quite an advantage as there's a lot of this data.
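
The PRF scheme can be illustrated with a toy model. This is a minimal sketch and not Intel's implementation: the class name, method names and register counts are all hypothetical, chosen only to show a RAT and a ROB that hold pointers into a single physical register file instead of holding data.

```python
# Toy model of PRF-based register renaming (illustrative only; all names
# and sizes are assumptions, not Sandy Bridge's actual structures).

class PrfRenamer:
    def __init__(self, arch_regs, num_phys):
        # One physical register file holds every value, speculative or not.
        self.prf = [0] * num_phys
        # Register Alias Table: architectural name -> physical index.
        self.rat = {name: i for i, name in enumerate(arch_regs)}
        # Physical registers not currently mapped are free for renaming.
        self.free = list(range(len(arch_regs), num_phys))
        # ROB entries store only pointers (dest, new_phys, old_phys), no data.
        self.rob = []

    def issue(self, dest, compute, *srcs):
        """Rename `dest`, execute, and record a pointer-only ROB entry."""
        operands = [self.prf[self.rat[s]] for s in srcs]
        new_phys = self.free.pop(0)
        old_phys = self.rat[dest]
        self.prf[new_phys] = compute(*operands)  # result is written once, to the PRF
        self.rat[dest] = new_phys
        self.rob.append((dest, new_phys, old_phys))

    def retire(self):
        """Retire the oldest entry: no data copy, just recycle the old register."""
        dest, new_phys, old_phys = self.rob.pop(0)
        self.free.append(old_phys)
        return dest, new_phys

    def read(self, name):
        return self.prf[self.rat[name]]


r = PrfRenamer(["eax", "ebx"], num_phys=6)
r.prf[r.rat["eax"]] = 5
r.prf[r.rat["ebx"]] = 7
r.issue("eax", lambda a, b: a + b, "eax", "ebx")   # eax = eax + ebx
r.issue("ebx", lambda a: a * 2, "eax")             # ebx = eax * 2
print(r.read("eax"), r.read("ebx"))
```

Retirement here only returns the previous physical register to the free list; in a ROB + RRF design the same step would copy the result itself, which is the cost the PRF avoids.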

Using a PRF on Pentium 4 was motivated by the optimisation of Netburst for instruction sets such as SSE and SSE2, which handle data in 128-bit chunks. Sandy Bridge ushers in the new AVX instruction set on which operands can go up to a max of 256 bits, justifying Intel's choice to equip its new architecture with a PRF.


Page 3
Improvements to the core, cont

One ring for all
Our analysis of Nehalem raised the problems of sharing the L3 cache between several cores, in particular:

- the difficulty of replying to the simultaneous demands of four cores. We pointed out that the L2 caches could intercept a good number of the requests from the cores, taking the pressure off the shared cache;
- with the increase in the number of cores, the increased amount of time required to maintain coherence between the LLC and intermediary caches (known as cache snooping).

On Sandy Bridge this is even more of an issue as the GPU also shares the LLC, not to mention the 8-core models planned for the second half of 2011 (the Sandy Bridge-EPs).


The solution the Intel engineers have come up with consists of implementing a ring bus type interconnect between all the elements that require access to the shared cache. The ring design is quite simple and has many advantages: it takes the shortest path and is easily scalable to include more cores if required.

This type of system is sometimes used on graphics processors on which some resources are shared between many units. The advantage of the system is that it doesn't need to get any more complex as the number of elements using it increases. This is promising for shared caches on future models integrating more than four cores and makes it well adapted to multi-processor platforms.

The ring connects up the four main components of the processor: the cores, the GPU, LLC cache and the system agent, which comprises the un-core part of the processor (memory controller, PCI-Express controller, DMI interface and graphics engine). Given that moving the bus round one stop requires one cycle, the GPU, which isn't too sensitive to memory access latency, is placed opposite the system agent. The bus is made up of four rings, which all carry different information: data, requests, acknowledge and snoop.

In practice, although Intel specifically talks about a unidirectional ring bus, this one behaves rather like a linear bi-directional bus. This is due to a) the fact that the CPU cores each have two points of access to the ring bus and b) the fact that traffic on the ring always takes the shortest path.
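
The cost of crossing the ring follows directly from the two rules given above: one cycle per stop, and traffic always takes the shorter direction. The sketch below is only an illustration; the number of stops and their numbering are hypothetical, as Intel's actual stop layout isn't detailed here.

```python
def ring_hops(src, dst, stops):
    """Hops between two ring stops when traffic takes the shorter direction."""
    clockwise = (dst - src) % stops
    return min(clockwise, stops - clockwise)

# Hypothetical layout: 4 core/LLC stops + GPU + system agent.
STOPS = 6
for dst in range(STOPS):
    # One stop costs one cycle, so hops and cycles are the same number.
    print(f"stop 0 -> stop {dst}: {ring_hops(0, dst, STOPS)} cycle(s)")
```

With this numbering the worst case is half the ring, which is why placing the latency-tolerant GPU opposite the system agent costs little.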

In terms of physical implementation, Intel has taken advantage of its mastery of the fabrication process to draw the ring above the LLC cache. This positioning, which requires a few small optimisations in terms of the interconnections, means the chip doesn't have to be increased in size.
The cache sub-system
The hierarchy of the Sandy Bridge caches has been inherited straight from Nehalem, with its share of improvements. The most important, as we've seen, is the addition of the 1.5 KB uop cache, which works in tandem with the 32 KB L1 instruction cache. The associativity of the L1 instruction cache has increased to 8-way (from 4 on Nehalem). In our study on Nehalem we explained that 4-way associativity allowed the L1I to keep latency down (because 8-way implies higher latency than 4). The use of the uop cache reduces the overall latency of the code caches as a whole, which allowed Intel to increase L1I associativity without any noticeably negative impact on latency. The L1 data and L2 Sandy Bridge caches, 32 and 256 KB respectively, are identical to what we saw on Nehalem.


The L3 / LLC cache has however been completely reorganised. The constraints of having a shared cache resulted in the choice of a ring bus, which changes how the L3 communicates with the various CPU components (cores and GPU). Using a ring bus simplifies the constraints of maintaining coherence between the caches, with the L3 now divided into sub-blocks (4 x 2 MB sub-blocks for the 8 MB versions of L3). There are many advantages to this:

- as all the sub-blocks are accessible at the same time, the L3 offers higher bandwidth;
- there's less latency in accessing these reduced size blocks than on the single Nehalem L3 block.

Note that the L3 now runs at the same clock and voltage as the cores, whereas on Nehalem it ran at the uncore clock.

All these improvements mean the Sandy Bridge L3 gives significantly higher performance than Nehalem L3. Latency readings show an average of 31 cycles, against 42 for Nehalem. Latency obviously varies depending on where the data is in relation to the CPU core which is accessing it as each stop on the ring requires one cycle.


Page 4
Advanced Vector Extension (AVX)

Sandy Bridge introduces a new vector instruction set, AVX. AVX is part of the family of SIMD (Single Instruction, Multiple Data) instruction sets and operates on floating point numbers. Operands can be increased to 256 bits, but AVX also supports the 128-bit operands from SSE instruction sets.

The challenge when a new instruction set appears is getting it adopted by developers. Intel has therefore taken particular care with this new instruction set and is counting on rapid adoption across the board.

The first condition that an instruction set must fulfil to attract developers is to provide a notable gain in performance. Intel has therefore concentrated efforts on the processing speed of AVX instructions on its new processor:

- width of register file optimised for 256-bit operands, as described previously.
- the most common operations are processed with a latency of 1 cycle on 256-bit operands: multiply, add, shuffle.
- addition of a second 128-bit issue port to double supply capacity of 256-bit operands to units. It's interesting to note that the presence of this second port accelerates the processing of certain routines that don't use AVX.

The second condition lies in the capability of the instructions on offer to facilitate a developer's work, notably in terms of data formats. SIMD instruction sets give maximum performance when used on several pieces of data at the same time, namely in vector mode. In scalar mode, on the other hand, the register only contains a single piece of data, which naturally reduces the value of having the instruction set.

Unfortunately, compilers still aren't effective at automatic vectorisation, which consists of grouping data in an optimal way in the registers. You still get the best results by doing this manually, a long and often complex task. To facilitate rewriting or recompiling code, AVX introduces numerous new instructions (known as primitives) whose role is to help with this work of regrouping the data: broadcasting data, insertion, extraction, conditional reads and writes.
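
The gap between scalar and vector mode can be put into numbers. The following sketch is only a counting exercise, assuming 32-bit single-precision elements and perfectly packed registers (the ideal case that vectorisation aims for); the function names are ours.

```python
def lanes(register_bits, element_bits):
    """Number of elements that fit in one register."""
    return register_bits // element_bits

def vector_ops_needed(n_elements, register_bits, element_bits):
    """Operations needed to process n_elements if every register is fully packed."""
    per_op = lanes(register_bits, element_bits)
    return -(-n_elements // per_op)  # ceiling division

n = 1000  # single-precision (32-bit) floats
print(vector_ops_needed(n, 32, 32))   # scalar mode: one element per operation
print(vector_ops_needed(n, 128, 32))  # 128-bit (SSE-width) vectors
print(vector_ops_needed(n, 256, 32))  # 256-bit (AVX-width) vectors
```

The eightfold reduction at 256 bits is an upper bound: it only holds when the data is already laid out so that each register can be filled, which is exactly the regrouping work described above.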


Intel has been trying hard to push AVX and the results in the first benchmarks that support the new instruction set seem very promising. Note that for AVX support you require Windows 7 Service Pack 1.


Page 5
Turbo Boost rev2

Intel Turbo Boost technology is based on increasing the clock of one or more cores, depending on the demands placed on them. When not all the cores are in full use, the CPU runs under its thermal spec. Turbo Boost uses the available budget to accelerate the cores that are being used. Turbo is managed by a unit in the system agent, known as the PCU (Power Control Unit). The PCU accelerates the cores according to the overall CPU thermal envelope, which can't exceed a specified limit, known as the TDP (Thermal Design Power).


Introduced on some Core 2 Mobile Penryn models under the term 'IDA' (Intel Dynamic Acceleration), the technique was developed mainly on Nehalem architectures and its derivatives. An improved version of Turbo Boost has been introduced on Sandy Bridge and has been upgraded for the occasion to version 2.

Firstly, it's worth noting that the built-in Sandy Bridge GPU also benefits from Turbo Boost. The GPU integrated in the package on the Arrandales also had an acceleration feature, but it was implemented less efficiently because the GPU was separated from the rest of the processor. On Sandy Bridge, the CPU and GPU share the same die, which enables finer analysis of the overall thermal envelope.

Turbo Boost 2 is innovative mainly in the way it allows the processor to exceed its TDP for a certain time (up to 25% over, or 120W on models with a TDP of 95W), thus allowing Turbo Boost more of a margin. While this is obviously a bonus in terms of acceleration, it is worth asking how such an operation is possible without taking the CPU beyond its spec, and thus risking seeing the throttling mechanism kick in. The TDP is defined so the die temperature doesn't exceed a ceiling beyond which the circuit's integrity is no longer guaranteed.

What Intel is actually doing is exploiting a physical phenomenon: there's a period of time before the processor actually heats up. The explanation is very simple: when the CPU is put under load, it starts to heat up, but as we said, this takes some time and the package doesn't hit its TDP right away, if it hits it at all (it only does so if the increased load lasts long enough). Thus even if the heat dissipated by the processor is higher than the TDP during this heating-up period, there isn't enough time for the additional heat to push the package temperature past its limit.
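
This heating delay can be sketched with a simple first-order thermal model. Everything here is an assumption made for illustration: the model itself, the function name and above all the time constant, which depends on the cooler and is not a figure from Intel. Temperature rise over ambient is taken as proportional to power, so the question becomes how long the package can draw 120W before its temperature reaches the level it would settle at with a constant 95W.

```python
import math

def seconds_above_tdp(p_boost, p_start, tdp, tau):
    """Time a first-order thermal model takes to reach the TDP steady-state temperature.

    Temperature rise over ambient is assumed proportional to power, so the
    thermal resistance cancels out and only the power ratio and tau matter.
    """
    if p_start >= tdp:
        return 0.0          # already at or above the limit
    if p_boost <= tdp:
        return math.inf     # boost power never drives the model past the limit
    # rise(t) = p_boost - (p_boost - p_start) * exp(-t / tau); solve rise(t) = tdp.
    return tau * math.log((p_boost - p_start) / (p_boost - tdp))

TAU = 10.0  # hypothetical thermal time constant in seconds (depends on the cooler)
for start in (20, 60, 90):
    print(f"from {start} W: {seconds_above_tdp(120, start, 95, TAU):.1f} s at 120 W")
```

The lower the power draw before the boost, the longer the headroom lasts, which matches the idea that a cool, lightly loaded chip can exceed its TDP for a while without reaching the temperature ceiling.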

Of course, the skill here lies in managing the application of this mechanism. The longer it's applied, the more acceleration you get but also the further you take the CPU beyond its spec, increasing the likelihood of setting off the throttling mechanism. It's difficult to determine an optimal value, because how much the CPU temperature increases obviously depends on the cooling mechanism used, a point that is external to the processor and therefore not universally quantifiable.

In practice Intel has decided that mobile versions would benefit from higher application times than desktop models: 28 seconds compared to 1 second on a desktop! This may seem paradoxical, because desktops generally have much more efficient cooling solutions than laptops.

For this reason, and although TB2 concerns both desktop and mobile platforms, it's mainly on the mobile platforms that the technique will be fully used. Nevertheless 28 seconds seems quite high in comparison with the time a CPU takes to heat up and it may well be that Intel is using the feature to reduce the announced TDP for its mobile processors artificially, particularly the quad core models.

To sum up, TB2 looks like an innovative technique, but one whose effectiveness will vary a good deal depending on the platform. With a deliberately limited application time, TB2 won't have much effect on desktop platforms, but will really come into its own on mobile platforms.


Page 6
Socket LGA 1155, P67 and H67, LGA 2011

The new Sandy Bridge CPUs are supported by the new Socket LGA 1155. In comparison with 1156, there's one fewer point of contact and the two sockets are totally incompatible. 1156 and 1155 processors are the same size, namely 37.5x37.5mm, but the socket notches are different which means you can't mount an LGA 1155 CPU on an LGA 1156 motherboard and vice-versa.


This is bad news, especially as the LGA 1156 was only launched in September 2009! The only consolation is that the mounting holes for the CPU cooling system are the same on the two Sockets.


Socket LGA 1155 can be paired with 4 chipsets, two for the enterprise market, the Q67 and B65, and two for the general consumer market, the H67 and P67. The P67 platform is the only one that doesn't have the option to use the CPU's on-board HD Graphics, but is on the other hand the only one that gives access to settings allowing you to overclock the CPU. Apart from this, H67 is identical to P67.


Both these chipsets differ from their predecessor in supporting SATA 6 Gb/s, which arrives ten months after its integration on the AMD 890GX. Note however that of a total of six SATA ports, only two run at this speed, the other four being SATA 3 Gb/s. USB 3.0 is notable by its absence on this generation of chipset, and motherboard manufacturers will have to include an external controller. Intel itself does so on its P67 motherboard, the DP67BG.

In addition, the bandwidth between the CPU and the chipset has been doubled using a DMI at 5 GT/s, or 2 GB/s in each direction. The chipset's PCI-Express lanes are also up from 2.5 to 5 GT/s; the lower speed was the major drawback with the P55.
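
The step from 5 GT/s to 2 GB/s can be checked with a little arithmetic. Two things in this sketch are not stated in the article and are our assumptions: that the DMI link is four lanes wide, and that it uses 8b/10b line coding as PCI-Express 2.0 does (10 bits on the wire for every 8 bits of data).

```python
def link_bandwidth_gb_s(gt_per_s, lanes, payload_bits=8, line_bits=10):
    """Usable bandwidth per direction in GB/s for a serial link with line coding."""
    bits_per_s = gt_per_s * 1e9 * lanes * payload_bits / line_bits
    return bits_per_s / 8 / 1e9  # bits -> bytes -> GB

# Assumed: 4 lanes, 8b/10b coding.
print(link_bandwidth_gb_s(5.0, 4))   # new DMI at 5 GT/s
print(link_bandwidth_gb_s(2.5, 4))   # previous DMI at 2.5 GT/s
```

Under these assumptions the 5 GT/s link gives 2 GB/s per direction and the 2.5 GT/s link half that, consistent with the doubling described above.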

In comparison with the H67, the Q67 also supports a good old PCI bus (on some P67/H67 motherboards there are PCI ports managed by additional chips). In comparison to the Q67, the B65 has two fewer USB 2.0 ports, taking it to twelve, supports only one SATA 6 Gb/s port, and its SATA controller only works in IDE or AHCI mode, so there's no RAID.

Although LGA 1155 replaces LGA 1156, socket LGA 1366 remains in place for the time being. No Sandy Bridge architecture processor has been planned for it and we'll have to wait for a new high-end platform, which won't be here before the second half of the year. Intel is reported to have two projects up its sleeve, LGA 1356 on the one hand and LGA 2011 on the other, with the second getting priority. LGA 1356 and LGA 2011 CPUs share some impressive specs according to the rumours, namely up to 8 cores and 20 MB of LLC cache and differ in terms of:

- Memory: triple channel on 1356, or quadruple channel on 2011
- PCI Express: three PCI-E 8x Gen3 on 1356, five on 2011
- QPI: 1 on 1356 (dual-socket), 2 on 2011 (quad-socket)


Page 7
The Sandy Bridge Core i7s & i5s

Sandy Bridge Core i7s & i5s
Although Intel has announced no fewer than 29 Sandy Bridge processors, 14 of them for desktop, only the quad core models are available as of January 9. We'll have to wait for February 20 to see the dual core models, which we'll come back to at that time.

After beginning production of a 32nm processor with Nehalem architecture in line with its "Tick-Tock" strategy, Intel is now using this process for the new Sandy Bridge CPUs. In the quad core version, then, we have a 216mm² die with 995 million transistors, against 296mm² and 774 million transistors for a Lynnfield LGA1156 quad core.

The quad core range is as follows:

- Core i7: 8 MB of L3 cache, with HyperThreading
- Core i5: 6 MB of L3 cache, without HyperThreading


The last letter is what allows you to distinguish between the different ranges of processors:

- K: multiplier unlocked upwards, Intel HD Graphics 3000
- S: TDP of 65 watts
- T: TDP of 45 watts
- No letter: TDP of 95 watts

6 processors with a TDP of 95 watts:

- Core i5-2300: 2.8 GHz, 3.1 GHz Turbo, $177
- Core i5-2400: 3.1 GHz, 3.4 GHz Turbo, $184
- Core i5-2500: 3.3 GHz, 3.7 GHz Turbo, $205
- Core i7-2600: 3.4 GHz, 3.8 GHz Turbo, $294
- Core i5-2500K: 3.3 GHz, 3.7 GHz Turbo, $216
- Core i7-2600K: 3.4 GHz, 3.8 GHz Turbo, $317

Here's exactly how Turbo Boost works depending on the number of active cores:


45 and 65 watt versions are also available, mostly aimed at OEMs:

- Core i5-2500T: 2.3 GHz, 3.3 GHz Turbo
- Core i5-2400S: 2.5 GHz, 3.3 GHz Turbo
- Core i5-2500S: 2.7 GHz, 3.7 GHz Turbo
- Core i7-2600S: 2.8 GHz, 3.8 GHz Turbo

No doubt it'll take a bit of time for us all to get used to this new naming system. To recap, here are the prices for the quad core LGA1156:

- Core i5-750: 2.66 GHz, 3.20 GHz Turbo, $196
- Core i5-760: 2.80 GHz, 3.33 GHz Turbo, $205
- Core i7-860: 2.80 GHz, 3.46 GHz Turbo, $284
- Core i7-870: 2.93 GHz, 3.60 GHz Turbo, $294
- Core i7-880: 3.06 GHz, 3.73 GHz Turbo, $583

The Core i7-2600 costs the same as the Core i7-870, while its base clock is 16% higher. The Core i5-2500 is priced the same as the Core i5-760, for a clock 18% higher.
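
These percentages come straight from the base clocks listed above, as this quick check shows (the function name is ours):

```python
def uplift_pct(new_ghz, old_ghz):
    """Percentage increase in base clock from the old model to the new one."""
    return (new_ghz / old_ghz - 1) * 100

print(round(uplift_pct(3.4, 2.93)))  # Core i7-2600 vs Core i7-870 -> 16
print(round(uplift_pct(3.3, 2.80)))  # Core i5-2500 vs Core i5-760 -> 18
```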


Page 8
Intel HD Graphics 2000 & 3000

An on-die IGP
At the beginning of 2010, Intel was the first to integrate a graphics core into a desktop processor, with its Clarkdale range (dual core Core i3 and Core i5 CPUs). Intel is extending this strategy: all the 2nd generation Core series CPUs (SNB) introduced this month are equipped with a graphics core.

The first big change is that, in contrast to Clarkdale, where Intel placed a separate IGP in the CPU package (the IGP was manufactured at 45 nm while the CPU was at 32 nm), on Sandy Bridge the CPU and IGP sit on the same die and both are manufactured at 32 nm. Power management is thus unified for both through the Power Control Unit. The PCU monitors overall energy consumption and adapts the Turbo modes for the CPU and IGP accordingly.


Better still, using the ring bus described earlier, the IGP uses the processor's LLC when it judges it necessary to do so. Intel says that the graphics driver handles which data flows can use the cache and which can't: textures and rendering buffers can, but geometric data can't. In practice this could save 50% of the bandwidth needed for 3D rendering.

To stop the LLC being completely monopolised by the IGP, Intel has put a limit on the quantity of LLC that the IGP can use. This is fixed by default in the microcode for each model of CPU, but can be modified by Intel for some applications via the graphics drivers. In practice, Intel told us that on the versions with a big LLC cache (8 and 6 MB?), the limitation won't kick in, leaving the CPU cores and IGP to fight it out for use of the cache. The limitation is however required for models with less cache (3 MB?).
Architecture
In terms of the architecture, the Sandy Bridge IGP is basically the same as the previous generation, with up to 12 vec4 type execution units (EUs). We imagine, even if Intel doesn't say so, that the GPU also still has four texture units and two ROP units. DirectX support has been extended from DX 10 to DX 10.1. Given the level of performance targeted, supporting the hardware innovations of DirectX 11 wasn't a priority, especially as this API allows DX 10/10.1 components to benefit from the advantages of its new software structure.

Note also that although the Intel IGPs have hardware support to process the Vertex Shaders, Intel's drivers have a software engine to process them on the CPU. This option is used on a case by case basis in the drivers according to a list pre-established by the manufacturer.


There have nevertheless been some changes in terms of the execution units, which we reported on at the IDF last September: a larger register file, improved handling of complex branching and native support for a larger number of instructions.


An important choice Intel has made is to increase the number of fixed-function units wherever possible (3D part and video). This contrasts strongly with the choices made by AMD, for example, on its latest GPUs, where it's tending to reduce the number of fixed-function units. This brings Intel a gain in terms of energy, as fixed-function units are more efficient, but also in terms of drivers, by reducing the load the driver places on the processor. Note that there's support for OpenGL in version 3.1, as well as OpenCL and DirectCompute 4.1.
HD 2000 and HD 3000
For Clarkdale, the previous generation, Intel used the same GPU (HD Graphics) clocked at different frequencies depending on the model (from 533 to 900 MHz). With respect to the new Sandy Bridge generation, Intel has made a different choice, going for two different GPUs, the HD 2000 and HD 3000. They differ by the number of EUs enabled on the GPU, 6 on the HD 2000 and 12 on the HD 3000.

The choice of which GPU (HD 2000 or HD 3000) is used where is rather peculiar. On mobile it's simple: all 2nd generation mobile Core CPUs get the HD 3000. Only the maximum clock varies depending on the model:

- Core i5 2520M, 2540M, Core i7 2620M, 2720QM, 2820QM, 2920XM: 1300 MHz
- Core i5 2410M, Core i7 2635QM: 1200 MHz
- Core i3 2310M, Core i7 2629M, 2630QM, 2649M: 1100 MHz
- Core i7 2647M: 1000 MHz
- Core i7 2617M: 950 MHz
- Core i7 2537M: 900 MHz

It's difficult to see any clear logic here, though the last three processors on the list are ULVs, which justifies their lower GPU clocks.

On desktop, the HD 3000s are only available on K models, designed for overclocking, while the HD 2000s are used for all the other models. This is a curious choice to say the least because H67 motherboards (which have the video outs required for the use of the Sandy Bridge graphics part) won't allow you to change the cores' Turbo multipliers, thus preventing overclocking. You can however change the graphics core Turbo multiplier. Intel explains this choice by saying that it wants to keep its HD 3000 for the highest end models. This doesn't make sense to us.

Note that only on the Core i7 2600s (standard, S and K) can the IGP be clocked up to 1350 MHz. All the other desktop IGPs are clocked at 1100 MHz, except the Core i5 2500T, which is clocked at 1250 MHz! Intel has therefore been quite extreme in its segmentation.

In terms of platforms, H67 motherboards (like their mobile equivalents) support all the standard connectors including DVI, Display Port and HDMI. HDMI 1.4 is supported, providing for 3D Blu-ray. DVI is still only supported in single channel mode.


Page 9
Intel HD Graphics, energy consumption, oc, perfs

Energy consumption
One of the innovations with Sandy Bridge is being able to monitor in real time the energy consumption of the chip and its different functional blocks (cores and IGP). Although you got a reading of this on Nehalem, it was only an estimate derived from tables according to the clock and voltage. Here the reading is based on sensors which the Power Control Unit uses, among other things, to manage the Turbo mode.

Here are the values we took with hwinfo32 on an i7-2600K:


The energy consumption of the GPU at idle is impressive, especially as Aero (the Windows 7 accelerated 3D graphical user interface) was enabled. In comparison, the Radeon HD 5450 consumes 14.5 watts in load and 7 watts at idle. The difference comes in mainly from the memory chips and other components on the graphics card.
Overclocking
Although you can't change the CPU core multipliers on the H67 platform, you can change the multiplier on the GPU. As with the CPU cores on P67 (or previously on Nehalem), you aren't actually changing the multiplier but only its maximum turbo clock. By default, the 2600K runs at idle on the Windows desktop at 850 MHz.

The GPU can then increase its clock within the limits authorised by the Power Control Unit. The PCU does not however ensure that the overclocked GPU part will be fully stabilised.

In FurMark, we succeeded in reaching 1.75 GHz for the GPU part at the original voltage. FurMark has the particularity of not putting any load on the CPU, leaving plenty of available headroom. At 1.75 GHz, in a game such as Far Cry 2, you can see some throttling of the GPU by the Power Control Unit, reducing the clock to 1.55 GHz in the most complex part of the graphics scene. At this frequency, however, stability wasn't optimal and graphics applications sometimes quit unexpectedly.

hwinfo32 is the only piece of software currently able to read the clocks of Sandy Bridge graphics cores

1.65 GHz at the original voltage was perfectly stable in our tests. At this clock, there was still some throttling. Unfortunately this is a limitation that you can't get rid of because, even with a K processor, what Intel calls "Unlocked power" (capacity to change TDP limits) is not authorised on H67 motherboards. Another strange decision.
3D performance
We measured the 3D performance of the new IGPs. We compared them with the previous generation of HD Graphics as well as AMD's integrated graphics chipset, the 890GX. We also added two graphics cards, the Radeon HD 5450 (GDDR3) and Radeon HD 5670 512 MB. These cards cost around 40 and 90 euros respectively.

On the H55 platform, we simulated the different graphics cores with a Core i5 661 whose GPU was clocked at 900 MHz, 733 MHz (Core i5 660) and 533 MHz (Pentium G6950) respectively. This allowed us to keep the CPU clock the same across the tests.

Although we wanted to use the same method for Sandy Bridge, we tested the processors individually in the end as we couldn't downclock the GPU (to simulate the 2500/2500Ks from the 2600/2600Ks). Although the BIOS allowed the change, it had no effect in practice. Here are the platforms used, with 64-bit Windows 7 as the OS across all platforms:

- Intel Core i5 661, Asus P7H55M, 4 GB DDR3 1333 Crucial
- Intel Core i5 2500/2500K and i7 2600/2600K, Intel DH67GD, 4 GB DDR3 1333 Crucial
- AMD Phenom II X4 "3.6 GHz", Gigabyte 890GPA-UD3H, 4 GB DDR3 1333 Crucial
Far Cry 2
For our games tests, we used three different graphics settings:

- 1280 x 720 low
- 1280 x 720 medium
- 1680 x 1050 medium

In Far Cry 2, the graphics modes used correspond to the low and medium modes on offer in the game. These modes use DirectX 9 exclusively.


Intel was aiming to double the performance levels of its previous IGP and it does when you compare the Core i7 2600K to the Core i5 661 in Far Cry 2. The increase is the same in the three modes tested.

The difference in performance between an HD 3000 version and an HD 2000 is around 33% at 1280 x 720 and 41% at 1680 x 1050. The graphics load for the execution units (6 additional EUs on the HD 3000 on the K processors) increases with resolution.

As for the difference in clocks (1350 MHz against 1100), it varies between 9 and 15% for CPUs equipped with an HD 3000. There's more of a difference with processors with the HD 2000, 19% in every case.
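To put these gains in perspective, they can be compared with the clock increase itself. The following is a minimal sketch in Python that uses only the two clocks and the gains quoted above; the function names are illustrative and not part of any tool used in the test.

```python
def clock_increase_pct(low_mhz, high_mhz):
    """Relative clock increase, in percent."""
    return (high_mhz / low_mhz - 1.0) * 100.0


def scaling_efficiency(gain_pct, low_mhz, high_mhz):
    """Fraction of the clock increase that shows up as extra frame rate."""
    return gain_pct / clock_increase_pct(low_mhz, high_mhz)


# Clocks and gains as quoted in the text (1100 MHz vs 1350 MHz IGP clock).
clock_gain = clock_increase_pct(1100, 1350)
print(f"Clock increase: {clock_gain:.1f}%")
for label, gain in [("HD 3000, lowest gain", 9),
                    ("HD 3000, highest gain", 15),
                    ("HD 2000", 19)]:
    eff = scaling_efficiency(gain, 1100, 1350)
    print(f"{label}: {gain}% gain = {eff:.0%} of the clock increase")
```

The clock increase works out to roughly 22.7%, so the HD 2000 turns most of it into frame rate while the HD 3000 keeps a smaller share, which suggests the larger IGP is held back by something other than its execution units in this game.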

Note that in comparison to the AMD integrated GPU, the 890GX, the Core i5 2500 (HD 2000, 1100 MHz) is systematically better. AMD has missed several opportunities to update the graphics core used by its chipsets and this is showing. Looking at the graphics cards, our 40 euro entry level card is slightly behind the 2600K. The Radeon HD 5670, which is nevertheless modest in many ways, puts the other solutions in their place!
Crysis Warhead
As we did with our test of graphics solutions for under 100 euros, we kept the "mainstream" and "gamer" modes for our low and medium modes.


Crysis Warhead is still a torrid test for our IGPs. The gains given by the HD 3000 in comparison to the old generation HD Graphics are higher here, up to 2.7x in the lightest graphics mode.
Going from 1.1 to 1.35 GHz gives 20% more across the board, which shows, if we were in any doubt, that the limitation comes from the execution units. The rest of what we found with Far Cry 2 holds true in Crysis (positioning relative to the competition). The performance difference with the HD 5670 puts the power of these entry level graphics solutions in context.


Page 10
Intel HD Graphics, CPU vs IGP

CPU vs IGP
We also tried to see what the impact was of the choices made by Intel in the integration of its GPU. While the memory controller is shared (this was already the case previously), the last level memory cache (LLC, L3 for CPU cores) is also shared via the ring bus.

We tried to observe the respective performances of the Sandy Bridge processor and IGP by running an application that puts heavy demands on the CPU on one side and a game that puts heavy demands on the GPU on the other. We opted for Cinebench + H.A.W.X.

We carried out three tests on each platform by varying the number of threads used by Cinebench. The values give the number of threads as a percentage of the number of physical cores on the chip. 200% represents 8 threads on a quad core CPU with HyperThreading like the 2600K. On a dual core processor with HyperThreading (the Core i5 661) it represents 4 threads. We looked at the progression of performance across three platforms:

- Core i7 2600K + Radeon HD 5450
- Core i7 2600K + IGP
- Core i5 661
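
The percentage notation used for the Cinebench load can be converted back into a thread count as follows. This is a small illustrative sketch in Python; the function name is hypothetical, and only the 200% case is shown because it is the one spelled out in the text.

```python
def cinebench_threads(load_pct, physical_cores):
    """Number of Cinebench threads for a load given as a percentage
    of the chip's physical core count."""
    return round(load_pct / 100 * physical_cores)


# Physical core counts of the two CPUs used in this comparison.
platforms = {"Core i7 2600K": 4, "Core i5 661": 2}
for name, cores in platforms.items():
    print(f"{name}: 200% = {cinebench_threads(200, cores)} threads")
```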


Let's take a closer look at performance levels in H.A.W.X according to the number of threads. There's no impact when you use a Radeon HD 5450 (the Windows scheduler correctly prioritises the game, which is in the foreground). On the Core i5 661, there's a slight dip of 2.5% that can be put down to the shared memory controller. But what about the performance you get with the 2600K's IGP? Note, for information, that we reproduced this behaviour with a second game/application pairing (Far Cry 2 and Prime95).

We found at least one of the causes of the drop in performance by monitoring the IGP's running clock. With 8 software threads in Cinebench, HWiNFO32 showed that the graphics core was being throttled, its frequency dropping from 1350 MHz to 850 MHz (2D frequency) in a few seconds. Although this explains a loss in performance, it doesn't explain everything. Several hypotheses are possible and there are probably multiple factors coming into play here. Firstly, it's logical that, as with the Core i5 661, some of the impact on performance comes from the fact that the memory controller is shared. This impact may even be bigger here because the LLC is also shared, with all the problems that may be associated with this (saturated ring bus, snooping and so on). The final point is that we don't know exactly at what level the monitoring software we used reads the GPU clock. It's probable that, depending on the register, it's impossible to detect any throttling below 850 MHz.

Note also that, instead of being a cause, the throttling that we observed could well have been the consequence of another problem. The reduced clock could result from insufficient graphics load (an upstream limit, at the level of the LLC for example), which wouldn't push the graphics core to ramp up to its maximum clock.

Independently of the cause, it's abnormal to see such an impact from processor load on the graphics part, just as it's abnormal to see such extreme throttling when our energy consumption readings seem to place the Intel processors quite some way off their announced TDPs (see following page). The Power Control Unit is probably playing a part in this situation.


Page 11
Intel HD Graphics, Video and Quick Sync

Video playback
Like the other solutions on the market, the Intel IGPs obviously provide hardware decoding of videos. The idea isn't necessarily to help the CPUs (even entry-level dual core models can decode the AVC HD stream of a Blu-ray) but rather to reduce the chip's energy consumption (and therefore improve the battery life of laptops).

The previous generation already supported hardware decoding for MPEG2, VC1 and AVC (H.264). Here, however, Intel has extended the number of fixed units. Where previously Motion Compensation and Deblocking were processed by execution units, the Sandy Bridges now use fixed units for these decoding stages.

The hardware modifications implemented by Intel to its Sandy Bridge architecture are not transparent and require alterations in playback software. For example, the public version of Media Player Classic HomeCinema which allows H.264 decoding by the IGP for previous generations, here detects our Sandy Bridge correctly, but the result is a mass of artefacts.

Intel however supplied us with two Sandy Bridge compatible applications, PowerDVD 10 and Total Media Theater 5 in adapted beta versions. We were able to verify that hardware acceleration was running correctly on the first in various formats, including H.264 on MKV 720p and 1080p files. Note that Intel seems to have corrected one of the faults that we noted previously, namely the temporary accumulation of flickering in some dark areas. Either the bug has been corrected or the denoising (activated in the Intel drivers) is much more efficient. Its default setting is a bit aggressive, with very slight blurring of noisy scenes as you can see in HD HQV.

The majority of options linked to video decoding depend on correct handling of interlacing, which isn't a problem with Blu-rays as here images are displayed through progressive scanning. On very high quality sources, the differences are almost imperceptible and depend more on the denoising setting chosen than anything else. Note that Intel provides automatic correction of skin tones in its drivers. Deactivated by default, the option seems quite ineffective to us. We can't see much point in it, once again, as with good quality sources the necessity of carrying out corrections is limited. We put dynamic contrast correction in the same category, very much like you see with some LCD televisions.

Among the bugs previously noted, there's image deterioration during accelerated playback of a Blu-ray at lower than native resolution (downscaling). This seems to have been corrected in Total Media Theater 5. However, as Home Media has noted, video playback is still imperfect with sources at 23.976 fps, as these are displayed at 24 Hz and not at 23.976 Hz (the "23 Hz" mode), which makes for a small jump in the flow of images roughly every forty seconds (not necessarily visible if you don't know about it).
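The figure of roughly forty seconds follows directly from the mismatch between the two rates. Here is a minimal sketch in Python, assuming the source rate is the usual film rate of 24000/1001 fps (what is commonly written as 23.976); the function name is illustrative.

```python
from fractions import Fraction


def frame_slip_period_s(source_fps, display_hz):
    """Seconds between two repeated (or dropped) frames when the source
    frame rate and the display refresh rate differ slightly."""
    diff = abs(Fraction(display_hz) - Fraction(source_fps))
    if diff == 0:
        raise ValueError("rates match: no periodic slip")
    return float(1 / diff)


film_rate = Fraction(24000, 1001)  # assumption: exact value behind "23.976"
print(f"One slipped frame every {frame_slip_period_s(film_rate, 24):.1f} s")
```

With a 24 Hz display, the display gains one full frame on the source about every 41.7 seconds, which is where the periodic jump comes from.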
Video encoding: Intel Quick Sync
Behind the marketing name Quick Sync, Intel is drawing developers' attention to the option of using its IGP to carry out video encoding tasks. As with CUDA on an NVIDIA GPU or Stream on an AMD GPU, you can use some of the hardware units to carry out video encoding. The Intel equivalent to Stream/CUDA in the case of video decoding and encoding works via MediaSDK. This is an API that also allows developers to fall back on software when hardware acceleration isn't available. It's up to developers to use the parts that are useful to them.

From a technical point of view, when IGP acceleration is used, decoding of the source image on Sandy Bridge (in the case of transcoding) is very fast as it's carried out by the new fixed units. When it comes to encoding, while there are some fixed blocks, most of the work (Motion Estimation and so on) is carried out on the IGP's execution units.

Intel supplied us with two applications that are compatible with the latest version of MediaSDK, at least when it comes to hardware acceleration. These were the beta version of Cyberlink's MediaEspresso and Arcsoft's Media Converter, both transcoding applications which serve mainly to convert voluminous files (Blu-ray and so on) into a format adapted to a peripheral (mobile phone, tablet and so on). The encoding isn't very high quality in either case.

We used MediaEspresso to compare encoding times while noting any variation in quality. The software doesn't give you much room for manoeuvre when configuring the advanced encoding options. It's impossible to choose between one or several encoding passes or to select the H.264 profile used. The options are limited to a simple choice of bitrate. For our test we re-encoded a 720p (1280x720) video as a 640x480 (while keeping a 16/9 aspect ratio) at a bitrate of 3 Mbps.

In MediaEspresso it's possible to choose the level of acceleration you're looking for: none, accelerated decoding, accelerated encoding, or accelerated decoding + encoding. We tested these four scenarios, deactivating the image quality improvement options in MediaEspresso so as not to disadvantage one or another of the solutions.

Before introducing the results, we must stress that the four files we obtained are of more or less comparable but not identical size. The same goes for quality, something we'll describe below. The encoding times should therefore be treated with caution.


Looking at the graph, we can say that using hardware acceleration in Sandy Bridge allows you to cut encoding time to a third on the Core i7-2600K. Note also that with accelerated encoding, the CPU is still put to work. For information, processor utilisation in the three accelerated modes was 83%, 38% and 23%.

Qualitatively speaking, the results are not exactly the same. We cropped an encoded frame: from top to bottom you can see the source image, then the image encoded in CPU mode, with IGP decoding, with IGP encoding and with IGP decoding + encoding.



Given that the original video had a bitrate that was hardly higher than the one chosen, you can't say that the final image is the sort of quality you'd expect (we went from 1.1 GB on the source video at 720p to 990 MB at 480p!). Next, it seems pretty clear that MediaEspresso uses a completely different software path for the purely processor version and for the accelerated ones (completely or partially). Chromatic tendencies towards red appear on the right side of the image (first example going from the middle) in the three accelerated versions. We suppose that the Intel MediaSDK is only used when there's hardware acceleration (partial or complete) and that another path is used for the processor version.

But even if you only look at the MediaSDK images, the three images aren't strictly identical. The purely processor version clearly looks as if it's at another level to us. Even if it does look a little blurry, the difference is clear during playback and you get a lot fewer artefacts.

So then, although Intel is apparently giving us hardware acceleration for video encoding with Sandy Bridge, in practice the library on which it's based doesn't necessarily seem to be on a par with the best available software options, including those, which are nevertheless modest, integrated in MediaEspresso.


Page 12
Energy consumption

For the test on the CPU side, we were able to look at four models:


- Core i5-2300
- Core i5-2400
- Core i5-2500K
- Core i7-2600K



All these processors have a TDP of 95 W and run at a supply voltage of 1.2 V by default, against 1.1 V for processors in the S range which have a TDP of 65 W. In practice, our Core i5-2500 runs very well at the clock and voltage of an S version, and then offers similar thermal behaviour. We tested them on an Intel motherboard, the DP67BG:


We measured the power consumption of the configuration at the wall socket, with a power supply whose efficiency is around 80%. For the test in load we used Prime95. This means that other components such as the graphics card or the hard drive are idle when these readings are taken.
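
Because the readings are taken at the wall, they include the losses of the power supply. The following is a rough sketch in Python of how a wall reading translates into power actually delivered to the components, assuming a flat 80% efficiency as stated above (real efficiency varies with load). The wall readings in the example are hypothetical and are not taken from our graphs.

```python
PSU_EFFICIENCY = 0.80  # "around 80%" per the text; assumed constant here


def dc_power_w(wall_w, efficiency=PSU_EFFICIENCY):
    """Power delivered to the components for a given wall reading."""
    return wall_w * efficiency


def dc_delta_w(idle_wall_w, load_wall_w, efficiency=PSU_EFFICIENCY):
    """Extra power drawn by the components between idle and load."""
    return (load_wall_w - idle_wall_w) * efficiency


# Hypothetical wall readings, for illustration only.
idle_wall, load_wall = 50.0, 130.0
print(f"Idle: {dc_power_w(idle_wall):.0f} W delivered")
print(f"Load: {dc_power_w(load_wall):.0f} W delivered")
print(f"Idle to load: +{dc_delta_w(idle_wall, load_wall):.0f} W")
```

Since only the CPU is loaded by Prime95, the difference between idle and load is mostly attributable to the processor and its voltage regulation.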


Thanks to the 32nm process, the energy consumption of the new Core i5s and i7s is very well controlled.

Here now is the reading for power consumption at the ATX12V connector, using a clip-on ammeter. We haven't given readings for the LGA 1155 CPUs as these results aren't directly comparable from one platform to the other: on LGA 1366, for example, the uncore part is powered from the standard ATX connector.


Intel has given itself plenty of margin with a TDP of 95 watts on these processors, and their actual energy consumption is a nice surprise in comparison to their specs. Because of HyperThreading, the Core i7 consumes a good deal more in load than the Core i5s.


Page 13
Controlled overclocking

The lucky people who were first to see Sandy Bridge in action quickly discovered the overclocking issues with the platform. These come from the bus, fixed at 100 MHz by default, which struggles to get beyond 108 MHz without making the machine crash. At first everyone naturally thought this limitation had been deliberately imposed by Intel, seeing as Intel isn't averse to overclocking, though only on models sold expressly for this purpose (the 'extreme' XE versions and the K series with an unlocked multiplier). Overclocking by increasing the bus clock, on the other hand, works across all the models and is therefore entirely out of Intel's control.

The reason for the limitation was rapidly discovered: on Sandy Bridge, the clocks of the various buses are synchronised to the processor bus clock: PCI Express, PCI, SATA, USB are all clocked in direct proportion to the 100 MHz bus clock. This contrasts with previous platforms for which these clocks were generated asynchronously, that's to say independently of the bus.

Desynchronised clocks, the system used up till now, bring overall performance down because they require numerous wait cycles when accessing the bus. Using a single clock gives a gain in performance but also means interdependency between clocks.

Several questions result. Intel says that the platform wasn't supposed to be validated with synchronous clocks and that this is an unfortunate mistake rather than a deliberate choice. Intel assures us that forthcoming platforms (notably the high end versions on LGA 2011) will return to the asynchronous clock system. We'll see! Deliberate or not, this overclocking limit could be expensive for Intel. We've seen plenty of users turn their noses up at CPUs in the past because of their reluctance to overclock.

Intel has therefore made some concessions to help us swallow the medicine.


The most powerful models (Core i7-2600 and Core i5-2500) thus exist in K versions, the multiplier for which can be freely set up to 57x (or a maximum theoretical clock of 5.7 GHz) and which are priced only slightly higher than the standard versions (+ $23 for the 2600K, and + $11 for the 2500K). Note that to modify the multiplier on the K models, they must be installed on a P67 chipset and not an H67.


You can, all the same, dream a bit with the standard non-K models: choice of memory clock (DDR3 1067, 1333 but also 1600, 1867 and 2133), the GPU and setting the power threshold to regulate the turbo mode. You can increase the turbo multiplier by four notches, which takes you up an additional 500 MHz with four cores being used as you can see in the following table.


Overclocking isn't completely dead then... just completely locked down by Intel, even if Intel officially denies that this is deliberate. It will certainly be interesting to check out Intel's choices when it comes to the forthcoming socket 2011 platform: will it bring back asynchronous clocks as announced? And if so, will it go back on the concessions made on the non-K versions, as well as on the attractive pricing of the K versions on LGA 1155?


Page 14
Overclocking in practice

To start with, we wanted to test overclocking via the DMICLK, which has a base value of 100 MHz, on an Intel DP67BG motherboard and a Core i7-2600K. As expected, there's not much room for manoeuvre and we weren't able to stabilise the system at anything higher than 106 MHz. We're therefore pretty much limited to the multipliers, as Intel wishes.


On the Intel DP67BG card at least, the base multiplier can't be increased and you have to overclock by increasing the turbo mode multipliers. To move up to 4 GHz, you simply set them all at 40, for 1, 2, 3 and 4 active cores that is. There's a nuance here in comparison with standard overclocking because if the CPU exceeds its TDP, it will return to its base multiplier after one second because of Turbo 2. You can set this delay at 32 seconds, but it's actually better to raise the turbo mode power threshold to, say, 120 watts from 95.

We started with the Core i7-2600K, which we were able to stabilise at 4.1 GHz in Prime95 at its base voltage of 1.2 V, compared to 3.4 GHz by default and 3.5 GHz in Turbo with 4 cores active. At 1.3 V we got up to 4.4 GHz, then 4.7 GHz at 1.4 V. At this setting, the CPU energy consumption is 155 W, against 74 W in its base configuration.

Overclocking the 2600K

We managed to stabilise the 2500K at 4 GHz at 1.2 V, 4.3 GHz at 1.3 V and 4.6 GHz at 1.4 V, enabling the Internal PLL Voltage Override function beyond 4.4 GHz. At 4.6 GHz and 1.4 V, the CPU consumes 130 W in load, against 61 W by default.

Overclocking the 2500K

Like all the non-K CPUs, the Core i5-2400 is only partially let off the leash in terms of its multiplier (limited to +4). By default the multiplier is fixed at 31, but can go up to 32, 33, 33 and 34 in turbo with 4, 3, 2 or 1 active cores. With four active cores, you can therefore clock it at 3.6 GHz at its default voltage of 1.2 V, and even 3.81 GHz by increasing the DMICLK to 106 MHz. Energy consumption is then 75 W, compared to 60 W by default.

The Core i5-2300, with a base clock of 2.8 GHz, can reach 3.3 GHz with 4 cores active simply by adjusting the multiplier, and 3.49 GHz by adjusting the bus too. We kept its voltage at 1.2 V and energy consumption was then 69 W, compared to 54.5 W by default.
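
All the clocks quoted on this page come from the same product: multiplier times bus clock (DMICLK). Here is a short sketch in Python that reproduces them; the function name is illustrative, and the multiplier of 33 for the Core i5-2300 is inferred from its 3.3 GHz result at 100 MHz rather than stated directly.

```python
def core_clock_mhz(multiplier, dmiclk_mhz=100.0):
    """CPU clock in MHz for a given multiplier and bus clock."""
    return multiplier * dmiclk_mhz


# Core i5-2400: 4-core turbo multiplier of 32, raised by the +4 allowed.
i5_2400_mult = 32 + 4
# Core i5-2300: multiplier inferred from the 3.3 GHz reached at 100 MHz.
i5_2300_mult = 33

for name, mult in [("Core i5-2400", i5_2400_mult),
                   ("Core i5-2300", i5_2300_mult)]:
    for bus in (100.0, 106.0):
        ghz = core_clock_mhz(mult, bus) / 1000
        print(f"{name}: {mult} x {bus:.0f} MHz = {ghz:.3f} GHz")

# K models: multiplier ceiling of 57 at the stock 100 MHz bus.
print(f"K series ceiling: {core_clock_mhz(57) / 1000:.1f} GHz")
```

At 106 MHz the two processors come out at 3.816 GHz and 3.498 GHz, which correspond to the 3.81 and 3.49 GHz quoted above.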


Page 15
Lynnfield vs Sandy Bridge at 2.8 GHz

Before moving ahead with tests of the CPUs in their market configuration, we wanted to isolate the gains achieved by a Sandy Bridge model (Core i5/i7 quad core 32nm LGA1155) in comparison to Lynnfield (Core i5/i7 quad core 45nm LGA1156). To do so, we clocked both processors at 2.8 GHz and measured the difference in performance. Here are the results in the form of an index for better visibility.


Without HyperThreading, the gains vary between 5.6 and 20.4%, for an average of 13.2%. With HyperThreading, Sandy Bridge is 5.6 to 19.3% faster, for an average of 11.3%. Combined with the increase in clock at the same price as previously mentioned, significant price / performance ratio gains can be expected.
Impact of HyperThreading
Still at 2.8 GHz, we measured the impact of HyperThreading on performance.


The average gain is 7%, with up to 22% better in MinGW and Avidemux. Only in Nuendo 4 and, above all, Arma 2 does this technology have a negative impact on performance.
Impact of Turbo
Lastly we measured the impact of Turbo Boost on one of the processors, the Core i5-2500K.


The average gain was 2.8%.
The test
For this test, we used our usual processor test protocol. Remember we're now using a 64-bit version of Windows 7, which means that all software available in 64-bit mode is tested in this mode.

We've taken the opportunity to update the software, which means 3ds max is now tested in version 2010, MinGW and WinRAR (3.8 up to 3.9) have been updated, as have After Effects (CS3 up to CS4) and Nuendo (4.2 up to 4.3). The VirtualDub/DivX and AutoMKV/x264 combos have been replaced by Avidemux/x264 and MainConcept Reference/H.264, while the test files of virtually all the tests have changed or been modified (higher rendering resolution for example).

In terms of games, we have decided to retain Crysis 1.2 and its ultra-heavy CPU test but to retire World In Conflict and replace it with more recent and demanding games: Arma 2, Grand Theft Auto IV and Anno 1404 join the protocol. To bring out the differences between processors as much as possible, we set all graphics options to maximum so as to load the CPU fully, while limiting resolution to 800x600 to eliminate any levelling effect due to the single-GPU solution used on the test configuration.

The hardware used with the processors is as follows:

- ASUSTeK P5QC (LGA775)
- Intel DP55KG (LGA1156)
- Intel DX58SO (LGA1366)
- ASUSTeK M4A79-T (AM3)
- 2x2 GB DDR3-1333 7-7-7
- 2x2 GB DDR3-1066 7-7-7 (where 1333 impossible)
- GeForce GTX 280 + GeForce 190.62
- Raptor 74 GB + Raptor 150 GB
- Creative Audigy
- Windows 7 64 bit


Page 16
3D Studio Max, Cinema 4D

3D Studio Max 2010

We begin with the famous image rendering software, now in its x64 and 2010 version. The test scene used is from SPECapc for 3ds max 9 (space_flyby_mentalray) which employs the MentalRay rendering engine.


From this first test, the new Sandy Bridges show their strength. Of course, the six core LGA 1366 CPUs retain the advantage, but at the same price, the i5-2500 is 34% faster than the i5-760 and is even hot on the heels of the Phenom II X6 1075T, which is in a similar price bracket, although the i5 has two fewer cores. The quad core Core i7-2600K gives the best performance of all the quad core platforms.
Cinema 4D R11

Maxon's rendering software is well known in the overclocker community through Cinebench, which allows you to compare processor performance easily. We use version R11 of C4D in 64 bit mode with the scene from Cinebench R10 rendered at a higher resolution so as to prolong rendering time.


Our first impression was confirmed in Cinema4D. The Core i7-2600 falls between the Core i7 LGA 1366 quad and six cores, while the i5-2500 is in front of the i7-880, with a gain of 35% in comparison with the i5-760, and at the same level as the Phenom II X6 1090T.


Page 17
MinGW/GCC, WinRAR

MinGW / GCC

This is a practical test: the compilation of the MAME source code using GCC under the MinGW development environment. We are now using version 5.1.4 of MinGW and compile the source code of MAME 0.133.


Like in 3ds and C4D, the i7-2600 is just behind the LGA 1366 six cores. The i5-2500 is on a par with the LGA 1156 Core i7s and the Phenom II X6s, which is no mean feat.
WinRAR 3.9

We're using the 64-bit 3.9 version of WinRAR, which introduces new multithreading optimisations, to compress a group of files.


Even in this version, WinRAR barely uses more than two cores. Thanks to the reduced latency of their L3 cache, the new Sandy Bridges extend their lead and are proving to be the fastest CPUs out there at the moment.


Page 18
H.264: Avidemux, MainConcept

Avidemux + x264

Our test videos use H.264 encoding exclusively. To start with, we use Avidemux version 2.5.2, which improves performance beyond 4 threads compared to version 2.5.1, to compress a 1920x1080 HD video file via the x264 codec at intermediary quality.


Once again, the i7-2600 is just behind the LGA 1366 six cores. The i5-2500 is 27% faster than the i5-760 and comes between the Phenom II X6 1055T and 1075T.
MainConcept Reference + H.264/AVC Pro

For this second H.264 encoding we use MainConcept Reference and its H.264/AVC Pro codec on "High", still with the same video.


The i7-2600 is once again close to the performance levels of the Westmeres, while the i5-2500 offers a gain of 29% in comparison to the i5-760. It's at a higher level than the Phenom II X6 1090T.


Page 19
After Effects CS4, Nuendo 4.3

After Effects CS4

We're using a new composition with various effects so as to render 3D animation. Multiprocessing is activated so as to make the most of the available number of cores.


Intel processors dominate the AMD CPUs in this test, with the i5-2300 already much quicker than the Phenom II X6 1090T. The other Sandy Bridges widen the gap with the i7-2600 on the same level as the six core i7-970.
Nuendo 4.3

Here's version 4 of Nuendo, with the latest patch 4.3, all in 64-bit. A new music project using various native plugins as well as 2 HalionOne virtual instruments was exported as a wav file (thanks to Draculax).


Like in After Effects the AMD CPUs do quite badly here.


Page 20
Crysis & Arma 2

Crysis 1.2

With patch 1.2, Crysis has a very heavy CPU bench (to be found in the Bin32/Bin64 directory). The test was carried out at high settings, but at a res of 800x600 so as to limit dependence on the graphics card.


While up till now we only thought we were limited by the GPU, the Sandy Bridge CPUs allowed us to get beyond the limit to a bit less than 31 frames per second in this very heavy scene. What really lies behind this is difficult to determine however.
Arma 2

New to our test protocol, Arma 2 is configured with all settings at a max including max visibility (10 km), which brings the configurations to their knees. Resolution stays at 800x600 to avoid the graphics card impacting on performance. To gauge performance we measure the framerate during a well-defined movement after having loaded a saved game.


As of the Core i5-2400, the Sandy Bridges are up there with the fastest LGA 1366 processors. The 2500s and 2600s widen the gap.


Page 21
GTA 4 & Anno 1404

Grand Theft Auto IV

GTA IV is included in the protocol for its weight and multi-threading optimisations. Once again all the settings were pushed to a maximum, with the exception of the textures so as not to exceed available video memory, all at a res of 800x600. We use the built-in benchmark but on a scene chosen by us for more weight than the default.


In GTA IV the Sandy Bridges confirm their advantage in gaming, though their advantage is less marked than before. The rather low increase in performance with the increase in clock however leads us to think that the limitation comes from somewhere other than the CPU in what is a scene with very heavy geometry.
Anno 1404

Anno 1404 is a strategy game tested at max settings but with resolution still at 800x600. We use a saved game with a city of 46,600 inhabitants that we partly visualize from a distance.


It's very rare to find a game that puts six cores to work but Anno 1404 is such a game, as you can see by the differences between the Phenom II X4s and X6s and the differences between the quad cores with and without HyperThreading. The Sandy Bridges give a significant gain and widen the gap with the AMD solutions which were already suffering.


Page 22
Average

Although individual app results are worth looking at, especially for CPUs with a high number of cores, we calculated a performance index based on all tests with the same weight for each test. An index of 100 was given to the Intel Core 2 Q8200.
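
An equal-weight index of this kind can be sketched as follows in Python. This is an illustration under stated assumptions, not our actual calculation: the per-test ratios are combined with an arithmetic mean (a geometric mean would be another option), time-based results are first converted into a throughput so that higher is always better, and all the figures in the example are hypothetical.

```python
def throughput(seconds):
    """Turn a completion time into a higher-is-better score."""
    return 1.0 / seconds


def performance_index(scores, reference):
    """Equal-weight index: mean of the per-test ratios to the reference
    CPU, scaled so that the reference itself scores 100."""
    ratios = [scores[test] / reference[test] for test in reference]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical results: two timed tests (seconds) and one game (fps).
reference_cpu = {"render": throughput(200.0),
                 "encode": throughput(120.0),
                 "game": 20.0}
candidate_cpu = {"render": throughput(100.0),
                 "encode": throughput(80.0),
                 "game": 30.0}

print(f"Reference: {performance_index(reference_cpu, reference_cpu):.1f}")
print(f"Candidate: {performance_index(candidate_cpu, reference_cpu):.1f}")
```

Giving each test the same weight means a single very favourable result (a heavily threaded renderer, for instance) moves the index as much as any other test, which is why the individual results remain worth consulting.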

By combining a higher clock and efficiency, the new LGA 1155 Core i5s and i7s show good progress on their LGA 1156 predecessors: the i5-2300 is thus 15% faster than the i5-750, and the difference increases to 20% if you compare the i7-880 and the i7-2600. The latter is moreover snapping at the heels of the Core i7-970, even though that processor has six cores.

For the same price, the i7-2600 is 24% faster than the i7-870, and the i5-2500 26% up on the i5-760! This improvement in the price / performance ratio from Intel makes the situation at the top end harder than ever for AMD. On average, the i5-2300 ($177) can face up to the Phenom II X6 1100T ($265), which is however priced between the i5-2500 ($205) and 2600 ($294).

Of course this average hides different variations depending on usage and while the i7-2600 outdoes the X6 1100T whatever the application, the biggest AMD CPU is in front of the i5-2500 in four applications (3ds max, Cinema 4D, MinGW and Avidemux).



Page 23
Conclusion

With these new Sandy Bridge Core i7s and Core i5s, Intel has given us some quad cores that not only give higher performance but also consume less energy.

Of course, it's the combination of the new Sandy Bridge architecture, more efficient at the same clock than Nehalem while offering more options at higher clocks, with the 32nm process, which Intel is alone in mastering among x86 CPU manufacturers, that gives Sandy Bridge its edge.


Already confined to a niche market because of their prohibitive pricing, the LGA 1366 six cores lose a good deal of their attraction with this release. Of course, when they're used at their full potential they're still the fastest CPUs, but the Core i7-2600 isn't far off and costs a good deal less.

The Phenom II X6s are also taking quite a hit with this new Intel range, so much so that the Phenom II X6 1100T, which spearheads the AMD range, is on average only on a par with the i5-2300, which is the smallest of the Sandy Bridge quad cores! It does outdo the i5-2500 however in four cases out of twelve, but is never up on the i7-2600K, at the same time as having a lower margin for overclocking and consuming more energy.

Intel has moreover taken the opportunity offered by Sandy Bridge to extend its strategy of integrating a graphics core within its CPUs. The graphics core is now integrated onto the same die as the CPU, and the new HD Graphics offers performance levels of up to twice the previous generation and is now, in its fastest version, on a par with a Radeon HD 5450 type graphics card. This is of course insufficient for gaming, but means you can do without a graphics card in some other cases, including for an HTPC.


Intel's extreme segmentation is however regrettable and leads to some inconsistencies, like the fact that HD Graphics 3000 is only included on K models, which are the most expensive, while the H67 platform doesn't even allow overclocking. However good it may be, an IGP is still an IGP and wanting to buy the "best" version at a high price doesn't make much sense.

There are two other drawbacks when it comes to these impressive new processors. Firstly, having to move over to socket LGA 1155, and therefore ending the nevertheless fairly recent LGA 1156's life, doesn't do much for the idea of the evolution of a platform, something we hold close to our hearts. Let's hope that this represents another mistake in the Intel roadmap...

Intel's lockdown when it comes to the Sandy Bridge overclocking margins is also bad news. Sure, Intel isn't abusing this just yet and is leaving a small amount of room for manoeuvre of 500 MHz in comparison to the base clock on standard models and pricing the unlocked K models at a reasonable level, but there's no guarantee this situation will last.

These points don't however alter the excellence of these new Core i7s and Core i5s which, though not perfect, offer without any doubt the best price / performance ratio of all mid and high range processors out there.
So dear Bulldozer, Sandy Bridge is well positioned to take on all comers!


Copyright © 1997-2014 BeHardware. All rights reserved.