16 cores in action: Asus Z9PE-D8 WS and Intel Xeon E5-2687W - BeHardware

Written by Guillaume Louel

Published on May 9, 2012

URL: http://www.behardware.com/art/lire/865/


Page 1

Introduction



During our test of Intel's new high-end platform at the end of last year, we regretted the fact that there were no eight-core models on the LGA 2011 platform.

Intel's highest-end general consumer chip, the Core i7 3960X, only has six active cores, though it is based on an eight-core die. The reason given for this choice is to contain energy consumption and thus keep the TDP at around 130 watts. This is understandable, but you do have to wonder what an eight-core version would give us in terms of performance.

While there's talk of the forthcoming launch of consumer eight-core processors on the Ivy Bridge-EP platform, we don't actually have to wait for these to arrive on the market to get an idea of how they will do: the SandyBridge-E already exists in an eight-core version... as the Xeon E5! The range designed for Intel server applications was renewed in March and shares the same die as our cut-down Core i7 3960X.


Indeed there are no fewer than eight eight-core processors in this range! Apart from clock segmentation, TDP segmentation also exists to cover different cooling needs. The question of TDP is particularly important when this type of processor goes into rack servers.

If you look closely, however, there's one processor in this list that stands out somewhat from the rest. It isn't right at the top of the range and isn't even the most expensive, yet it has a TDP of 150 watts. Called the Xeon E5-2687W, it's Intel's fastest octo-core, with a base clock of 3.1 GHz and a Turbo mode that can go up to 3.8 GHz. It's a chip that wouldn't look at all bad at the top of the Core i7 hierarchy, and one we've decided to test for you.

Because two heads are better than one…

One of the particularities of this Xeon is that, like the other processors in the 2600 series, it can run in a pair. We wouldn't therefore have felt right testing it on its own! This did of course require a suitable motherboard. We got our hands on the Z9PE-D8 WS, an Asus 'server' board designed for 'workstation' use, which also has something of the mass market motherboard about it!

… and how ATX is a good deal too small!

So then, this was the dream opportunity to introduce you to this type of platform, extreme in every sense of the word, and, for those who want it, to go into the details of how platforms with two sockets (abbreviated to 2S in this report) or more actually function!


Page 2
SMP, Nehalem, QPI, quick recap

While for the general consumer the introduction of the Core i7 Nehalem processors meant above all the division of Intel's range into two sockets (LGA 1366 launched in November 2008, then LGA 1156 in September 2009), for the Xeon range it represented a profound reorganisation.

For those who aren't familiar with it, Intel's Xeon range comprises processors designed for the enterprise market, for servers and workstations. From the manufacturing point of view, many Xeons share the same dies as the mass market processors (there is one extra die, with its own socket, designed for configurations beyond 4S), but some of their features can be turned on or off according to segmentation. One of the features that has always been withheld from the mass market Intel range is running two processors at the same time in the same machine, a process often called SMP (Symmetric Multi-Processing).

Before going into performance, let's go back over a few technical particularities of these platforms, which play a decisive role in how well they can exploit the performance of several processors at once.

QuickPath Interconnect

One of the principal particularities of the LGA 1366 platform launch was the introduction of a new communication bus, QPI. QPI is a point-to-point interconnect designed to link the processor to the rest of the machine, generally speaking the chipset. Until then Intel had been using a proprietary bus, the FSB (which required a license for its use on both the processor and the chipset side), whose clock evolved over time.

For the inspiration behind QPI (both in concept and in technical implementation) we have to look at the competition. After dropping the FSB in 1999, AMD launched a new point-to-point bus in 2001, HyperTransport, which served first and foremost to link the northbridge and southbridge of chipsets, on the Nvidia nForce for example. Then, with the Athlon 64, it was also used as the interconnect between the processor and the northbridge. With the simultaneous launch of the Opteron platform, AMD used this same bus to connect processors to each other in multiprocessor machines.

Thus, just as the Athlon 64 only had a single active HyperTransport link (in theory) against three for the Opterons, Intel included several QPI links (four for Nehalem) in its processors, which can be activated according to usage. On a desktop Core i7, only one QPI link is active, linking the processor to the chipset. When all four links are active, however, original combinations can be obtained, such as this quad processor platform where each chip is linked to the three others as well as to the chipset:


As you can see on this diagram, each processor has its own memory: Nehalem also moved the memory controller, previously housed in the northbridge, onto the same die as the processor. This raises the very real questions of how the processors share data among themselves and how the operating system sees such a system.


Page 3
MESI, MOESI, MESIF, NUMA


With the arrival of multi-core processors, the concept of multiple and shared resources has been largely accepted, both on the hardware side and by operating systems. In one of today's processors, a single memory controller is used by multiple cores, with a cache hierarchy between them (to store the most frequently used data) that is sometimes private to each core and sometimes shared, so as to create a system that both shares the controller efficiently and maximises performance. As far as the operating system is concerned, all the cores share the same memory.

The question of shared resources does however raise new problems: what happens when one core wants to access a piece of data used by another core? Which 'version' of the data is the right one, the one stored in RAM or the one stored in a cache?


Some coordination quickly becomes necessary…

Mechanisms have been created to handle these conflicts, most often the MESI protocol, which allocates states (Modified, Exclusive, Shared, Invalid) to cache lines so as to enable a minimum of coordination between cores.

MESI

Used by Intel until Nehalem, MESI is a protocol that:
  • ensures coherency between the caches and main memory
  • enables cores to work together

Let's take the example of a core A which needs to read a piece of data from memory. Using MESI, it will first of all find out whether any of the other cores is using this data: a request is sent to all the other cores. If none of them is using the requested data, the memory controller fetches it from memory and sends it to the cache that requested it. If, however, another core B has already requested this data to read it, B will have marked the corresponding cache line as Exclusive. This is where interaction between the cores kicks in: core B moves to the Shared state to indicate that it is no longer alone in using this data and sends it straight to core A (known as a forward), which then also holds a copy of the line in the Shared state. The system works particularly well as long as no more than two cores access the same piece of data. If, however, the same line is marked as Shared on several cores, all of these cores reply to the request! Several identical responses then transit between the cores, using up bandwidth for nothing. Poorly managed, the Shared state has its limitations.

For our second example, let's say that the data requested by core A hasn't simply been read by core B, but read and then modified. In practice core B will have moved this cache line from the Exclusive to the Modified state. This state indicates that the data in main memory is dirty (no longer valid) and that the only up-to-date copy of the line is the one in the current cache. If core A then requests this data, core B has to carry out a whole series of operations to ensure coherency:
  • Write the data back to main memory so as to synchronise the changes (writeback)
  • Change the state of the cache line to Shared
  • Send the updated copy to core A (forward)

These operations are of course costly, first of all because the memory controller is involved!
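To make these two scenarios more concrete, here is a minimal sketch in C of the states and transitions just described. It is purely illustrative: in a real CPU this logic lives in the cache controllers' snoop hardware, not in software.

/* A minimal sketch of the MESI states and the two scenarios above.
   Illustrative only: real hardware implements this in the snoop logic. */
#include <stdio.h>

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_state;

static const char *state_name(mesi_state s) {
    static const char *names[] = { "Invalid", "Shared", "Exclusive", "Modified" };
    return names[s];
}

/* What happens to core B's copy of a cache line when core A issues a read. */
static mesi_state on_remote_read(mesi_state b, int *writeback, int *forward) {
    *writeback = 0;
    *forward = 0;
    switch (b) {
    case EXCLUSIVE:   /* scenario 1: B forwards its copy, both end up Shared */
        *forward = 1;
        return SHARED;
    case MODIFIED:    /* scenario 2: costly writeback first, then forward */
        *writeback = 1;
        *forward = 1;
        return SHARED;
    case SHARED:      /* B answers too, wasting bandwidth for nothing */
        *forward = 1;
        return SHARED;
    case INVALID:     /* B stays silent */
    default:
        return b;
    }
}

int main(void) {
    int wb, fwd;
    mesi_state b = MODIFIED;
    mesi_state after = on_remote_read(b, &wb, &fwd);
    printf("B: %s -> %s (writeback=%d, forward=%d)\n",
           state_name(b), state_name(after), wb, fwd);
    return 0;
}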

MOESI

To correct the issues mentioned above, several evolutions of MESI have been made, with AMD introducing MOESI. This protocol changes the situation on the two points we brought up, by introducing the Owned state. Taking our first case, core B can move from Exclusive to Owned before sending its copy. Up to here there's little difference, but if a third core then wants to access this data, under MESI cores A and B would respond simultaneously. MOESI avoids this: cores holding the line as Shared no longer respond to requests! Only the core marked as Owned responds, reducing traffic.

In the second case, instead of carrying out multiple operations (writeback, change to Shared, forward), our core B in the Modified state simply moves to Owned before sending the data to core A. This saves bandwidth, which is real progress.

MESIF

As of Nehalem, Intel abandoned MESI in favour of the MESIF protocol, which adds a new state, Forward. Thus in our first example, when core A requests the data, core B moves from Exclusive to Forward. In this precise case it acts like the MOESI Owned state: it alone responds to requests, so as to reduce traffic.

In the second case, however, MESIF contributes nothing. While on paper MESIF may not seem as attractive, as always in computing it's a question of striking the best compromise: a MOESI implementation can be more complex than a MESIF one.
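The difference between the three protocols can be boiled down to a single question: which cache holding a line answers a snooped read? Here is a hedged sketch of that predicate, again purely illustrative:

/* Which cache answers a read request for a line it holds, under each
   protocol. Illustrative only; real snoop logic is wired into the CPU. */
#include <stdbool.h>
#include <stdio.h>

typedef enum { P_MESI, P_MOESI, P_MESIF } protocol;
typedef enum { S_INVALID, S_SHARED, S_OWNED, S_EXCLUSIVE,
               S_MODIFIED, S_FORWARD } line_state;

static bool responds(protocol p, line_state s) {
    if (s == S_INVALID)
        return false;
    switch (p) {
    case P_MESI:   /* every holder answers, even the Shared ones */
        return true;
    case P_MOESI:  /* a single designated copy answers */
        return s == S_OWNED || s == S_EXCLUSIVE || s == S_MODIFIED;
    case P_MESIF:  /* same idea, via the Forward state */
        return s == S_FORWARD || s == S_EXCLUSIVE || s == S_MODIFIED;
    }
    return false;
}

int main(void) {
    /* Three cores hold the same line as Shared: under MESI all three
       answer; under MOESI/MESIF only the Owned/Forward copy would. */
    printf("MESI,  Shared: %d\n", responds(P_MESI, S_SHARED));   /* 1 */
    printf("MOESI, Shared: %d\n", responds(P_MOESI, S_SHARED));  /* 0 */
    printf("MESIF, Shared: %d\n", responds(P_MESIF, S_SHARED));  /* 0 */
    return 0;
}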

What about with a multi-CPU system?

So far we have described a relatively simple situation: a single processor with several cores, a cache hierarchy and a single memory controller. What happens in a modern multi-CPU system where each processor has its own controller and its own memory modules?

There are two possibilities. The simplest, though not necessarily the most intuitive, consists of duplicating the data. As with RAID for hard drives, each memory controller stores a copy of the data, so the available memory is halved on a bi-socket platform. Reads are easy: the cores of each processor have a local copy of the data they are interested in. With writes, every change has to be carried over simultaneously to both memory spaces.

NUMA

The optimal mode, however, remains aggregating each processor's memory into one large common memory space that the operating system can use as such. From a theoretical point of view, processor A simply needs to be allowed to use the memory attached to processor B. This is what the two QPI links on the processor are for. In the case of the SNB-E, these links are clocked at 4 GHz, which gives us 32 GB/s of usable bandwidth in each direction.

There are however two problems. The first is practical: with a single memory space as seen by the operating system, how is this space shared between the two sockets? The traditional method consists of interleaving the memory banks. This means that at any given moment an application will have half its data on each of the sockets, independently of the processor on which it is executed.

The other possibility is to use an intelligent protocol which requires the collaboration of the operating system. This is what's known as NUMA, for Non Uniform Memory Access. In NUMA mode, the operating system takes on board the fact that there are two distinct logical memory spaces, a bit like the way the kernel takes account of HyperThreading, or of the modules in AMD's FX architecture. The operating system then allocates memory on the socket that corresponds to the processor core executing the thread that is supposed to use it. When an application shares data between several threads, the MESIF protocol we discussed above (or MOESI for AMD) triggers memory transfers as necessary.
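As an aside, on Linux this mechanism is exposed to applications through libnuma. Here is a minimal sketch of node-local allocation, assuming the libnuma development package is installed (our test platform runs Windows; this is only to illustrate the principle):

/* Minimal libnuma sketch: allocate a buffer on the NUMA node the calling
   thread is running on, instead of letting pages land on either socket.
   Linux-specific; build with: gcc -D_GNU_SOURCE numa_local.c -lnuma */
#include <numa.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this machine\n");
        return EXIT_FAILURE;
    }
    int node = numa_node_of_cpu(sched_getcpu()); /* node we run on */
    if (node < 0)
        node = 0;
    size_t size = 64UL * 1024 * 1024;
    void *buf = numa_alloc_onnode(size, node);   /* node-local pages */
    if (buf == NULL)
        return EXIT_FAILURE;
    printf("allocated %zu MB on node %d\n", size >> 20, node);
    numa_free(buf, size);
    return EXIT_SUCCESS;
}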

On paper NUMA mode seems the best, but as is often the case things aren't necessarily as simple as you might first think. First we compared latency and multithreaded memory bandwidth using RMMT and AIDA64. Note that for these theoretical tests we turned off HyperThreading as well as four cores on each processor. The reason for this limitation is that RMMT doesn't support more than eight threads at once, a problem we'll come back to later. Eight 4 GB modules of registered DDR3 clocked at 1066 MHz (CAS 7) were installed for these tests:


To recap, in mirroring mode only 16 GB are available. In NUMA Off and NUMA On modes, 32 GB are available, but if NUMA is off the memory space is interleaved across both sockets. Note that, quite logically, mirroring mode is the least efficient in terms of memory writes: each write is sent to both memory controllers at the same time, saturating the QPI bus' 32 GB/s.

If we turn mirroring off, write bandwidth climbs again. We're still partially limited by the QPI bus, but the fact that the local and remote controllers are used alternately mitigates the problem. Turning NUMA on maximises performance, with a big gain in reads, writes and latency, as each thread then uses the local memory of the socket on which it executes.

Of course theoretical performance has to face some practical counter-examples! We measured performance in 7-Zip, the value given being compression time in seconds:


7-Zip is slowest with NUMA on. This is of course a particular case: the software uses a data dictionary common to all threads, and access to it is shared. Here NUMA can cause a slight loss in performance, which seems linked to the workings of the MESIF coherency protocol. As always in computing, it's about striking the right compromise: while turning NUMA on is always advisable for general usage, like HyperThreading it can in some cases be slightly counterproductive. Depending on the type of application being used, the memory controllers can be configured to suit its needs. For the tests which follow we opted for the default configuration, which, 7-Zip excepted, is systematically the most effective.


Page 4
Xeon E5-2687W


As we said in our Core i7 3960X test, the die developed by Intel for the LGA 2011 platform (SandyBridge-E) is made up of 2.27 billion transistors and eight cores. The total amount of level 1 and 2 cache is doubled compared with the standard Sandy Bridge processors, while the L3 is now 20 MB.


In the general consumer LGA 2011 range, however, Intel's offer is confined to four and six core processors. This isn't the case with the Xeon range, where eight-core processors are marketed, often at lower clocks so as to remain within a TDP of 130 W (sometimes 135 W), equivalent to that of the Core i7 3960X.


As we said in the introduction, today we're looking at the most extreme model in the E5-2600 range, the Xeon E5-2687W, which has a TDP of 150 watts! All eight of the die's cores are active and, while the chip's base clock is a little lower (3.1 GHz compared to 3.3 GHz for the Core i7 3960X), its maximum Turbo clock (one or two active cores) is 3.8 GHz.


The table below shows the maximum Turbo clocks of the Xeon E5-2687W and the Core i7 3960X according to the number of active cores:


The difference in clock between the two chips is therefore generally 100 MHz, even if the distribution doesn't always follow the same pattern, with both processors clocked at 3.7 GHz when three cores are active. Another thing to note with respect to Turbo and multiprocessing is that, unlike on the Core i7, you can't change the Turbo ratios on these Xeons. The multiplier is also locked on these chips.

The final particularity of these Xeons, and not the least, is of course the fact that two of their QPI links are active, which, on a suitable motherboard, enables bi-socket operation.


Page 5
Asus Z9PE-D8 WS


For this test Asus supplied us with the Z9PE-D8 WS, one of its 'server' range motherboards, though it is referred to as a 'workstation' board. It's equipped with the Intel C602 chipset, previously known under the name Patsburg-A. The first thing you notice is of course its size: it's in the E-ATX format. At 30.5 cm it's the same height as a traditional ATX motherboard, but the width increases from 24.5 to 33 centimetres. This increase in size is obligatory in order to fit both sockets and the associated memory. It goes without saying that such a board requires a case that is specifically compatible with the EEB standard; a large tower, as big as it might be, won't do unless it is explicitly EEB compatible.


There are two LGA 2011 sockets on the motherboard, with the main socket on the right. To recap, while Intel used a QPI link between the processors and the chipset on Nehalem, this is no longer the case on the SNB-E platform: here the main socket is linked to the chipset by a DMI bus (a PCI Express 2.0 x4 link), as on the socket 1155 platforms.


Each socket is surrounded by four memory slots, Asus limiting itself to one DIMM per channel on this model to contain the size of the board. Models with two DIMMs per channel (namely eight memory slots per socket, 16 in all on a bi-socket motherboard) also exist. Note that each socket has its own P8 (8-pin) power connector. While big power supplies generally have two connectors of this type, this isn't necessarily the case on all models. You can however buy adaptors from 6-pin PCI Express to P8 power connectors.


On the storage side, like the X79, the C602 chipset has six Serial ATA ports, two at 6 Gb/s and four at 3 Gb/s. The SCU (Storage Controller Unit), an additional block in the chipset, adds four more ports. In practice these ports remain rather limited. To compensate, Asus has added a Marvell 9230 controller with four 6 Gb/s ports; a particularity here is that this controller is connected to the chipset over two PCI Express lanes rather than one.


There are seven PCI Express slots on the motherboard, configured in a particular way. To recap, each SNB-E processor has 40 PCI Express 3.0 lanes. Here the first four slots are wired to the main socket and the next three to the second socket. With two processors you therefore have access to the four blue slots at x16 PCIe 3.0 each. If all the slots are in use, they are limited to x8 mode.

Back panel

The back panel looks more standard with a PS/2 connector and six USB 2.0 ports. There are also two Gigabit Ethernet ports, each running on an Intel 82574L controller.


One of the particularities of this controller is its VT-d support, as long as you have the appropriate drivers. The rest is more standard, with two USB 3.0 ports running on an ASMedia 1042 controller as well as six assignable jacks and an S/PDIF connector running on a Realtek ALC898 (with a DTS encoding option). While these features may seem standard on a desktop motherboard, that isn't necessarily the case for workstation or server boards.


Internal connectivity

In terms of internal connectivity you can hook up two additional USB 3.0 ports (ASMedia 1042), six USB 2.0 ports, two FireWire headers and two headers for serial ports. More originally, there's also a VGA header on the motherboard, running off an Aspeed AST2300 ARM SoC, which serves as a basic 2D controller for a server room screen and allows remote control without requiring an additional graphics card.


There are two processor fan connectors and six chassis fan connectors, all in 4-pin format, as well as debugging LEDs and power/reset switches, which can come in very handy.

Bundle

Finally, the Asus bundle includes no fewer than fourteen (!) Serial ATA cables, SLI, Tri-SLI and Quad-SLI bridges, two brackets for COM ports, a bracket with two USB 2.0 ports and a FireWire port, as well as a thick manual which covers the workings of LSI MegaRAID pretty extensively (though this controller isn't actually included) and the other particularities of the board. For the rest, the manual is similar to Asus' consumer manuals.

Note that there's no USB 3.0 bracket in the bundle. It would also have been nice to see an adaptor for the VGA header.


Page 6
BIOS/UEFI, software

BIOS/UEFI

The motherboard's server origins can be seen in the BIOS interface, which, while a UEFI BIOS, still has a text interface. There are numerous settings and, while the interface is coherent overall, we noted certain details which show an approach midway between a server product and a consumer one. Note that the boot time of the platform is relatively long: we measured 28 seconds between a cold start and the launch of the operating system (with the POST report limited to 1 second). Most of this time is spent detecting the processors and configuring the memory. Allow more than twenty seconds for a full configuration, not counting the time to get an image on screen. While this doesn't compare well with general consumer platforms, it's quite fast for a motherboard of this type.


As on general consumer boards, there's a version of the AI Tweaker menu which allows overclocking. It is however rather minimal: manual mode lets you change the base clock and the processor ratio. Note that although BCLK overclocking works, you can't choose the multiplier in the BIOS as you could on the X79 platforms (see our test here). While you would be able to change the ratio of an unlocked Core i7 processor, this isn't the case for a Xeon: Intel hasn't designed this type of platform for overclocking and, while we don't disapprove of the efforts made by Asus, in practice you won't be doing much overclocking here.

This doesn't however mean that the AI Tweaker menu is entirely pointless. Apart from the processor voltages (Vcore and Vuncore), there are four distinct memory voltages: the ABCD channels are the four channels of the first processor and EFGH those of the second. This flexibility is particularly welcome and allowed us to boot some rather esoteric memory configurations. We were thus able to mix these 4 GB modules on each socket at the same time:
  • Two G.Skill modules, XMP 1.65V 2133 MHz, 9-11-9 (SPD 1.5V 1600 MHz, 11-11-11)
  • An AMD Memory/Patriot SPD 1.5V 1600 MHz module at 9-9-9
  • A Kingston XMP 1.65V 1600 MHz module at 9-9-9 (SPD 1.5V 1333 MHz, 9-9-9)

Without any particular setting, this set-up boots at 1333 MHz 9-9-9, but we managed to force 1600 MHz 9-9-9 without any problem. The Intel memory controllers are relatively flexible and the individual voltage settings help, though they don't guarantee that every configuration is possible! Note that while the timing settings are here, the memory clock setting isn't in this tab…





You have to go into the advanced menu to find the long (!) list of sub-menus. The memory speed is hidden in the chipset settings, which is also where the memory controller modes are chosen (mirroring or independent, and whether or not NUMA is on). Among the more original options, note an SCU SAS setting which drives an optional LSI MegaRAID controller that transforms the four SCU ports into SAS ports. Note also support for WHEA, Microsoft's hardware error reporting protocol, as well as the ability to re-run the option ROM of any PCI Express slot (making it possible to boot from a PCI Express SSD or an additional RAID card). Another particularity is the log page listing hardware malfunctions, most useful in remote management situations for monitoring any problems that come up.



Otherwise the options are standard, though the EZ Update BIOS update tool is slightly original with its consumer Asus skin.

Software

Apart from the drivers, the Asus software offer consists of just two server management utilities: a network monitoring tool and ASWM Enterprise, a suite designed for remote management of the platform which has numerous prerequisites even to run locally (the IIS web server and SQL Server, to mention just two). In spite of installing the prerequisites, the ASWM installer wouldn't run on our Windows 7 platform. In itself this isn't necessarily a problem for general consumer/workstation usage, and applications such as HWiNFO64 gave us access to monitoring, but a slightly less restrictive utility from Asus wouldn't have been a bad thing.


Page 7
Configuration, energy consumption, memory performance

Configuration

We measured the performance of this platform with three different processor configurations:
  • 2x Xeon E5-2687W
  • 1x Xeon E5-2687W
  • 1x Core i7 3960X


Otherwise our configuration was as follows:
  • Asus Z9PE-D8 WS motherboard
  • 8 x 4 GB DDR3 1600 9-9-9
  • Corsair F120 SSD (system)
  • OCZ Vertex 3 MaxIOPS SSD (benchmarks)
  • Radeon HD 6670
  • Corsair TX 850 power supply
  • Windows 7 64 bit SP1

In the mono-socket tests, the quantity of RAM is of course halved, though with no impact on our benchmarks in practice. On the operating system side, it's important to note that Windows 7 supports platforms with up to two sockets; beyond that, a server OS (Windows Server 2008 R2) is required. The Windows 7 kernel supports NUMA natively, and 2008 R2 adds nothing to this type of platform in terms of processor performance.
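For the curious, the NUMA topology Windows sees can be queried directly through the Win32 API. A minimal sketch, for illustration only:

/* Minimal Win32 sketch: ask the kernel how many NUMA nodes it sees and
   which processors belong to each. Needs a Windows SDK to build. */
#include <windows.h>
#include <stdio.h>

int main(void) {
    ULONG highest = 0;
    if (!GetNumaHighestNodeNumber(&highest)) {
        fprintf(stderr, "query failed\n");
        return 1;
    }
    printf("NUMA nodes visible to Windows: %lu\n", highest + 1);
    for (ULONG node = 0; node <= highest; node++) {
        ULONGLONG mask = 0;
        if (GetNumaNodeProcessorMask((UCHAR)node, &mask))
            printf("node %lu: processor mask 0x%016llx\n", node, mask);
    }
    return 0;
}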

Energy consumption

We also measured energy consumption at the wall socket in three scenarios: at idle, under load in Cinebench and under load in Prime95.


At idle, the consumption of our Xeon E5 on its own is equivalent to that of the Core i7 3960X. Under load, note that the two additional cores push consumption at the socket up by a little more than 29 watts in Prime95.

On the 2S platform, while idle consumption is contained, under load we reach around 500 watts for the complete platform! With registered ECC memory, which is significantly more power-hungry, we measured up to 541 watts at the socket in Prime95!

Memory latency

We spent some time on the theoretical performance of the memory controllers and took the opportunity to look at the limitations of certain benchmarks. First we measured memory latency with AIDA64:


Note a very small advantage for the Xeon E5 over the Core i7 here, the most important reading of course being the latency measured in 2S (two-socket) mode: using two processors simultaneously adds, in spite of NUMA, some twenty nanoseconds to the average latency. If you've read the beginning of this article, this higher latency won't surprise you! It could however be a factor limiting the scalability of real-world performance.

Multithreaded bandwidth

Let's finish our memory measurements with RMMT, the multithreaded benchmark included in RightMark. To bypass the large L3 cache, the memory operations were carried out on 32 MB blocks for each core. As we said before, this benchmark is limited to eight threads and forces affinity on the cores in a non-optimal fashion. We therefore limited each die to four cores so as to be able to use both controllers.
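For readers wondering what such a measurement boils down to, here is a minimal sketch of the principle (written with pthreads on Linux for brevity; RMMT itself is a Windows tool): each thread is pinned to one core and streams through its own 32 MB block, and the aggregate bytes per second approximate multithreaded read bandwidth.

/* Sketch of an RMMT-style test: pin THREADS threads to cores, stream each
   through its own 32 MB block (bigger than the 20 MB L3), report GB/s.
   Rough: the page-faulting memset is included in the timing.
   Linux-specific; build with: gcc -O2 -D_GNU_SOURCE bw.c -pthread */
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define THREADS 8
#define BLOCK   (32UL * 1024 * 1024)
#define PASSES  16

static volatile uint64_t sink;  /* keeps reads from being optimised away */

static void *reader(void *arg) {
    int core = (int)(intptr_t)arg;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set); /* pin */

    uint64_t *block = malloc(BLOCK);
    memset(block, 1, BLOCK);    /* fault the pages in before reading */
    uint64_t sum = 0;
    for (int p = 0; p < PASSES; p++)
        for (size_t i = 0; i < BLOCK / sizeof(uint64_t); i++)
            sum += block[i];
    sink = sum;
    free(block);
    return NULL;
}

int main(void) {
    pthread_t t[THREADS];
    struct timespec a, b;
    clock_gettime(CLOCK_MONOTONIC, &a);
    for (int i = 0; i < THREADS; i++)
        pthread_create(&t[i], NULL, reader, (void *)(intptr_t)i);
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &b);
    double secs = (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    double gbytes = (double)THREADS * BLOCK * PASSES / 1e9;
    printf("%.1f GB read in %.2f s -> %.1f GB/s\n", gbytes, secs, gbytes / secs);
    return 0;
}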


As expected, performance levels rocket thanks to NUMA and we were just a hair's breadth from 90 GB/s of total read bandwidth! While memory bandwidth isn't always a limiting factor for general consumer applications, don't forget that here there are 32 threads to feed; this bandwidth probably won't go unused… Enough theoretical readings, however, let's move on to the practical ones (finally!).


Page 8
Performance in Cinebench, Visual Studio, 7-Zip

Given that we want first and foremost to examine the multithreaded performance of these Xeons, we excluded applications with few threads (such as games), as they wouldn't really benefit from the sort of increase in core count we have here. In the spirit of this article, we didn't use specific server benchmarks either, the main thing being to get a glimpse of the potential of such a platform in benchmarks where the six-core Core i7 already does very well.

Cinebench R11.5


To measure 3D rendering performance, we used Cinebench in version R11.5. Remember, this application uses the Cinema 4D rendering engine.


The Xeon E5-2687W starts well here: its two additional cores give it 25% more than the Core i7 3960X. In 2S mode, scalability is excellent with a performance gain of 92.9%. Seeing 32 threads take bites out of the render in Cinebench makes quite an impression.

Visual Studio 11 beta


We used the Visual Studio 11 beta to compile the latest version (1.7.4) of the source code of the Ogre 3D engine (examples included). Parallel compilation was activated for each project in VS.


Again the eight-core Xeon gives an additional 20% over the Core i7. Adding a second processor brings a relatively smaller gain of 36.6%: not all the operations carried out by VS are sufficiently multithreaded.

7-Zip 9.20


We used version 9.20 of 7-Zip to compress a large volume of files using the LZMA2 algorithm.


Adding two extra cores gives a gain of just 6%. Adding a second socket gives a much more impressive 48.9% gain, even though, as we saw earlier, the NUMA memory mode doesn't particularly suit this benchmark.


Page 9
Performance in Staxrip - x264, Bibble

Staxrip - x264 b2197


For a change of register, we used the Staxrip frontend to transcode a scene from Avatar via x264 build 2197. We carried out a two-pass encode with the medium preset on a 720p source, re-encoded at a bitrate of 6 Mbit/s. Remember, the second pass is the one that benefits most from multithreading.


[ Pass 1 ]  [ Pass 2 ]

While the gains on the first pass were negligible, they were much more marked on the second: the Xeon gives a 25% gain over the Core i7, while adding a second socket gives another 64%.

Bibble 5.2.3


We processed a batch of 48 RAW photo files, exported as JPEGs.


While the Xeon once again gives a gain of around 24%, Bibble takes full advantage of the second socket with a performance gain of 84.5%!


Page 10
Conclusion

It's not surprising to see the enormous impact of the Xeon E5-2687W in multithreaded applications. With two additional cores, it often outdoes the most extreme of the Core i7s, the 3960X, itself no slouch in this type of application. This Xeon has therefore made us rather wistful about what the Core i7 LGA 2011 range could have been, but has also got us excited about the possibility of Intel bringing out such a chip for the 'general consumer'.


Obviously, with two chips at once, performance levels rocket in applications that use multithreading correctly, and while the mechanisms for sharing memory do have a cost, gains of 80% or more aren't unusual as long as applications are correctly written.

A special mention goes to Asus, who have tried to offer a modern interpretation of the 'workstation' motherboard, usually a very conservative product where basics such as adjustable voltages, an audio codec or even an S/PDIF port often aren't included. The BIOS with its relatively fast boot is nice and, while the board's server origins sometimes show through (the VGA header, for example, or the software offering), the compromise struck here is ideal for anyone hoping to build such a configuration.


We would however perhaps do better to speak of dreams rather than hopes: beyond the curiosity that led us to take a closer look at the fastest of the Xeon E5s, the question of pricing rapidly brings you back down to earth. The Xeon E5-2687W sells at $1885 apiece (per 1000 units). The Asus motherboard comes in at a comparatively reasonable €495! Thankfully, the right to dream is still free…

