In keeping with all 'ticks', the Ivy Bridge architecture is broadly based on its predecessor, Sandy Bridge, and we refer you to our previous article for the details. Today we're going to concentrate mainly on the differences and innovations introduced with Ivy Bridge.
Numerous points in common
Seen from the top, the technical choices made by Intel for Sandy Bridge have been confirmed for Ivy Bridge. The first of these was the inclusion of what was historically known as the northbridge on the processor die.
Traditionally part of motherboards, this part of the chipset contained the memory controllers, PCI Express and, if there was one, the IGP. In Ivy Bridge everything sits on a single die: two or four CPU cores depending on the die model; a level 3 cache, the LLC, of up to 8 MB; a graphics core; and the uncore, which includes the DDR3 memory controller, the display outputs, the southbridge link (via a DMI bus, the equivalent of a PCIe x4 link) and the PCI Express x16 controller. All these blocks are linked by an internal ring bus, which lets the x86 cores and the graphics core share the LLC.
There are relatively few changes inside the cores themselves. There's no new AVX-style extension to the instruction set like the one introduced with Sandy Bridge (AVX2 will arrive with Haswell next year), but there have nevertheless been a few small changes.
First of all, Intel has added a few instructions to convert 32-bit single precision floating point data to and from a compact Float16 format (1 sign bit, 5 exponent bits, 10 significand bits). These instructions (VCVTPH2PS and VCVTPS2PH) are available in 128-bit and 256-bit SSE/AVX vector variants. Note also in passing the introduction of new instructions that allow the base of the FS/GS segments, usually reserved for the operating system, to be read.
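To make the Float16 layout concrete, here is a minimal C sketch that expands a half-precision value into a standard 32-bit float by hand. This is an illustration of the format only, not the hardware path: compilers expose the actual instructions through the F16C intrinsics (`_mm_cvtph_ps`/`_mm_cvtps_ph`), which convert 4 or 8 values at a time.

```c
#include <stdint.h>
#include <string.h>

/* Expand a Float16 (1 sign bit, 5 exponent bits, 10 significand bits)
   into a 32-bit float, as VCVTPH2PS does in hardware. */
static float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t frac = h & 0x3FF;
    uint32_t bits;

    if (exp == 0x1F) {                     /* infinity or NaN */
        bits = sign | (0xFFu << 23) | (frac << 13);
    } else if (exp != 0) {                 /* normal: rebias 15 -> 127 */
        bits = sign | ((exp + 112) << 23) | (frac << 13);
    } else if (frac != 0) {                /* subnormal: renormalise */
        exp = 113;                         /* 127 - 15 + 1 */
        while (!(frac & 0x400)) { frac <<= 1; exp--; }
        bits = sign | (exp << 23) | ((frac & 0x3FF) << 13);
    } else {                               /* signed zero */
        bits = sign;
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, the encoding 0x3C00 (exponent field 15, empty significand) decodes to 1.0.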
There is however a new digital random number generator. It is known as a digital generator because the chip includes its own entropy source (the genuinely random input that some encryption tools simulate during key generation by asking you to move your mouse around in all directions). Intel quotes a throughput of 2 to 3 Gb/s, which should provide good performance for the applications that need it. All this is contained in a functional block accessed via an instruction, RDRAND, which supplies a random 16, 32 or 64-bit number on demand (conforming to ANSI X9.82, NIST SP800-90 and NIST FIPS 140-2/3 level 2).
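Software is expected to cope with the rare case where the generator has no data ready: RDRAND then clears the carry flag, and Intel's guidance is simply to retry. A minimal sketch in C with GCC-style inline assembly, guarded by a CPUID check since only Ivy Bridge and later CPUs have the instruction:

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Fetch a 32-bit random value from the on-die DRNG via RDRAND.
   Returns 1 and fills *out on success; 0 if the CPU lacks RDRAND
   or the generator stayed dry for all the retries. */
static int rdrand32_retry(uint32_t *out, int retries) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned a, b, c, d;
    /* CPUID leaf 01H, ECX bit 30 advertises RDRAND support */
    if (__get_cpuid(1, &a, &b, &c, &d) && (c & (1u << 30))) {
        while (retries-- > 0) {
            unsigned char ok;
            __asm__ volatile("rdrand %0; setc %1"
                             : "=r"(*out), "=qm"(ok));
            if (ok)
                return 1;      /* carry flag set: *out is valid */
        }
    }
#else
    (void)retries;             /* non-x86 build: no RDRAND */
#endif
    (void)out;
    return 0;
}
```

A handful of retries is ample in practice given the quoted 2 to 3 Gb/s throughput of the block.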
Note finally one last change to the instruction set: improved performance for the instructions that handle strings, with REP MOVSB and REP STOSB optimised for memory blocks of more than 64 bytes. Intel says it would eventually like to do away with the processor-specific copy algorithms found in the libraries used by compilers and runtimes. This is an interesting step for the future (these per-processor algorithms are often only partially optimised, creating performance differences that could be avoided – we refer you to our report on the subject), if it is indeed pursued and if AMD goes down the same route: up until now, the REP MOVSB/STOSB instructions haven't been the preferred way of carrying out these operations on K10 and later processors.
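To illustrate what such a generic routine boils down to, here is a hedged sketch of a copy built directly on REP MOVSB via GCC-style inline assembly, with a memcpy fallback for non-x86 targets. The point of the Ivy Bridge enhancement is that the microcode picks the widest internal moves itself, which is what could let one-line code like this replace the per-CPU copy loops in libraries.

```c
#include <stddef.h>
#include <string.h>

/* Copy n bytes with a single REP MOVSB instruction on x86.
   RDI = destination, RSI = source, RCX = byte count; the "+" constraints
   let the instruction advance all three registers. Illustrative only. */
static void copy_rep_movsb(void *dst, const void *src, size_t n) {
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);       /* portable fallback */
#endif
}
```

On pre-Ivy Bridge parts the same code runs but is only competitive on some block sizes, which is exactly the fragmentation Intel says it wants to eliminate.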
In addition to these changes to the ISA itself, Intel has also introduced some small optimisations in the pipeline to improve IPC. Intel has thus improved the performance of division instructions and added logic to detect and eliminate redundant MOVs. Some changes with an impact on IPC are not, however, necessarily found in the pipeline itself but directly in the uncore.
Improvements to the uncore
When it comes to latency, we noted progress both for the LLC cache and for memory. At 4.5 GHz, we measured an LLC latency of 4.3 ns on Sandy Bridge against 3.4 ns on Ivy Bridge.
With DDR3-1600 9-9-9 memory, the latency measured via AIDA64 drops from 45.1ns to 39.3ns.
Apart from its speed, the LLC has also been reworked with the introduction of a mechanism known as the Adaptive Fill Policy, which governs in particular how the IGP and the x86 cores share the LLC as a common resource. Intel says that it has worked on the Sandy Bridge heuristics to optimise Ivy Bridge performance. This could reduce the contention effects we noted at the launch of Sandy Bridge when applications that tax both the x86 cores and the IGP are being used. We'll check this in practice.
The LRU algorithm, which ages data in the cache, has become a little more flexible, moving to two bits per line to improve granularity. The memory prefetch unit also gains a throttling mechanism that stops it from hogging memory bandwidth when there are already too many pending memory accesses.
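Intel doesn't document the exact replacement policy, but the idea of a 2-bit age per cache line can be sketched as follows. This is a hypothetical illustration, not the actual Ivy Bridge mechanism: a hit resets a line's counter to 0, a miss evicts the line with the highest age and ages all the others, with the 2-bit counters saturating at 3.

```c
#include <stdint.h>

#define WAYS 4
#define MAX_AGE 3                          /* 2-bit counter saturates at 3 */

typedef struct { uint64_t tag; uint8_t age; int valid; } Line;

/* Touch `tag` in a WAYS-entry set. Returns 1 on a hit, 0 on a miss.
   Illustrative sketch of 2-bit aging, not Intel's real policy. */
static int access_set(Line set[WAYS], uint64_t tag) {
    for (int i = 0; i < WAYS; i++) {
        if (set[i].valid && set[i].tag == tag) {
            set[i].age = 0;                /* hit: most recently used */
            return 1;
        }
    }
    /* miss: prefer an invalid way, otherwise evict the oldest line */
    int victim = 0;
    for (int i = 0; i < WAYS; i++) {
        if (!set[i].valid) { victim = i; break; }
        if (set[i].age > set[victim].age) victim = i;
    }
    for (int i = 0; i < WAYS; i++)         /* age the surviving lines */
        if (set[i].valid && set[i].age < MAX_AGE) set[i].age++;
    set[victim] = (Line){ tag, 0, 1 };
    return 0;
}
```

With four age values instead of two, the eviction choice can distinguish "recently reused" from "streamed through once", which is the sort of granularity the extra bit buys.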
Note finally two additional changes in the uncore. The first concerns power-gating, at the processor level, of the VccP rail used for the DDR I/Os: when the deepest C-states are in use (package C3 and higher), the DDR I/Os are powered down. Intel hopes to save around 100 mW when the machine is at idle, which could matter on mobile platforms. The second concerns the memory controller: support for low-voltage DDR3 on the mobile Ivy Bridge versions, and "official" support for DDR3-1600 memory with two memory modules (DIMMs) per channel. Remember of course that in our last review we ran four DDR3-2133 modules with a Sandy Bridge processor without any issues.
In orange, you can see the changes made to the original controller (green). The changes in blue are to the buffers.
The last change concerns the addition of support for PCI Express 3.0. Contrary to what you might think, this isn't the same implementation that Intel used for SNB-E (the X79 platform); it's based on the original Sandy Bridge design, with only the buffers developed for SNB-E carried over. According to Intel's engineers, this should significantly improve performance compared to the SNB-E PCI Express 3.0 implementation – if you remember, the raw figures we measured at the time weren't particularly great.
Variable TDP technology on mobile versions
Among the more unusual changes, variable TDPs have been introduced for mobile machines. In addition to the nominal TDP, there are two further settings, TDP up and TDP down. TDP up can, for example, only be activated when the machine is plugged into a power source, while TDP down represents a maximum power-saving mode. Intel hasn't given many details on the implementation yet; the feature seems above all intended to offer a bit more flexibility than SpeedStep, the technology used for this until now. Implementation of the TDP up/down modes will be at the discretion of partners, and we'll have to wait for the arrival of the first mobile Ivy Bridge CPUs to find out more.