Nehalem and the future

As you may recall, the Nehalem core is a light evolution of the Penryn core, with the return of multi-threading, various small improvements and the addition of some new SSE instructions. The real novelty is on the "uncore" side, a term Intel uses for the parts of the processor that sit outside the cores themselves. Roughly speaking, Nehalem takes its inspiration from the structure of the Athlon 64 and the Phenom: the memory controller moves into the CPU, and connectivity evolves from the antiquated FSB to a more modern link, QPI (QuickPath Interconnect). Nehalem also gains in modularity: the cores, memory controllers, L3 cache, QPI links, PCI Express lanes and integrated graphics core are all modular elements. In its first incarnation, Bloomfield, Nehalem will have 4 cores, each equipped with 64 KB of L1 cache and 256 KB of L2 cache, sharing 8 MB of L3 cache and a triple-channel memory controller. In other words, a Core 2 that no longer leaves any structural advantage to AMD’s Phenom.



Nehalem was omnipresent and functional at the show. Samples ran at 3.2 GHz, but actual performance could not yet be measured. The chipset is reduced to the strict minimum now that the memory controller is integrated into the CPU. The northbridge, the Tylersburg IOH for the HEDT segment, is nothing more than a QPI to PCI Express 2.0 bridge that provides one x4 port and two x16 ports, which can be split into four x8 ports using switches on the motherboard. The southbridge will be the ICH10, which changes little compared to the ICH9 apart from being more environmentally friendly, as it no longer contains halogens.
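
To make the lane arithmetic concrete, here is a minimal sketch of the port layout described above. The port widths come from the text; the function name `split_x16` and the list representation are purely illustrative assumptions, not anything Intel has published.

```python
# Port widths as described in the text: one x4 port and two x16 ports.
BASE_PORTS = [4, 16, 16]


def split_x16(ports):
    """Split every x16 port into two x8 ports; other ports are left unchanged.

    Hypothetical helper for illustration: it models the motherboard switches
    as a simple list transformation.
    """
    result = []
    for width in ports:
        if width == 16:
            result.extend([8, 8])
        else:
            result.append(width)
    return result


split_ports = split_x16(BASE_PORTS)

# The total number of lanes is the same in both layouts.
print("default layout:", BASE_PORTS, "total lanes:", sum(BASE_PORTS))
print("split layout:  ", split_ports, "total lanes:", sum(split_ports))
```

Either way the lane total stays the same; the switches only change how the lanes are grouped into slots.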


And the future? As for what comes after Nehalem, we didn’t learn much more. A new architecture, Sandy Bridge, will arrive in 2010 and will add a new instruction set, AVX, destined to replace SSE. Unlike SSE, which works with 128-bit registers, AVX will rely on 256-bit registers, which will allow larger vectors to be processed and offer added flexibility. Intel specified at this IDF that the first implementation of AVX will run at "full performance". This means that Sandy Bridge will have 256-bit vector execution units, whereas 128-bit SSE was initially processed over several cycles by 64-bit units.
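
The register widths above translate directly into elements per instruction. The sketch below is only back-of-the-envelope arithmetic: the function names `lanes` and `passes` are made up for illustration, and the model assumes an execution unit simply needs one pass per chunk of its own width, which ignores everything else that affects real throughput.

```python
import math


def lanes(register_bits, element_bits):
    """Number of elements of a given size that fit in one vector register."""
    return register_bits // element_bits


def passes(vector_bits, unit_bits):
    """Passes a unit of width unit_bits needs to cover one vector (simplified model)."""
    return math.ceil(vector_bits / unit_bits)


for name, register_bits in (("SSE", 128), ("AVX", 256)):
    print(name,
          "single precision:", lanes(register_bits, 32),
          "double precision:", lanes(register_bits, 64))

# Early SSE case from the text: 128-bit vectors on 64-bit units.
print("128-bit vector on 64-bit units:", passes(128, 64), "passes")
# "Full performance" AVX case from the text: 256-bit vectors on 256-bit units.
print("256-bit vector on 256-bit units:", passes(256, 256), "pass")
```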

Finally, Intel indicated that the following generation, arriving around 2012, will integrate FMA (Fused Multiply-Add), the MAD that often comes up when we talk about GPUs. It is in fact the basic calculation unit of a GPU, and its advantage is that it performs two operations, a multiplication and an addition, at once. This combination is frequently useful in graphics computing. Its integration into the heart of the CPU's execution units is one more example of the convergence of the two architectures.
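
As a rough illustration of why a multiply-add unit is attractive, the sketch below counts operations for a dot product done with separate multiplies and adds versus one combined operation per element. The `fma` function is a hypothetical stand-in written in plain Python: it only models the operation count, not the hardware itself (in particular, plain Python rounds the product and the sum separately, which a true fused unit would not).

```python
def fma(a, b, c):
    """Hypothetical stand-in for a multiply-add unit: a * b + c, counted as one operation."""
    return a * b + c


def dot_separate(xs, ys):
    """Dot product using one multiply and one add per element."""
    acc, ops = 0.0, 0
    for x, y in zip(xs, ys):
        product = x * y      # operation 1: multiply
        ops += 1
        acc = acc + product  # operation 2: add
        ops += 1
    return acc, ops


def dot_fma(xs, ys):
    """Dot product using one multiply-add per element."""
    acc, ops = 0.0, 0
    for x, y in zip(xs, ys):
        acc = fma(x, y, acc)  # multiply and add issued as a single operation
        ops += 1
    return acc, ops


xs, ys = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
print(dot_separate(xs, ys))  # same result, six operations
print(dot_fma(xs, ys))       # same result, three operations
```

The same pattern shows up in matrix multiplication, which is part of why the combination is so common in graphics work.
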
The mobile platform
While Intel naturally talked about its upcoming Montevina mobile platform, the "first version" of Centrino 2, which will arrive in the second half of the year and will mainly bring a new chipset with a G45-class graphics core, it also briefly mentioned Calpella. This 2009 mobile platform will be based on an entirely new architecture, built around a CPU derived from Nehalem that should be a dual-core part integrating a graphics core, a memory controller and PCI Express support.
Beyond the performance gain, we can easily imagine that this solution should bring significant advantages in power consumption; however, Intel gave no concrete details on this aspect of the platform.


