On its introduction in October 2011, Zambezi, the AM3+ implementation of the Bulldozer architecture, disappointed on several fronts. One year on, AMD is back with Piledriver, a development of its CMT architecture, in a new AM3+ processor, Vishera. In this report, we'll be analysing the performance of the flagship processor from this new range, the FX-8350. Does it mark AMD's return to form in the world of CPUs?
CMT, high clocks and Piledriver
Back in May 2011, we devoted a report to the Bulldozer architecture. The Piledriver and Bulldozer architectures have a common base, Cluster Multi-threading (CMT) technology. This technology means an 8-core processor is in fact made up of 4 modules. Within a module, the two cores share a certain number of components:
- the front end, which groups the fetch unit and instruction decoding, as well as the L1 instruction cache that feeds these units;
- the floating point unit;
- the L2 cache.
AMD claims that a module delivers 80% of the performance of two full cores, in exchange for major savings in silicon area and energy consumption. Many other changes were also made, both to the processing units themselves and to the memory sub-system, in particular to allow the architecture to clock higher.
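Taken at face value, AMD's figure can be turned into a rough estimate of what an "8-core" part is worth. The sketch below assumes the 80% claim applies uniformly to every module and every workload, which is a simplification; the function name and its parameters are ours, not AMD's.

```python
def full_core_equivalents(modules, cmt_efficiency=0.80, cores_per_module=2):
    """Estimate how many 'full' cores a CMT processor is worth.

    Assumption: each module delivers `cmt_efficiency` times the throughput
    of `cores_per_module` independent cores (AMD's 80% claim, read literally).
    """
    return modules * cores_per_module * cmt_efficiency


# An "8-core" FX is made up of 4 modules of 2 cores each.
print(round(full_core_equivalents(4), 2))  # 6.4
```

In other words, read literally, the claim puts four modules at roughly the throughput of 6.4 fully independent cores when all eight threads are loaded.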
In terms of x86 instruction sets, Bulldozer already supported the latest versions of SSE4 (4.1 and 4.2), the AES-NI instructions for accelerating encryption, and AVX, introduced by Intel with Sandy Bridge, including its 256-bit variants. In addition, it added its own instructions, grouped under the names XOP, FMA4 and CVT16. XOP operates mainly on integer operands, FMA4 on 128-bit floating point operands, and CVT16 groups instructions that convert high precision floating point values to medium and low precision. FMA4, which combines a multiplication and an addition in a single instruction, should enable gains in applications that are compiled to use it.
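On Linux, the instruction sets a processor reports can be read from the "flags" line of /proc/cpuinfo. The helper below is a minimal sketch of such a check. The function name is ours, the sample line is made up and heavily shortened, and CVT16 is left out because we are not aware of a matching flag name.

```python
# Flag names as they appear on the "flags" line of /proc/cpuinfo on Linux.
BULLDOZER_FLAGS = ("sse4_1", "sse4_2", "aes", "avx", "xop", "fma4")


def check_flags(flags_line, wanted):
    """Return {flag: True/False} for each wanted flag."""
    present = set(flags_line.split())
    return {flag: flag in present for flag in wanted}


# Made-up, shortened flags line for illustration only (not a real dump).
sample = "fpu sse sse2 sse4_1 sse4_2 aes avx xop fma4"

for flag, found in check_flags(sample, BULLDOZER_FLAGS).items():
    print(f"{flag:8s} {'yes' if found else 'no'}")
```

Splitting the line into a set avoids false matches on substrings, for example "sse4_1" inside a longer flag name.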
Piledriver adds FMA3 (Fused Multiply/Add on 3 operands, a = a * b + c) alongside the FMA4 form (a = b * c + d) that was already supported; Intel will use FMA3 as of Haswell. It also adds the F16C instructions (16/32-bit floating point conversion) introduced by Intel in Ivy Bridge. For the rest, the changes are mostly small touches at all levels: AMD says the branch prediction mechanisms and schedulers are more efficient and that divisions are faster.
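The difference between the two operand forms, and what "fused" buys numerically, can be illustrated in a few lines. This is an emulation, not real FMA hardware: the fused result is obtained by computing a * b + c exactly with rational arithmetic and rounding once at the end. The register names and helper functions are hypothetical, and the operand forms follow the formulas given in the text.

```python
from fractions import Fraction


def mul_then_add(a, b, c):
    """Separate multiply and add: the product is rounded, then the sum."""
    return a * b + c


def fused_mul_add(a, b, c):
    """Emulated fused multiply-add: exact a*b + c, rounded once at the end."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def fma3(regs, dst, src1, src2):
    """Three-operand form: dst = dst * src1 + src2 (dst is overwritten)."""
    regs[dst] = fused_mul_add(regs[dst], regs[src1], regs[src2])


def fma4(regs, dst, src1, src2, src3):
    """Four-operand form: dst = src1 * src2 + src3 (inputs are preserved)."""
    regs[dst] = fused_mul_add(regs[src1], regs[src2], regs[src3])


a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

print(mul_then_add(a, b, c))   # the product rounds to 1.0 first, so this is 0.0
print(fused_mul_add(a, b, c))  # the exact result, -2**-54, survives

regs = {"r0": a, "r1": b, "r2": c, "r3": 0.0}
fma4(regs, "r3", "r0", "r1", "r2")  # r0 still holds a afterwards
fma3(regs, "r0", "r1", "r2")        # r0 is replaced by the result
```

With FMA3, code that still needs the original value of the destination register has to copy it first, while FMA4 spends an extra operand to avoid that copy. Both forms produce the same fused result.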