Compilers
We used three different compilation environments to complete this report: Visual Studio 2010 SP1, Intel C++ Compiler XE 12.0u5, and TDM-GCC (MinGW/GCC 4.6.1).
Visual Studio 2010 SP1
It's no surprise that the Microsoft development environment is the most commonly used one on Windows. In terms of optimisations, note that while it can generate SSE, SSE2 and AVX code (AVX only via a command-line switch, as the option isn’t available in the Visual Studio 2010 interface), it doesn’t offer automatic vectorisation. Nor does it provide an automatic dispatcher.
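To make the term concrete, here is a minimal sketch of the kind of loop that automatic vectorisation targets. The function name `saxpy` and its signature are our own choices for illustration and are not taken from the test suite. Each iteration is independent of the others, so a vectorising compiler is free to process several floats per SSE or AVX instruction, while a compiler without this feature would typically handle one element at a time even when SSE2 code generation is switched on.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: y = a * x + y over two float arrays.
// No iteration depends on another, which is the loop shape
// that automatic vectorisers look for.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    // Only walk the part that exists in both arrays.
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Whether a given compiler actually vectorises this loop depends on its version and optimisation settings, so the generated assembly is the only reliable way to check.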
The Visual Studio compiler (we’ll call it 'cl' from now on, after the name of its executable) is by far the most pernickety in terms of what it can compile. Its (very) partial implementation of the C/C++ standards poses a number of interoperability problems with other compilers. While there are many paid editions, Microsoft has also offered a free version of Visual C++, known as Express, for the last few years. It’s available for download on Microsoft’s site.
Intel C++ Compiler XE 12.0u5
Intel also has its own Windows compiler. It's partly based on components created by Edison Design Group and then extensively customised by Intel. From a practical point of view, it integrates easily with Visual Studio and allows easy project conversion. You can also switch from one compiler to the other at any moment, which is a very good argument for convincing developers to try it.
ICC is extremely rich in terms of optimisations, offering among other things automatic vectorisation. It can also generate targeted builds for different levels of processor functionality (the QxSSE2, 3, 4.1, 4.2, AVX options...) as well as creating a dispatcher version, though only for a given level. The
option will for example get the AVX version of its code to run on compatible AVX processors and a basic version (SSE2) on all other processors.
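The principle of a dispatcher can be sketched by hand as follows. This is an illustration of the idea only, not the code that ICC generates. All the function names are hypothetical, the "AVX" path is a placeholder that reuses the scalar code so that the sketch stays portable, and the feature test relies on the `__builtin_cpu_supports` builtin found in GCC 4.8 and later and in Clang (it is not available in the GCC 4.6.1 used for this article).

```cpp
#include <cstddef>

// Baseline path: plain scalar sum, safe on every processor.
float sum_baseline(const float* data, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// Stand-in for a path that would be built with AVX enabled.
// It reuses the scalar code here so the sketch compiles anywhere.
float sum_avx_placeholder(const float* data, std::size_t n) {
    return sum_baseline(data, n);
}

// Feature test. Assumption: GCC or Clang on x86; on any other
// target we simply report that AVX is unavailable.
bool cpu_has_avx() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx") != 0;
#else
    return false;
#endif
}

using SumFn = float (*)(const float*, std::size_t);

// The dispatcher: pick one implementation based on what the CPU reports.
SumFn select_sum() {
    return cpu_has_avx() ? sum_avx_placeholder : sum_baseline;
}
```

A program would call `select_sum()` once at start-up and keep the returned pointer, so the cost of the check is paid a single time rather than on every call.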
You can see why Intel has done this: a program compiled with the
option would run on a Sandy Bridge processor with code optimally generated for an AVX processor (including code generated for strings and memory) and in SSE2 mode on a Core i7 “Nehalem”. If we were being provocative, we might say that this helps create generational gaps in certain benchmarks that the manufacturer shows in some of its presentations.
Using the Intel compiler in Visual Studio is only a click away
The other issue with these options is that, contrary to what their names suggest, they get the Intel compiler to check for the processor brand. This was moreover one of the issues that came up in the FTC inquiry into Intel's practices. One of the (known) consequences of the AMD/Intel/FTC agreement is that the Intel documentation is now packed with warnings to the effect that "non Intel" processors may receive different treatment than Intel processors, though without any further details. We’re going to try and ascertain in practice whether Intel has changed its practices or not. ICC also has the reputation of being the highest performance compiler and this is something we’re going to verify!
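The difference between a brand check and a feature check can be illustrated with a short sketch. It is not Intel's dispatcher code, which we have not seen. The two policy functions and the `HAVE_CPUID_H` macro are hypothetical names of our own, and reading the vendor string assumes GCC or Clang on x86, where the `<cpuid.h>` header provides `__get_cpuid`.

```cpp
#include <cstring>
#include <string>
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_CPUID_H 1  // hypothetical macro, used only in this sketch
#endif

// Returns the CPUID vendor string (for example "GenuineIntel" or
// "AuthenticAMD"), or an empty string if it cannot be read here.
std::string cpu_vendor() {
#ifdef HAVE_CPUID_H
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) return "";
    char name[13];
    // CPUID leaf 0 returns the string in EBX, EDX, ECX order.
    std::memcpy(name + 0, &ebx, 4);
    std::memcpy(name + 4, &edx, 4);
    std::memcpy(name + 8, &ecx, 4);
    name[12] = '\0';
    return name;
#else
    return "";
#endif
}

// Brand-based policy: only a "GenuineIntel" CPU gets the fast path,
// whatever instruction sets the processor actually supports.
bool fast_path_by_vendor(const std::string& vendor) {
    return vendor == "GenuineIntel";
}

// Feature-based policy: any CPU that reports the feature gets the fast path.
bool fast_path_by_feature(bool cpu_reports_feature) {
    return cpu_reports_feature;
}
```

Under the first policy, a processor reporting "AuthenticAMD" falls back to the basic path even if it supports the instruction set in question. Under the second, it does not.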
Note that ICC includes a third optimisation mode (
for example) allowing it to create a build for a given level of functionality and not only a given Intel processor. The documentation for the ICC version that we used only indicated SSE4.1 support as a max for this option, with SSE4.2 and AVX not showing. However
parameters can indeed be used. Is this an ideal solution? Not necessarily in practice as we’ll soon see.
Intel charges for ICC and it is available in numerous editions. A trial version is also available.
TDM-GCC (MinGW/GCC 4.6.1)
From the open source sector, GCC is a compiler that historically tended towards universality. It can therefore be used on all architectures (though this doesn’t mean that the same code is compilable everywhere and subtle differences particularly to do with memory management often pose a problem when you try to generate cross-compatible code, for example for both ARM and x86) and is available for almost all operating systems.
GCC requires a development environment for it to run in Windows. Two main ones exist: Cygwin and MinGW. Cygwin offers a full POSIX development environment which allows you to compile a Unix program and get it to run in Windows. This is a worthwhile implementation but does come with a performance cost. These days open source applications in Windows mainly use MinGW, a minimalist environment for Windows which serves as a bridge between GCC and the OS, notably by giving access to certain Microsoft system DLLs such as the notorious msvcrt.dll, the MS Visual C++ Runtime which caused so many problems in Windows 95 and 98.
Among other things, this DLL offers implementations of standard C/C++ functionality (manipulation of strings and memory). For programs compiled in C only, these Microsoft routines are the ones the program will use. As the DLL is very old (1998), its implementations are outdated with respect to modern processors, which seriously affects the C programs that are compiled against it. There’s no such problem with C++, as GCC has its own standard library for these functions. We’ll have to keep this issue in mind when we evaluate performance.
Note lastly that while GCC was originally designed with universality in mind (earning it a longstanding reputation for slower performance), its developers have been betting more heavily on performance for some time. Many optimisations have therefore been introduced, from automatic vectorisation to the generation of SSE2 maths (AVX is partially supported) as well as profiles for a large number of x86 architectures. For these tests we used the TDM-GCC version that is more up to date than the original. It includes GCC in version 4.6.1.
Unlike Intel, AMD doesn’t develop its own compiler. This doesn’t however mean that it isn't working on the subject. First of all, AMD (like Intel) participates in the development of GCC. Microsoft also works actively with AMD and Intel to obtain coherent (and unbiased, according to them) support in their compilers, as well as for languages (.NET) where optimisations also exist for both processor brands. Finally AMD sponsors and distributes an Open64 fork. Open64 (partly) came out of a research project on compilers, financed by Intel and targeting its Itanium architecture. Itaniums have the particularity of using a VLIW instruction set: each instruction word actually contains three instructions, which are processed by a group of three scalar units. With Itanium it is therefore up to the compiler to choose which instructions to group together to obtain maximum performance, which is a particularly difficult task.
While Intel is no longer working on this, AMD still offers an alternative version of Open64. Unfortunately it’s only available for Linux, which makes it of limited interest for our article.