PCI Express 3.0: impact on performance
by Guillaume Louel
Published on February 24, 2012
Bandwidth, theoretical measurements
Before measuring the practical impact, we ran some theoretical tests to see whether PCI Express 3.0 delivers the speeds it promises. Here we used a performance test included in version 2.6 of AMD’s APP development kit.
Paged pool memory, non-paged pool memory
This first test, you may remember, is somewhat unusual in that it tries to reach the highest possible transfer speeds using what is known as non-paged pool memory. In effect, on the system side, the tool locks the memory pages so that they can’t be moved. This guarantees that, for as long as the programme runs, those pages stay in physical RAM and are never sent to a swap file.
While this may seem unimportant in theory on a test machine equipped with 16 GB of RAM, in practice this isn't the case. Of course, the data transferred will remain in physical memory, but the mere possibility that it might not adds overhead in the form of extra memory copying operations. For non-paged pool memory, AMD uses algorithms optimised to make the most of PCI Express (just as NVIDIA does in CUDA).
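To illustrate why the extra copying matters, here is a rough model, not a description of AMD's actual driver. It assumes that data in ordinary (paged) memory is first copied into a locked staging buffer and only then sent over the bus, with the two steps running one after the other. The function name and the bandwidth figures are hypothetical and are not measurements from this article.

```python
def effective_bandwidth_gbs(bus_gbs, staging_copy_gbs=None):
    """Effective host-to-device rate in GB/s under a simple two-step model.

    staging_copy_gbs=None models non-paged memory: the buffer is sent over
    the bus directly. Otherwise every byte is first copied in host memory at
    staging_copy_gbs, then sent over the bus, with no overlap (assumption).
    """
    if staging_copy_gbs is None:
        return bus_gbs
    # Time to move one GB is the sum of the two sequential steps.
    time_per_gb = 1.0 / staging_copy_gbs + 1.0 / bus_gbs
    return 1.0 / time_per_gb


# Hypothetical figures chosen only to show the shape of the effect.
BUS_GBS = 8.0
COPY_GBS = 8.0

print(f"non-paged: {effective_bandwidth_gbs(BUS_GBS):.1f} GB/s")
print(f"paged:     {effective_bandwidth_gbs(BUS_GBS, COPY_GBS):.1f} GB/s")
```

In this model the paged path can never reach the bus rate, however fast the host copy is. A real driver may overlap the copy with the transfer, so the penalty in practice can be smaller than the model suggests.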
For this test and the following tests, we measured six distinct cases:
- PCI Express 3.0 x16, x8 and x4
- PCI Express 2.0 x16 and x8
- PCI Express 1.0 x16
From a theoretical point of view, several of these modes have equivalent bandwidth, which gives three tiers:
- PCI Express 3.0 x16
- PCI Express 3.0 x8 and PCI Express 2.0 x16
- PCI Express 3.0 x4, PCI Express 2.0 x8 and PCI Express 1.0 x16
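The grouping above follows from each generation's per-lane signalling rate and line encoding. Here is a minimal sketch of that arithmetic, assuming the standard figures of 2.5, 5 and 8 GT/s per lane, with 8b/10b encoding for PCI Express 1.0 and 2.0 and 128b/130b for 3.0. The names are illustrative and do not come from any tool used in these tests.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency per generation.
PCIE_GENERATIONS = {
    "1.0": (2.5, 8 / 10),     # 8b/10b encoding
    "2.0": (5.0, 8 / 10),     # 8b/10b encoding
    "3.0": (8.0, 128 / 130),  # 128b/130b encoding
}


def theoretical_bandwidth_gbs(generation, lanes):
    """Peak bandwidth per direction in GB/s (10^9 bytes/s), ignoring protocol overhead."""
    rate_gt, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt * efficiency * lanes / 8  # 8 bits per byte


# The six modes measured in this article.
MODES = [("3.0", 16), ("3.0", 8), ("3.0", 4), ("2.0", 16), ("2.0", 8), ("1.0", 16)]

for generation, lanes in MODES:
    bandwidth = theoretical_bandwidth_gbs(generation, lanes)
    print(f"PCI Express {generation} x{lanes}: {bandwidth:.2f} GB/s")
```

By this calculation the three tiers come to roughly 15.75, 8 and 4 GB/s per direction. The PCI Express 3.0 modes sit about 1.5% below their 2.0 and 1.0 counterparts, because 8 GT/s with 128b/130b encoding is slightly less than double 5 GT/s with 8b/10b.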
For games and applications that allow it, we also give the results in each of these cases in CrossFire mode.
Theoretical bandwidth (non-paged memory)
We measured the transfer rate from the CPU to the GPU (the typical case in games) and, separately, in the opposite direction (which is also used in OpenCL).
There are several important points to note. Firstly, going from the GPU to the CPU there’s a healthy 77% increase in bandwidth between PCI Express 2.0 x16 and 3.0 x16 (compared to 89% between 1.0 x16 and 2.0 x16), but the gain is much smaller in the other direction: only 50%, and we remain under the 10 GB/s bar.
Another interesting point is the comparison between PCI Express 3.0 x8 and PCI Express 2.0 x16, two modes that theoretically have an identical bandwidth. While there’s a 2.5% fall going from the CPU to the GPU, there's a 5% gain in the other direction.
Performance levels at PCI-E 3.0 x4, 2.0 x8 and 1.0 x16 are similar overall.
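To put the measured GPU to CPU gains in context, they can be set against the gains the raw link rates would predict. The sketch below assumes the standard per-lane rates and encodings (2.5 and 5 GT/s with 8b/10b, 8 GT/s with 128b/130b). The helper names are illustrative, and the measured percentages are the ones quoted above.

```python
def per_lane_gbs(rate_gt, payload_bits, total_bits):
    """Theoretical per-lane bandwidth in GB/s for a given rate and encoding."""
    return rate_gt * payload_bits / total_bits / 8


# Theoretical x16 bandwidth per direction for each generation.
X16_GBS = {
    "1.0": 16 * per_lane_gbs(2.5, 8, 10),
    "2.0": 16 * per_lane_gbs(5.0, 8, 10),
    "3.0": 16 * per_lane_gbs(8.0, 128, 130),
}


def gain_percent(new, old):
    """Relative increase of new over old, in percent."""
    return (new / old - 1.0) * 100.0


# Measured GPU -> CPU gains with non-paged memory, as quoted in the text.
MEASURED_GAINS = {("3.0", "2.0"): 77.0, ("2.0", "1.0"): 89.0}

for (new, old), measured in MEASURED_GAINS.items():
    theoretical = gain_percent(X16_GBS[new], X16_GBS[old])
    print(f"{new} x16 vs {old} x16: measured +{measured:.0f}%, theoretical +{theoretical:.0f}%")
```

On these assumptions the theoretical gains are about 97% and 100% respectively, so the move to PCI Express 3.0 delivers a smaller share of its theoretical increase than the move to 2.0 did.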
Theoretical bandwidth (paged pool memory)
As developers can’t always use non-paged pool memory, we carried out a second test via the Cloo library (version 0.9.1). AMD’s OpenCL driver is compatible with version 1.1 of the specification.
As with non-paged pool memory, the gains of PCI-E 3.0 x16 over 2.0 x16 are once again asymmetrical: 38% from the CPU to the GPU and 51% from the GPU to the CPU. These scores are nonetheless relatively high, approaching (or, going from the GPU to the CPU, exceeding) the performance of PCI Express 2.0 x16 with non-paged memory.
Comparing 3.0 x8 and 2.0 x16, we find an identical score for both modes going from the GPU to the CPU and a 4.2% gain in the other direction. Performance levels at PCI-E 3.0 x4, 2.0 x8 and 1.0 x16 are almost identical.
Let's now see what this translates to in the application tests!
Copyright © 1997- Hardware.fr SARL. All rights reserved.