Intel Core 2 Duo - Test
by Franck Delattre and Marc Prieur
Published on July 4, 2006

IPC and frequency
CPU performance can be measured as the number of instructions processed per second, or IPS. It is equal to:

IPS = I/C x C/S

I is the number of instructions, C the number of processor cycles and S the time in seconds. I/C is therefore the average number of instructions processed per cycle, or IPC. C/S is the number of cycles per second, in other words the clock frequency, called F. The cycle time is the inverse of this frequency.

So:

IPS = IPC x F

This simple formula shows that IPC and frequency are the two main performance factors. They are intimately connected to processor architecture and especially to the depth of the processing pipeline.
Let's consider for example a processor where the fastest instruction is processed in 10 ns. If it uses a processing pipeline made of 10 stages, one stage is processed in 1 ns (10 ns / 10 stages), and this is the minimum cycle time. The maximum reachable frequency is the inverse of this cycle time, or 1 GHz. If the pipeline includes 20 stages, the cycle time is 0.5 ns (10 ns / 20 stages) and the maximum frequency is 2 GHz. The maximum running frequency increases with the depth of the pipeline.
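
The relation between pipeline depth and maximum frequency can be sketched in a few lines of Python. This is only an illustration of the example above: the function name `max_frequency_ghz` is ours, and it assumes the 10 ns of work divides perfectly evenly between stages, with no overhead added per stage.

```python
def max_frequency_ghz(instruction_latency_ns: float, stages: int) -> float:
    """Peak clock in GHz if the latency is split evenly across the stages.

    Assumption: perfectly balanced stages and no per-stage overhead.
    """
    cycle_time_ns = instruction_latency_ns / stages  # time spent in one stage
    return 1.0 / cycle_time_ns  # the inverse of a time in ns is a frequency in GHz


for stages in (10, 20):
    print(f"{stages} stages: {max_frequency_ghz(10.0, stages)} GHz")
```

Doubling the number of stages halves the cycle time, so the reachable frequency doubles.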

IPC is a characteristic intrinsic to a processor’s architecture. It depends, amongst other things, on the number of execution units. For example, if the processor has a single unit for additions, it can deliver a maximum of one addition per cycle. If it includes two, it may be able to process two additions in one cycle. We say "may" because this optimum scenario implies that the pipeline sustains a constant, maximum throughput. In practice, the instruction flow contains events that make the pipeline wait, which interrupt this throughput and tend to reduce IPC. Two types of events in particular reduce pipeline performance: branching and memory access.

Let's take the case of a processor which has two integer execution units and a maximum IPC of 2 on these instructions. We also add a cache subsystem that has a success rate of 98% and central memory that has an access time of 70 ns.

In x86 code, approximately 20% of instructions access memory. Amongst these, 98% will find their data in the cache subsystem and 2% will have to go to central memory. We suppose that for the remaining 80% of the code, and for the 98% of memory accesses that hit the cache, the processor sustains its maximum IPC of 2. This represents 0.5 cycles per instruction. The average number of cycles per instruction (CPI) is:

CPI = 20% x (98% x 0.5 + 2% x M) + 80% x 0.5

M represents the access time to central memory in cycles.
  • with a 10 stage pipeline, memory access requires 70 cycles at 1 GHz. The CPI is 0.778, which corresponds to an average IPC of 1.29, or 64% of the maximum theoretical IPC.
  • with a 20 stage pipeline, the only difference is the memory access time in cycles. At 2 GHz, 70 ns corresponds to 140 cycles. In this case CPI = 1.06. The average IPC is 0.95, or 47% of the theoretical IPC.
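
The memory access formula can be checked with a short Python sketch. The function and parameter names are ours, and the default values are simply the figures assumed in this example (20% memory instructions, 98% cache success rate, 0.5 cycles per instruction otherwise).

```python
def memory_cycles(access_time_ns: float, freq_ghz: float) -> float:
    """Central memory access time M expressed in processor cycles."""
    return access_time_ns * freq_ghz


def cpi_with_memory(mem_cycles: float,
                    mem_fraction: float = 0.20,
                    hit_rate: float = 0.98,
                    base_cpi: float = 0.5) -> float:
    """CPI = 20% x (98% x 0.5 + 2% x M) + 80% x 0.5, with M in cycles."""
    miss_rate = 1.0 - hit_rate
    cpi_memory_instr = hit_rate * base_cpi + miss_rate * mem_cycles
    return mem_fraction * cpi_memory_instr + (1.0 - mem_fraction) * base_cpi


for freq_ghz in (1.0, 2.0):
    m = memory_cycles(70.0, freq_ghz)
    cpi = cpi_with_memory(m)
    print(f"{freq_ghz} GHz: M = {m:.0f} cycles, CPI = {cpi:.3f}, IPC = {1.0 / cpi:.3f}")
```

Only M changes between the two pipelines, which shows that the whole difference in IPC comes from the fixed 70 ns costing twice as many cycles at 2 GHz.
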
Branching has a slightly lower impact, but it also depends on the depth of the pipeline. When a branch is mispredicted, the content of the pipeline is incorrect because it holds instructions from the wrong branch. The penalty in cycles is equivalent to the depth of the pipeline. If we assume that 10% of instructions are branches and that the branch prediction mechanism has a success rate of 96%, the result is:

CPI = 10% x (96% x 0.5 + 4% x P) + 90% x 0.5

P is the pipeline depth.
  • with a 10 stage pipeline, the result is CPI = 0.538. The IPC is 1.86 (93% of the theoretical IPC).
  • with a 20 stage pipeline, the result is CPI = 0.578. The IPC is 1.73 (87% of the theoretical IPC).
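
The branching formula can be sketched the same way in Python. Again the names are ours, and the defaults are the figures assumed in this example (10% branches, 96% prediction success rate, a misprediction penalty equal to the pipeline depth).

```python
def cpi_with_branches(pipeline_depth: int,
                      branch_fraction: float = 0.10,
                      prediction_rate: float = 0.96,
                      base_cpi: float = 0.5) -> float:
    """CPI = 10% x (96% x 0.5 + 4% x P) + 90% x 0.5, with P the pipeline depth."""
    mispredict_rate = 1.0 - prediction_rate
    # A mispredicted branch costs as many cycles as the pipeline has stages.
    cpi_branch_instr = prediction_rate * base_cpi + mispredict_rate * pipeline_depth
    return branch_fraction * cpi_branch_instr + (1.0 - branch_fraction) * base_cpi


for depth in (10, 20):
    cpi = cpi_with_branches(depth)
    print(f"{depth} stages: CPI = {cpi:.3f}, IPC = {1.0 / cpi:.3f}")
```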

Once the penalties due to branching and memory accesses are combined, the IPC falls to 1.19 for the 10 stage pipeline and 0.82 for the 20 stage pipeline. What interests us is not the IPC itself, but its product with the frequency, which gives the number of instructions processed each second: about 1.19 billion at 1 GHz for the 10 stage pipeline and about 1.64 billion at 2 GHz for the 20 stage pipeline.

We see that the maximum frequency allowed by a 20 stage pipeline more than compensates for the reduction in IPC. In this simplified model, the 20 stage pipeline ends up faster than the 10 stage version. This is why Intel opted for long pipelines and made them its new philosophy: Netburst was born.
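
The final comparison can be reproduced with the Python sketch below. One point is our assumption rather than something stated in the text: the combined IPC figures of 1.19 and 0.82 are consistent with multiplying the two efficiencies (the fraction of the maximum IPC left by memory accesses and by branching), so this is how the sketch combines them. The names are ours as well.

```python
MAX_IPC = 2.0  # peak IPC of the example processor (two integer units)


def efficiency(cpi: float, max_ipc: float = MAX_IPC) -> float:
    """Fraction of the peak IPC achieved for a given CPI."""
    return (1.0 / cpi) / max_ipc


def combined_ipc(cpi_memory: float, cpi_branch: float,
                 max_ipc: float = MAX_IPC) -> float:
    # Assumption: the two penalties combine by multiplying their efficiencies.
    return max_ipc * efficiency(cpi_memory, max_ipc) * efficiency(cpi_branch, max_ipc)


def giga_ips(ipc: float, freq_ghz: float) -> float:
    """IPS = IPC x F, in billions of instructions per second."""
    return ipc * freq_ghz


# stages: (CPI with memory penalty, CPI with branch penalty, frequency in GHz)
designs = {10: (0.778, 0.538, 1.0), 20: (1.058, 0.578, 2.0)}

for stages, (cpi_mem, cpi_br, freq_ghz) in designs.items():
    ipc = combined_ipc(cpi_mem, cpi_br)
    print(f"{stages} stages: IPC = {ipc:.2f}, "
          f"{giga_ips(ipc, freq_ghz):.2f} billion instructions per second")
```

In this model the deeper pipeline loses about a third of its IPC but doubles its frequency, which is what tips the result in its favour.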

    Copyright © 1997- Hardware.fr SARL. All rights reserved.