Preview: Ageia PhysX
by Damien Triolet
Published on May 22, 2006

The PPU
The first chip devoted to physics, Ageia’s PPU remains mysterious because so few technical details have been released. We know that the chip has 125 million transistors on a die of roughly 190 mm². Ageia vaguely speaks of 20 billion instructions per second, which represents 530 million sphere-to-sphere collisions (the simplest case) or 533 000 collisions between convex objects (the most complex case) per second.
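To put these figures in perspective, here is a rough back-of-the-envelope sketch. It assumes that each collision rate corresponds to the full 20 billion instructions per second, which Ageia does not state explicitly, so the results are only orders of magnitude.

```python
# Figures quoted by Ageia, taken at face value (not measured here).
INSTRUCTIONS_PER_SECOND = 20e9
SPHERE_COLLISIONS_PER_SECOND = 530e6
CONVEX_COLLISIONS_PER_SECOND = 533_000


def instructions_per_collision(collisions_per_second: float) -> float:
    """Instruction budget per collision, ASSUMING the whole chip is busy
    with nothing but that type of collision."""
    return INSTRUCTIONS_PER_SECOND / collisions_per_second


sphere_budget = instructions_per_collision(SPHERE_COLLISIONS_PER_SECOND)
convex_budget = instructions_per_collision(CONVEX_COLLISIONS_PER_SECOND)

print(f"sphere-to-sphere: about {sphere_budget:.0f} instructions per collision")
print(f"convex-to-convex: about {convex_budget:.0f} instructions per collision")
```

Under that assumption, a convex collision costs about a thousand times more instructions than a sphere-to-sphere test.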

The PhysX chip has a PCI interface and a 128-bit memory bus. These choices are rather old-fashioned: the PCI bus is slowly disappearing, progressively replaced by PCI Express. The memory bus width corresponds to what is found on a mid-range graphics card, but here it is combined with rather slow DDR memory at 366 MHz, a choice that is probably cost based. The resulting dedicated bandwidth is 10.9 GB/s.
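The bandwidth figure can be checked with a short calculation. This sketch assumes a 366 MHz memory clock (the memory frequency mentioned later in this article), two transfers per clock for DDR, and a gigabyte counted as 2^30 bytes; that last convention is our assumption, as it is the one that reproduces the 10.9 GB/s figure.

```python
BUS_WIDTH_BITS = 128
MEMORY_CLOCK_HZ = 366e6   # assumed memory clock
TRANSFERS_PER_CLOCK = 2   # DDR: data on both clock edges

# Bytes moved per second across the memory bus at peak.
bytes_per_second = (BUS_WIDTH_BITS / 8) * TRANSFERS_PER_CLOCK * MEMORY_CLOCK_HZ

bandwidth_decimal_gb = bytes_per_second / 1e9      # 1 GB = 10^9 bytes
bandwidth_binary_gb = bytes_per_second / 2**30     # 1 GB = 2^30 bytes (assumption)

print(f"{bandwidth_decimal_gb:.1f} GB/s (decimal)")
print(f"{bandwidth_binary_gb:.1f} GB/s (binary)")
```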

Ageia defined four goals for the PhysX processor: Scale, Fidelity, Interaction and Sophistication. In other words, increasing the number of physics details, their realism and how they interact with each other. The fourth point is the usual marketing catch-all regularly found in 3D: “Hollywood” quality.


To reach these goals, the PhysX processor is equipped with a large number of calculation units of different types: scalar units for integers and vector units for floating point. The scalar units should mainly be used for flow control, that is, anything that modifies the instruction flow, such as branching. These units are organised in independent groups that work internally like SIMD units: every processing unit within a group executes the same task, but each group can run a different program.

To know more, the only solution is to read Ageia’s patents. They describe several possible designs, however, each of which can be extended by adding processing units. We can’t know for sure whether any of them represents the final design of the PhysX chip or is simply an example; the final product could differ in the number of each type of unit. The most probable design found in the patents mentions four independent calculation blocks, the Vector Processing Engines (VPE), each of which includes four Vector Processing Units (VPU), giving a 4x4 arrangement of calculation units. Each VPU features 6 floating-point MAD units (each capable of one multiplication and one addition) and a complete ALU. The total is 96 MADs, compared to 56 for ATI’s and NVIDIA’s high-end GPUs. The frequency is unknown. Is it 366 MHz like the memory, or 250 MHz, or 500 MHz? We don’t know, so it’s difficult to compare the raw calculation power of this PPU with that of other chips. Such a comparison would be for information only anyway, because raw power is of little interest if it isn’t used efficiently.
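As an illustration only, the unit count from the patent design can be combined with the three candidate frequencies. The layout may not match the shipping chip and none of the clocks is confirmed, so the sketch below shows what each guess would imply rather than a specification.

```python
# Layout taken from the patent design discussed above (may differ from the real chip).
VPE_COUNT = 4        # Vector Processing Engines
VPUS_PER_VPE = 4     # Vector Processing Units per engine
MADS_PER_VPU = 6     # floating-point multiply-add units per VPU

total_mads = VPE_COUNT * VPUS_PER_VPE * MADS_PER_VPU


def peak_mads_per_second(clock_mhz: float) -> float:
    """Peak multiply-add operations per second, assuming every MAD unit
    issues one operation per clock (an idealised upper bound)."""
    return total_mads * clock_mhz * 1e6


# Hypothetical clocks: none of these is confirmed by Ageia.
for clock_mhz in (250, 366, 500):
    billions = peak_mads_per_second(clock_mhz) / 1e9
    print(f"{clock_mhz} MHz: {billions:.1f} billion MADs per second")
```

Depending on the clock, the theoretical peak varies by a factor of two, which is why the comparison with GPUs remains so uncertain.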


This is where the strength of Ageia’s chip lies. Physics calculations have very different properties from 3D calculations. When pixels are calculated, they are independent of one another and memory accesses are, in the majority of cases, aligned in an optimal manner. With physics, objects interact with one another. In other words, we can’t know the position of one object without knowing the position of the others, because they might collide and change trajectory. Because the results of other units can’t be known instantly, a large number of small threads are used to mask latency, as ATI does.

This mode of operation leads to massive movement of data between the different PPU calculation units, but also to less predictable memory reads and writes. For those reasons, Ageia has designed a highly developed Data Movement Engine (DME) that contains five Memory Control Units (MCU). Four of them control data transfers to and from each of the four VPEs via a bus to which each VPU is connected. Each VPU has a small memory that both the VPU itself and the DME can access. This works like a double buffer: the VPU accesses the first buffer while the DME accesses the second, and as soon as the current read/write is over, the buffers are swapped. This system makes it possible for the VPU and the DME to access memory at the same time at full speed. The fifth MCU is connected to the PCE (PPU Control Engine), at the head of the PhysX processor.
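The double-buffer principle can be sketched in a few lines. This is a generic illustration of the technique, not Ageia’s implementation: the class and function names are hypothetical, the "VPU work" is an arbitrary sum of squares, and the two accesses that would happen in parallel in hardware are simply executed one after the other here.

```python
class DoubleBuffer:
    """Two memory banks: one owned by the compute unit, one by the data mover."""

    def __init__(self, size: int) -> None:
        self._banks = [[0] * size, [0] * size]
        self._vpu_index = 0  # which bank the compute unit currently owns

    @property
    def vpu_bank(self) -> list:
        return self._banks[self._vpu_index]

    @property
    def dme_bank(self) -> list:
        return self._banks[1 - self._vpu_index]

    def swap(self) -> None:
        """Exchange the roles of the two banks."""
        self._vpu_index = 1 - self._vpu_index


def process(batches: list, size: int) -> list:
    """Process equally sized batches, prefetching batch i+1 during batch i."""
    buffer = DoubleBuffer(size)
    results = []
    buffer.dme_bank[:] = batches[0]  # the data mover loads the first batch
    buffer.swap()                    # hand it over to the compute unit
    for i in range(len(batches)):
        if i + 1 < len(batches):
            # In hardware this load would overlap with the computation below.
            buffer.dme_bank[:] = batches[i + 1]
        results.append(sum(x * x for x in buffer.vpu_bank))  # placeholder work
        buffer.swap()
    return results


print(process([[1, 2], [3, 4], [5, 6]], size=2))  # prints [5, 25, 61]
```

Since each side only ever touches its own bank between swaps, neither has to wait for the other.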

Ageia speaks of an internal bidirectional bandwidth of 250 GB/s, which is very impressive. In the end, the DME seems to be the most important part of the PPU: without it, the calculation units couldn’t be kept properly fed with data. It also handles access to the dedicated memory via the Memory Interface Unit, as well as the management of the PCI bus. Ageia specifies that this technology can be interfaced with PCI Express, USB and FireWire. The design example discussed above corresponds to the current, PCI-based implementation of the PPU, so in order to support PCI Express, Ageia will have to update the DME and manufacture a different chip.

In the end, this architecture reminds us of another processor, the Cell, which also has a main execution/management core, a large number of specialised calculation units and an advanced memory system. Comparing the two chips seems natural, and we could say the PPU is a Cell specialised for physics.

It is possible that Ageia deactivates some processing units to increase yield. Because of the architecture, the only part large enough for its deactivation to significantly increase yield would be a VPE. It is difficult to know whether this is the case, but it could be supported by the fact that Ageia says in the SDK that the PPU can only support three physics scenes at a time. Of course, this limitation could have a source other than the number of active VPEs.







Copyright © 1997- Hardware.fr SARL. All rights reserved.