Preview : Ageia PhysX - BeHardware

Written by Damien Triolet

Published on May 5, 2006

URL: http://www.behardware.com/art/lire/622/


Page 1

Introduction, physics engines



Ten years after 3dfx, Ageia is preparing to release a new kind of accelerator, the PPU. Also designed for games, this Physics Processing Unit will, as its name implies, handle physics effects in games, or at least some of them.

Over time, games have evolved considerably in graphics and realism. Realism doesn’t rely only on visual effects, but also on how the different elements of a scene interact. Cars no longer move like weightless blocks, and more and more objects are mobile and/or destructible. A good example is Half-Life 2, where the player has to use basic laws of physics to progress in the game (for example, by manipulating hollow barrels to build a floating bridge). These aren’t the only type of physics effects; physics concerns everything that moves in a game.

Physics effects in games concern mechanics: how a body moves and reacts to contact, or the way fluids behave. These are the main things controlled by game physics engines. If you hit a crate, it will fall off the shelf where it was stored, unless it’s too heavy.


Physics engines
In the video game world, more and more developers use middleware: components developed by third parties, such as 3D engines, physics engines, etc. Middleware lightens a developer’s workload, since buying ready-to-use code saves time. Another advantage is that the developers of these engines can devote more resources to improving quality. Physics effects in games have undeniably evolved quickly over the past few years thanks to middleware, and mainly thanks to Havok, the reference in this field, which is used in many games.

Havok isn’t alone in this market, as Ageia released the Novodex engine a little while ago. It has since been renamed PhysX, the same name as the PPU. PhysX is a software engine available on PC and video game consoles, and it is of course capable of using the hardware acceleration of the PhysX chip. The two shouldn’t be confused, however, because a game that uses the PhysX engine doesn’t necessarily support the PhysX card.


Currently, Ageia doesn’t provide hardware acceleration for competing engines, and given the competition between these two physics players, chances are they won’t reach an agreement on this point. We will probably have to wait for the release of Direct Physics in DirectX for interoperability to happen. Although Microsoft is working on it, this API is far from ready and we don’t know whether the current PhysX chip will be able to support it. (We have heard rumours that Direct Physics would run 100% on the CPU.)

The development of physics effects in games is still at an early stage and several paths may be taken. The move to multi-core CPUs, and their widespread adoption in a few years, will seriously increase the CPU capacity available for physics calculations. GPUs have also become capable of handling some physics calculations, namely those connected to what we call physics effects. These add visual realism but don’t affect gameplay. Havok supports this initiative and will release Havok FX at the end of the year, a plug-in for its engine that uses the GPU for these types of effects.

Of course, for Ageia the solution is a dedicated processor that will let game physics take off in the future. For that to happen, games have to use it. The release of the PPU has been postponed several times: it was announced for the end of 2005, then for February, then for CeBIT, and then for just after. Today it is finally here. Two games are now available with PhysX support and several new demonstrations will be shown in the coming weeks at E3.


Page 2
PPU

The PPU
The first chip devoted to physics, Ageia’s PPU remains mysterious because few technical details are available. We know that the chip has 125 million transistors on a die of roughly 190 mm². Ageia speaks vaguely of 20 billion instructions per second, which represents 530 million sphere-to-sphere collisions (the simplest case) or 533,000 collisions between convex objects (the most complex case) per second.
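
To illustrate why a sphere-to-sphere test is the simplest collision case, here is a minimal Python sketch (not Ageia's implementation): two spheres touch when the distance between their centres is at most the sum of their radii. The second function divides the quoted instruction rate by the quoted collision rates to get the implied average instruction budget per test. The function names are illustrative assumptions.

```python
def spheres_collide(c1, r1, c2, r2):
    """Return True if two spheres (centre tuple, radius) touch or overlap."""
    # Compare squared distance with the squared radius sum to avoid a square root.
    dist_sq = sum((a - b) ** 2 for a, b in zip(c1, c2))
    return dist_sq <= (r1 + r2) ** 2


def instructions_per_collision(instr_per_s, collisions_per_s):
    """Implied average instruction budget for one collision test."""
    return instr_per_s / collisions_per_s


# Figures quoted by Ageia in the article.
INSTR_PER_S = 20e9
sphere_budget = instructions_per_collision(INSTR_PER_S, 530e6)
convex_budget = instructions_per_collision(INSTR_PER_S, 533_000)

print(spheres_collide((0, 0, 0), 1.0, (1.5, 0, 0), 1.0))  # centres 1.5 apart, radii sum 2
print(spheres_collide((0, 0, 0), 1.0, (3.0, 0, 0), 1.0))  # centres 3 apart, radii sum 2
print(f"sphere-sphere: ~{sphere_budget:.1f} instructions per test")
print(f"convex-convex: ~{convex_budget:.0f} instructions per test")
```

Under these figures, a convex test would cost roughly a thousand times more instructions than a sphere test, which is consistent with the gap between the two collision rates.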

The PhysX chip has a PCI interface and a 128-bit memory bus. These choices are rather old-fashioned, because the PCI bus is slowly disappearing and being replaced by PCI Express. The memory bus matches what is found on a mid-range graphics card, but here it is combined with rather slow DDR memory at 366 MHz. This choice is probably cost-driven. The resulting dedicated bandwidth is 10.9 GB/s.
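
The bandwidth figure can be checked from the bus width and the memory clock. The sketch below assumes double data rate (two transfers per clock) and evaluates two candidate clocks, 336 MHz and 366 MHz (the latter is the memory frequency cited in the discussion of the chip's clock). Under these assumptions, only the 366 MHz clock reproduces about 10.9, and only when a GB is counted as 2^30 bytes.

```python
def bandwidth_bytes_per_s(bus_width_bits, clock_hz, transfers_per_clock=2):
    """Peak memory bandwidth in bytes/s; DDR transfers data twice per clock."""
    return (bus_width_bits // 8) * clock_hz * transfers_per_clock


GIB = 2 ** 30  # binary gigabyte, assumed here to be the unit behind "10.9 GB/s"

for clock_mhz in (336, 366):
    bw = bandwidth_bytes_per_s(128, clock_mhz * 1_000_000)
    print(f"{clock_mhz} MHz: {bw / 1e9:.2f} GB/s (decimal), {bw / GIB:.2f} GiB/s (binary)")
```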

Ageia defined four goals for the PhysX processor: Scale, Fidelity, Interaction and Sophistication. In other words, increasing the number of physics details, their realism and how they interact with one another. The fourth point is the usual marketing filler regularly found in 3D: “Hollywood” quality.


To reach these goals, the PhysX processor is equipped with a large number of calculation units of different types: scalar units for integers and vector units for floating point. The former should mainly handle flow control, that is, everything that modifies the instruction flow, such as branching. These units are organised in independent groups that work internally like SIMD units: every processing unit in a group executes the same task, but each group can work on a different program.

To learn more, the only solution is to read Ageia’s patents. They describe several design approaches, however, which can be extended by adding processing units. We can’t know for sure whether they represent the final design of the PhysX chip or simply an example; the final product could differ in the number of each type of unit. The probable design found in the patents mentions four independent calculation blocks, the Vector Processing Engines (VPE), each of which includes four Vector Processing Units (VPU), which makes 4x4 calculation units. Each VPU features 6 floating-point MADs (each capable of one multiplication and one addition) and a complete ALU. The total is 96 MADs, compared to 56 for ATI’s and NVIDIA’s high-end GPUs. The frequency is unknown. Is it 366 MHz like the memory, or 250 MHz, or 500 MHz? We don’t know, so it is difficult to compare the raw calculation power of this PPU with that of other chips. In any case, this would only be for information, because raw power is of no interest if it isn’t used efficiently.
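
The unit count above, and what it would mean at each of the candidate clocks, can be worked out as follows. This is only a sketch of the arithmetic: it assumes the patent layout matches the shipping chip and that one MAD counts as two floating-point operations per clock, neither of which Ageia has confirmed.

```python
VPE_COUNT = 4      # Vector Processing Engines (from the patent design)
VPU_PER_VPE = 4    # Vector Processing Units per VPE
MAD_PER_VPU = 6    # floating-point multiply-add units per VPU


def total_mads():
    """Total number of MAD units in the patent design."""
    return VPE_COUNT * VPU_PER_VPE * MAD_PER_VPU


def peak_gflops(clock_mhz, flops_per_mad=2):
    """Theoretical peak, assuming every MAD issues one multiply-add per clock."""
    return total_mads() * flops_per_mad * clock_mhz * 1e6 / 1e9


print(f"MAD units: {total_mads()}")
for mhz in (250, 366, 500):  # candidate clocks mentioned in the text
    print(f"{mhz} MHz -> {peak_gflops(mhz):.1f} GFLOPS peak (hypothetical)")
```

The spread between the lowest and highest candidate clock is a factor of two, which shows why no meaningful raw-power comparison is possible without the real frequency.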


This is where the strength of Ageia’s chip lies. Physics calculations have very different properties from 3D calculations. When pixels are calculated, they are independent of one another and memory accesses are, in most cases, optimally aligned. With physics, objects interact with one another. In other words, we can’t know the position of one without knowing the position of the others, because they might collide and change trajectory. Because the results of other units can’t be known instantly, a large number of small threads is used, as ATI does, to mask latency.

This mode of operation leads to massive data movement between the PPU’s calculation units, but also to less predictable memory reads and writes. For those reasons, Ageia has designed a highly developed Data Movement Engine (DME) that contains 5 Memory Control Units (MCU). Four of them control data transfers to and from each of the four VPEs via a bus to which each VPU is connected. Each VPU has a small memory that both the VPU itself and the DME can access. This works like a double buffer: the VPU accesses the first buffer while the DME accesses the second. As soon as the current read/write is over, the buffers are swapped. This system makes it possible for the VPU and the DME to access memory at the same time at full speed. The fifth MCU is connected to the PCE (PPU Control Engine), which heads the PhysX processor.
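
The double-buffer scheme described above can be sketched in a few lines of Python. This is a sequential illustration of the idea only, not Ageia's hardware design: the class and function names are hypothetical, and the "VPU" work is reduced to summing each batch. In hardware the two sides would run concurrently; here they simply alternate.

```python
class DoubleBuffer:
    """Two buffers: one owned by the compute unit, one by the data mover."""

    def __init__(self, size):
        self._buffers = [[0] * size, [0] * size]
        self._compute_idx = 0  # index of the buffer the "VPU" currently owns

    @property
    def compute(self):
        return self._buffers[self._compute_idx]

    @property
    def transfer(self):
        return self._buffers[1 - self._compute_idx]

    def swap(self):
        """Exchange the roles of the two buffers."""
        self._compute_idx = 1 - self._compute_idx


def run(batches, size=4):
    db = DoubleBuffer(size)
    results = []
    db.transfer[:] = batches[0]  # "DME" loads the first batch
    db.swap()                    # hand it over to the "VPU"
    for i in range(len(batches)):
        if i + 1 < len(batches):
            db.transfer[:] = batches[i + 1]  # "DME" preloads the next batch
        results.append(sum(db.compute))      # "VPU" works on the current one
        db.swap()
    return results


print(run([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]))
```

Because each side only ever touches its own buffer between swaps, neither has to wait for the other in the middle of a transfer.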

Ageia speaks of an internal bidirectional bandwidth of 250 GB/s, which is very impressive. In the end, the DME seems to be the most important part of the PPU: without it, the calculation units couldn’t be kept properly fed. It also handles access to the dedicated memory via the Memory Interface Unit, as well as the management of the PCI bus. Ageia specifies that this technology can be interfaced with PCI Express, USB and FireWire. The design example discussed above corresponds to the current implementation of the PPU, based on the PCI bus, so in order to support PCI Express, Ageia will have to update the DME and manufacture a different chip.

In the end, this architecture reminds us of another processor, the Cell. It too has a main execution/management core, a large number of specialised calculation units and an advanced memory system. Comparing the two chips seems natural, and we could say that the PPU is a Cell specialised in physics.

It is possible that Ageia deactivates some processing units to increase yield. Given the architecture, the only part large enough for its deactivation to significantly increase yield would be a VPE. It is difficult to know whether this is the case, but it could be supported by the fact that Ageia says in the SDK that the PPU can only handle three physics scenes at a time. Of course, this limitation could have a source other than the number of active VPEs.




Page 3
Card, drivers

The card
Two manufacturers have already announced cards based on the PhysX processor, Asus and BFG. Asus quickly sent us a card, which we used for this test.



Asus’ card was announced with 256 MB of memory, compared to 128 MB for BFG’s. Asus informed us, however, that the amount of memory had been revised to 128 MB “for strategic reasons”. We tried to find out more and noticed that the memory size was no longer indicated in Ageia’s latest control panels. We were told that their cards (at least the first batches) will have 256 MB of memory but that the additional 128 MB won’t be used. Ageia has probably decided not to complicate things by supporting several memory sizes.

The card is equipped with a Molex power connector because the PCI bus can’t supply enough power. Maximum consumption is announced at 28 watts, but without a test truly capable of saturating the card (and/or of showing us that it does), it is hard to evaluate consumption in practice. The price should be around 300€.


Drivers
PhysX drivers consist of two parts, the driver itself and the API. The driver, currently at version 1.0.1.0, isn’t the most important part, as it doesn’t really change with new releases. The important element is the API.

PhysX isn’t a truly programmable chip. Developers won’t be able to write some sort of PhysX shader. The API exposes a set of fixed functions, which it then translates into programs that the PPU can execute. Changes made to this API can cause problems, since some fixed functions can disappear or be replaced by others, and their behaviour can change. Developers therefore don’t target the PhysX hardware API in general, but a specific version of it.


Each PhysX driver includes, in addition to a new API, all the previous ones. Or it’s supposed to. This isn’t always the case, because a developer may have based his work on a certain branch of the API, found bugs and asked for a new revision of that branch. For example, Bet on Soldier uses version 2.4.1, but the driver that includes it doesn’t feature the 2.3.3 used by Ghost Recon: Advanced Warfighter. During installation, the driver doesn’t add elements, it replaces everything. In other words, it isn’t possible to run these two games with the same driver and you have to switch from one to the other, which isn’t very convenient. Fortunately, Ageia released a 2.4.2 version a few days ago that includes both 2.3.3 and 2.4.1. You have to use this one and decline the installation of the versions bundled with the games. This system seems difficult to manage in the long term and Ageia will probably have to rethink the API architecture. Why not include the API DLL directly in the game?
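
The compatibility problem can be modelled as a simple set lookup. The sketch below is purely illustrative: it labels each driver package by its newest API version, and the exact contents of the 2.3.3 and 2.4.1 packages (beyond what is stated above) are assumptions, as is the presence of 2.4.2 itself in the 2.4.2 package.

```python
# API versions bundled in each driver package (keyed by newest API version).
DRIVER_APIS = {
    "2.3.3": {"2.3.3"},                    # assumed: only its own version
    "2.4.1": {"2.4.1"},                    # stated: lacks 2.3.3; rest assumed
    "2.4.2": {"2.3.3", "2.4.1", "2.4.2"},  # stated: includes 2.3.3 and 2.4.1
}

# API version each game was built against (from the article).
GAME_API = {
    "Bet on Soldier": "2.4.1",
    "Ghost Recon: Advanced Warfighter": "2.3.3",
}


def runnable_games(driver):
    """Games whose required API version is present in the given driver."""
    return sorted(g for g, api in GAME_API.items() if api in DRIVER_APIS[driver])


def drivers_for_all_games():
    """Drivers that can run every game without being swapped."""
    required = set(GAME_API.values())
    return sorted(d for d, apis in DRIVER_APIS.items() if required <= apis)


for driver in sorted(DRIVER_APIS):
    print(driver, "->", runnable_games(driver))
print("Usable for both games:", drivers_for_all_games())
```

Since installing a driver replaces everything, only a package whose set covers every required version avoids the back-and-forth between games.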


Ageia’s control panel tells you which version of the API is installed, has a small demonstration to verify that the PPU is functional, a diagnostic tool and access to updates.

Test configuration:
- A8N32 Deluxe
- Athlon FX 55
- 2x 1024 MB Corsair (CAS2)
- 7900 GTX
- Windows XP SP2


Page 4
Synthetic tests, demos

Synthetic tests
We had the PhysX card for a while and tried to measure its usefulness and performance. With no games supporting it, and after noticing that the various SDKs don’t use it either (except for fluid movement), we were quickly disappointed. The latest versions include a couple of accelerated demos (very few, actually), and they don’t work without a PhysX card, which prevents any comparison. In practice, for each effect, developers have to write a standard version and an accelerated one, since hardware support isn’t a given. Switching from one version to the other is sometimes easy, but not always, because the software and hardware versions of the same API function can behave differently. Ageia still has some work to do on this level.

At first, only a small demonstration bundled with the drivers was available, and we started with it to run a couple of synthetic tests. The first simply compared the performance of physics processing done by the CPU alone and with the help of the PhysX card.


With the PhysX card, processing is 23% faster (33% in the heaviest part). The CPU was an Athlon FX 55, which is far from a poor performer. Note, however, that in a real situation the CPU can’t spend 100% of its time on physics calculations: there is also scene management, AI and 3D.

Our next test takes this into account and consists of running the same test alongside a heavy 3D rendering load (rthdribl). We measured the performance of physics processing as well as that of 3D rendering.

This time the gaps are more significant. With assistance from the PPU, the system is 70% more efficient, and 90% more so in the heaviest part! In addition, 3D performance remains at an excellent level. Without physics processing, the 3D card limits performance to 160 fps. With physics processing and the PhysX card, the score varies between 85 and 110 fps. Without the card, it plunges to between 20 and 35 fps. The PPU really can make a difference.
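
To put the frame rates quoted above in perspective, the following sketch converts them to frame times and computes the range of the frame-rate ratio with and without the card. It uses only the figures from the text; pairing the extremes of the two ranges to obtain bounds is our own simplification.

```python
def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def speedup_bounds(with_ppu, without_ppu):
    """Smallest and largest frame-rate ratio between two (low, high) ranges."""
    return with_ppu[0] / without_ppu[1], with_ppu[1] / without_ppu[0]


# Frame rates quoted in the text, in fps.
GPU_ONLY = 160
WITH_PPU = (85, 110)
WITHOUT_PPU = (20, 35)

low, high = speedup_bounds(WITH_PPU, WITHOUT_PPU)
print(f"PPU vs CPU-only frame rate: {low:.1f}x to {high:.1f}x")
for label, fps in (("3D only", GPU_ONLY),
                   ("with PPU, worst case", WITH_PPU[0]),
                   ("CPU only, worst case", WITHOUT_PPU[0])):
    print(f"{label}: {frame_time_ms(fps):.2f} ms per frame")
```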


Demos
We ran several demonstrations with the PPU. The first, Switchball, is a game in which you have to guide a ball through a course full of traps. The laws of physics apply and you have to take inertia into account, depending on the type of ball.


Some parts of the course require a PhysX card, like with this water gun:


It’s hard to be convinced that a PhysX card is necessary for such small details, which any CPU can handle without problems (all the more so since the game is capped at 25 fps).

The second demonstration, Hangar of Doom 1.2, is based on the Unreal Engine (not the new version) and presents a scene containing a large number of objects. It is impressive to see so many objects set in motion by explosions or contact. However, it’s hard to tell whether the PPU really improves on the CPU, since the demonstration can only be launched when a PhysX card is detected.




Page 5
Games

Now let’s get down to serious business, the games.


Bet on Soldier
Developed by Kylotonn, Bet on Soldier supports the PhysX card thanks to a 1.3 patch that will be available very soon. The PhysX card is used to calculate particles in explosions and when a flamethrower is used. Here are a few screenshots with and without PhysX:


Without PhysX


With PhysX

Videos :
Without PhysX
With PhysX


Even if the visual result is open to interpretation, explosions gain in intensity and a large number of particles is generated. Their life is short, however, as they quickly disappear. The additional particles only enhance the visual effects and can’t touch the player or another character, which limits their interest.

These effects have only been implemented in the PhysX version and can’t be activated on the CPU, even if this should theoretically be possible. Since performance can’t be compared at equivalent rendering quality, we compared it with and without the PPU. Keep in mind that in this test, at this resolution, we are limited by the CPU and not by the graphics card, even with the additional effects.


The explosion begins at the 5th second. At that moment, performance collapses with the PhysX card, which is why we show the results in this form: had we taken an average, the drop wouldn’t have appeared as significant. Playing comfort drops significantly when the card is in use, and blowing up ten objects at once brings the frame rate down to 10 fps.
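
Why a per-second trace is more telling than an average can be shown with a small sketch. The samples below are HYPOTHETICAL values invented for illustration, not our measurements; only the 10 fps low point echoes the figure mentioned above.

```python
from statistics import mean

# HYPOTHETICAL per-second fps samples (index 0 is the first sample);
# the dip starting at index 5 stands in for the explosion.
fps_trace = [60, 60, 60, 60, 60, 10, 25, 50, 60, 60]


def summarize(trace):
    """Average, minimum and position of the minimum in an fps trace."""
    worst = min(trace)
    return {"mean": mean(trace), "min": worst, "worst_index": trace.index(worst)}


summary = summarize(fps_trace)
print(f"average: {summary['mean']:.1f} fps")
print(f"minimum: {summary['min']} fps at sample {summary['worst_index']}")
```

With these made-up numbers, the average stays above 50 fps even though the frame rate briefly falls to 10 fps, which is exactly the kind of drop an average would hide.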

Why does this happen? We don’t know and can only suppose that using the PhysX card overloads the CPU, or that the data can’t be interpreted directly by the PPU and has to be converted by the CPU before being sent. Another possibility is that PPU processing latency isn’t masked and the CPU spends time waiting for the PPU to return processed data. Either way, given what we saw and the size of the performance loss, it’s hard to imagine that the CPU plays no part in it.


Ghost Recon: Advanced Warfighter
We were eagerly awaiting GRAW because it was announced as the first game to really use the PhysX processor. We were quickly disappointed, because only some explosions are accelerated.


Without PhysX


With PhysX

Videos :
Without PhysX
With PhysX


This time, the graphics are an undeniable success. Explosions gain in intensity but, just as in Bet on Soldier, the additional particles can’t affect the player and are only there to improve the visuals. They are particles and not three-dimensional objects, and they don’t do much for the realism of the game: pieces of metal fly into your face without doing any harm, while a bullet can kill you instantly.

Just as in Bet on Soldier, these additional particle effects only work with the PhysX card. Nevertheless, we decided to measure performance in a CPU-limited situation:


Second 4 corresponds to the beginning of the explosion. The results are broadly similar to those observed previously: there is a big performance drop as soon as the PPU starts working. This time, however, performance remains poor for a longer period. Here again, we aren’t really convinced of the value of the PhysX card…


Page 6
Conclusion

Conclusion
When a new technology comes along, we are often quite enthusiastic. This time, however, we have to be honest: we were rather disappointed. When Ageia introduced the PhysX processor several months ago, we saw it as a general-purpose physics accelerator, but this isn’t the case. Running a game that uses the PhysX engine isn’t enough to benefit from the PhysX processor (is Ageia playing on this confusion?). To exploit the PhysX processor, developers have to specifically address each effect they want to accelerate.


Support for the PhysX processor was added to Bet on Soldier and Ghost Recon: Advanced Warfighter in the last phase of their development. Their developers chose to implement a particle effect that increases the realism of explosions. As a result, gameplay isn’t enhanced, even though this was Ageia’s main selling point for its solution compared to GPU physics acceleration, which only concerns visual effects. Ageia still has to demonstrate the superiority of its solution over a CPU/GPU combination. How could it do so? Do developers want to develop different gameplay for some users, and can they? Beyond the additional cost, and given that the whole development chain has to take it into account from the start, would it be acceptable to favour some players in terms of gameplay? The problem arises, of course, for multiplayer games.

And this isn’t all, because Ageia still has to demonstrate how the card and the system perform once the PhysX card is in use. Our synthetic tests showed that it really does have some potential, but there is no guarantee at this time that this potential will carry over to games. For now, simple use for surface effects literally kills performance. What would happen with more in-depth use?

The only beginning of an answer we can give is that Ageia is probably aware of this, and that the PhysX card really can add something, but that it has to be taken into account from the beginning of a game’s development. In the end, we can only advise you not to rush out and buy the PhysX card. The ball is in Ageia’s court and they still have to convince us.


There is one last question. Is the spread of physics accelerators really beneficial for the user? Wouldn’t it be better to focus only on the CPU and GPU, which already represent a considerable budget, even if it means delaying the take-off of physics in games? The evolution of multi-core CPUs brings more and more calculation power, and it seems logical that it would be more worthwhile to dedicate one or more cores to physics processing. For Ageia and its partners, this is an opportunity to create a new market, but here again our current opinion doesn’t go in that direction. However, a single game with sufficient improvements would be enough to make us change our mind. UT 2007? Cell Factor? The first videos of the latter are interesting; now all that remains is to see them in practice…


Copyright © 1997-2009 BeHardware. All rights reserved.