Lucidlogix Virtu MVP in action - BeHardware
Written by Guillaume Louel
Published on March 29, 2012
Launched amid great fanfare with the Sandy Bridge platform, Virtu is a software offering developed to deal with the issues resulting from the use of two different brand GPUs within the same machine. Opportunities for such software emerged with Sandy Bridge of course, as Intel added a GPU to all the processors in its range.
Now, while using two GPUs in Windows is nothing new and isn’t an issue – Nvidia’s SLI and AMD’s Crossfire have been managing the situation for years – using two different brand GPUs is a challenge. On XP, loading two different graphics drivers was easy but the option disappeared with version 1.0 of WDDM (the new graphics driver model introduced with Windows Vista). While the option was reintroduced in Windows 7 with WDDM 1.1 (and in Vista via a Service Pack), there are still limitations to the solution, with each GPU managing its own screens. When you want to use, say, the GPU integrated in the processor for the Windows desktop and an additional graphics card for games, you then either have to unplug your screen and restart your system or use a software solution such as Nvidia's Optimus (or the future Enduro technology from AMD), which is used on laptop platforms. Lucidlogix’s Virtu is also a software solution of this type, designed first and foremost for desktop platforms.
Before going into what distinguishes MVP – version 2 of Virtu – from the previous version, let’s look at the common features.
Mode D, Mode I
Virtu was mainly designed to allow you to use a GPU when the screen is connected to another. From a technical point of view, Virtu introduces a virtual layer between Windows and the WDDM drivers. We’re going to use the example of a machine with an Intel processor and a GeForce graphics card for the purposes of our explanation, in which case the drivers for each card, the Intel IGP and the GeForce, must be loaded. This layer allows you to make the system think, virtually transparently, that it’s running on an alternative to the main GPU. A game’s 3D rendering can thus be launched on the GeForce even when the screen is connected to the IGP via the motherboard. This only solves half of the problem however as the GeForce framebuffer now needs to be taken over to the IGP, which is an operation carried out by the driver which copies the frames processed in the GeForce framebuffer to the IGP framebuffer using the PCI Express interface. This second stage isn’t necessarily required with applications that don’t use 3D rendering, such as, say, a video encoding application that uses CUDA or Quick Sync.
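To give an idea of the cost of that second stage, here is a rough back-of-envelope sketch. This is our own estimate; the bandwidth and efficiency figures are assumptions for illustration, not measured values.

```python
# Rough estimate of the time needed to copy one rendered frame from the
# discrete GPU's framebuffer to the IGP framebuffer over PCI Express.
# Bandwidth and efficiency values below are illustrative assumptions.

def frame_copy_time_ms(width, height, bytes_per_pixel=4,
                       pcie_gb_per_s=8.0, efficiency=0.7):
    """Time in milliseconds to transfer one frame over PCIe.

    pcie_gb_per_s: theoretical one-way bandwidth (roughly the order of a
                   PCIe 2.0 x16 link); efficiency models protocol overhead.
    """
    frame_bytes = width * height * bytes_per_pixel
    effective_bw = pcie_gb_per_s * 1e9 * efficiency  # bytes per second
    return frame_bytes / effective_bw * 1000.0

# 1920x1200 at 32-bit colour is roughly 9 MB per frame
t = frame_copy_time_ms(1920, 1200)
print(f"~{t:.2f} ms per frame")
```

Under these assumptions the copy lands in the low-milliseconds range, which is consistent with the kind of extra latency discussed later in this article.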
The main Virtu MVP interface
Virtu, then, offers two distinct modes corresponding to how you want to use your machine. In “I” mode, the screen is connected to the motherboard. The IGP is then the main GPU in Windows and the rendering for 3D games is handed to the GeForce. This scenario – similar to what we see on PC laptops – was the first to be made available in Virtu. In theory it offers the advantage of limiting energy consumption on the desktop at the same time as permitting maximum 3D performance.
The other mode, “D” mode, works the other way round. Staying with our example, the screen is this time connected to the GeForce and the IGP is virtualised, say for using the Intel Quick Sync video compression unit. D mode was introduced mainly because the implementation of I mode isn’t completely transparent. Indeed, the abstraction introduced by the Lucid software has an impact on both performance and latency (the latter irreducible, as it comes from transferring the framebuffer over PCI Express).
We used the following configuration to check out how these different modes do:
- Asrock Z77 Extreme6 motherboard
- Intel Core i7 2600K processor, HD 3000 V188.8.131.5218 driver
- Radeon HD 6870, Catalyst 12.2
- GeForce GTX 480, GeForce 296.10 drivers
- Lucid Virtu MVP 184.108.40.20641
Virtu is announced as being compatible with all WHQL manufacturer drivers. The version we used – indicated as being the production version – isn’t compatible with the Radeon HD 7000s and the GeForce 680s. It will be interesting to see if future HD 7000 compatibility includes Zero Core Power in I mode as this would in theory allow the graphics card to be turned off on the Windows desktop or during video playback, something that would be very useful indeed! Finally note that Virtu is not compatible with multi GPU solutions from AMD and Nvidia.
Performance, energy consumption
Impact on 3D performance
We tried to measure the impact of these different modes precisely by comparing performance in different cases:
- Graphics card alone
- Virtu on, screen plugged into the graphics card (D mode)
- Virtu on, screen plugged into the motherboard (I mode)
- IGP alone
Let’s start first of all with performance in 3D games, where we compared 3D performance using Nvidia and AMD cards via the three available connection modes: card alone, D mode and I mode. The tests were carried out at 1920x1200. We chose five games for this test, four of which have profiles that are preconfigured in the Virtu interface. However this isn't the case for Batman Arkham City and for this game we created a profile manually via the interface. There were no particular problems with this title.
[ Radeon HD 6870 ] [ GeForce GTX 480 ]
The first thing to say is that the disastrous performance levels of the Radeon HD 6870 in I mode are linked to a clocking bug. Here, in spite of several reinstallation attempts, the Radeon stayed stuck at its 2D clock of 100 MHz, which had a serious impact on performance even though the rendering was carried out on the graphics card.
GPU-Z confirms the 2D clock.
Another detail to note with respect to the Radeons in I mode is that the CCC control panel is no longer accessible, a known bug since the first versions of Virtu and still not corrected. The Nvidia control panel is however accessible in I mode, even though it can take several minutes to launch.
The Radeon driver can take several minutes before it displays this error message in I mode.
The second point is that Virtu is truly transparent in D mode for games, with no performance difference in this mode – the Virtu driver isn’t launched – which is a real advantage.
Finally, the impact in I mode on the GeForce varies from game to game, ranging from 5 to 17% depending on the title. Note that Civilization V wouldn't launch at all.
Impact on encoding performance
The Virtu logo appeared on the virtualised applications. Though you can turn this feature off, it’s very practical as it allows you to check whether the driver is functioning correctly.
We also wanted to measure the impact of Virtu during video encoding and, here, compared Virtu in D and I modes and with the IGP alone. We used Cyberlink’s MediaEspresso, which has a profile, and MediaCoder (an application we evaluated in our report on H.264 encoding), which doesn't and for which we had to create one:
You choose the executable and indicate which mode you want to launch it in. We used Quick Sync here (the IGP acceleration) in both applications.
[ Radeon HD 6870 ] [ GeForce GTX 480 ]
Here again, though Virtu is transparent in I mode when you want to use the IGP, we noted a very slight dip in performance on both applications in D mode. This was however relatively insignificant in comparison to that on 3D applications the other way round.
We also looked at energy consumption in the different scenarios. We took readings at idle, in Cyberlink Media Espresso, and in F1 2011.
[ Radeon HD 6870 ] [ GeForce GTX 480 ]
Firstly, using D mode doesn't represent much of a difference (one or two Watts) in comparison to the mode where the IGP is turned off. It is above all in I mode that we could have hoped to see a reduction in energy consumption at idle on the desktop. Unfortunately this wasn’t the case. Note that while energy consumption is lower in F1 2011 in I mode with the GeForce, performance also suffers. The graphics card isn’t working as hard and therefore consumes less power. The same goes for the Radeon HD 6870 readings, which, remember, was running at 100 MHz.
Hyperformance, Virtual V-Sync
For the MVP version, Lucid has introduced two new features which we had caught glimpses of previously, Hyperformance and Virtual V-Sync. Let’s start with the first. For all the following tests we used the GeForce GTX 480, with the screen connected to the motherboard (I mode) - you have to use this mode to benefit from HyperFormance and Virtual V-Sync.
On its website, Lucid says that HyperFormance "eliminates redundant rendering tasks and predicts potential synchronisation issues in the graphics delivery pipeline and intelligently removes and/or replaces them for better game control". In the documentation (which you can download in PDF format from ASRock’s site) and in the application interface, the explanation is pithier: "improves overall game performance and frame rate". Although a "white paper" is available (PDF), it mainly focuses on Virtual V-Sync. When it comes to the precise implementation details, the document explains that they won’t be given “for patent reasons”.
From a practical point of view, not all applications are supported, with the list mainly being comprised of benchmarks or games with built-in benchmarks. For info, trying to enable HyperFormance on Batman Arkham City caused flickering and lighting errors.
Once it has been turned on, HyperFormance does increase the framerate in the applications supported but also increases the 3D Mark scores (we dusted off the 2006 version for this test). Here are some numbers:
We’ll come back to the asterisk on Mass Effect a little further on. But how can the miraculous increases on the other titles be explained? Did the graphics card really render more frames? Yes and no. We used Fraps to measure the time taken to process frames in Lost Planet 2, with and without HyperFormance.
Note, this graph shows the successive processing times for a series of frames. The horizontal axis doesn't represent time and we can’t therefore compare the frame times one by one. This said however, there are several trends. First in normal mode, the rendering time for a frame is relatively constant, varying between 10 and 16 milliseconds. In HyperFormance mode, there are two distinct cases:
- frames rendered in around 5 ms (A)
- frames rendered in between 10 and 16 ms (B)
Apparently therefore, though the number of frames does increase, not all the frames processed are the same.
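The distinction can be made concrete with a small sketch. The frame times below are invented for illustration, in the style of a Fraps capture, and the 8 ms threshold between the two populations is our own choice:

```python
# Classify per-frame render times into "short" (A) frames and normal (B)
# frames, then compare the apparent FPS with the rate of fully rendered
# frames only. The frame times below are illustrative, not a real capture.

frame_times_ms = [12, 5, 14, 4, 11, 5, 13, 16, 5, 12, 4, 15]

THRESHOLD_MS = 8  # anything faster than this is suspiciously cheap

a_frames = [t for t in frame_times_ms if t < THRESHOLD_MS]   # ~5 ms frames
b_frames = [t for t in frame_times_ms if t >= THRESHOLD_MS]  # 10-16 ms frames

total_s = sum(frame_times_ms) / 1000.0
apparent_fps = len(frame_times_ms) / total_s
real_fps = len(b_frames) / total_s  # only fully rendered frames count

print(f"A frames: {len(a_frames)}, B frames: {len(b_frames)}")
print(f"apparent FPS: {apparent_fps:.0f}, real FPS: {real_fps:.0f}")
```

With this made-up series the counter-reported framerate is far higher than the rate of frames that were actually rendered in full, which is exactly the gap we set out to explain.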
Using AMD’s developer tool, GPU Perf Studio, we were able to make out, for at least one application, the type of optimisation used by Virtu MVP. Unfortunately GPU Perf Studio is rather fussy and, for example, only runs with DX10 or 11 applications, which excludes most of the titles accelerated by HyperFormance. We did however manage to show what’s happening in 3D Mark Vantage. We can’t confirm that the same type of optimisation is systematically used in all HyperFormance titles, though this does seem likely.
To simplify the task for our poor Radeon, still stuck at 100 MHz (although overclocking tools such as MSI Afterburner correctly detect the 3D clocks once a Virtu profile has been created and let you change them, the new clocks are never actually applied), we confined ourselves to the Entry preset at 640 x 480. Once again, here are the processing times per frame for a series of 25 frames, with and without HyperFormance:
Once again we can see that although in normal mode all the frames look alike, there are at least two distinct types of frame in HyperFormance mode. Before going any further, remember that the rendering of a frame in DirectX is subdivided into a series of commands known as draw calls, each sent by the CPU to the graphics card to carry out a rendering task. For more details, we refer you to this article, in which we broke the graphics rendering of 3DMark 11 down into its stages.
Here, GPU Perf Studio allowed us to see that in Entry mode in Vantage (test graph 1), the frames at the beginning of the scene all require a little over 200 draw calls in normal mode. With HyperFormance, the “B” frames are of the same type, standard frames. But what about when these frames are calculated magically in under 5 milliseconds?
By pausing on one of these frames, we can see via the Frame Debugger that only four draw calls are made. In contrast to what we describe in that article, and to normal practice (which is applied to the B frames), the memory buffers aren’t cleared here. The frame that was previously calculated is simply reused.
The four draw calls are therefore as follows:
- Placement of the band at the bottom of the screen
- Writing of number of FPS
- Writing of time
- Writing of frame number
Obviously this practice is simply cheating. Virtu MVP is intercepting the draw calls sent by the CPU for these type A frames and deleting all the “useful” draw calls, only keeping these four calls for the band. Thus the frame previously processed is indeed repeated, except for the counter at the bottom of the screen. In this case these interspersed frames simply increase the framerate artificially and if we count the number of frames truly processed per second, HyperFormance actually produces fewer.
This trick must be adapted for each game or bench and we imagine that either almost all the draw calls must be deleted or be limited to minimal refreshes in certain titles (all the calls for example except the drawing of the HUD or the sighting cross for an FPS). This also explains why HyperFormance is limited to so few titles: these so-called ‘optimisations’ have to be written on a case by case basis for games.
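As a toy model of this kind of interception (the call names and the whitelist are entirely hypothetical; this is our reading of the behaviour, not Lucid's code):

```python
# Toy model of the draw-call filtering described above: for a "cheap" (A)
# frame, drop every call except a small whitelist (the overlay band at the
# bottom of the screen), so the previous frame's contents are shown again.
# All call names and the whitelist are hypothetical, for illustration only.

OVERLAY_CALLS = {"draw_band", "draw_fps_counter", "draw_time", "draw_frame_no"}

def filter_calls(draw_calls, cheap_frame):
    """Return the draw calls actually forwarded to the GPU."""
    if not cheap_frame:
        return list(draw_calls)  # B frame: forward everything untouched
    return [c for c in draw_calls if c in OVERLAY_CALLS]  # A frame: 4 calls

# A scene of the kind seen in Vantage: clear, ~200 mesh calls, then overlay
scene = (["clear_buffers"]
         + [f"draw_mesh_{i}" for i in range(200)]
         + ["draw_band", "draw_fps_counter", "draw_time", "draw_frame_no"])

print(len(filter_calls(scene, cheap_frame=False)))  # full frame: 205 calls
print(len(filter_calls(scene, cheap_frame=True)))   # A frame: 4 calls
```

The whitelist would have to be tuned per title, which matches our observation that these 'optimisations' only exist for a short list of games and benchmarks.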
Note that this process doesn’t always hold and that even though Mass Effect is listed as being supported, there was constant flickering when we tested it. We managed to capture these two frames, which alternate constantly to cause this flickering:
Partially rendered frames appear on screen, indicating a synching issue in terms of the choice of frames to be displayed
In theory, using two framebuffers (the graphics card framebuffer and the IGP), the Virtu MVP driver could select the frames to transfer to the screen and therefore prevent partially rendered frames from appearing on screen.
In Mass Effect apparently either the draw call sorting mechanism doesn’t work properly or there’s an issue with the mechanism for choosing which frames are displayed. In any case, the game is unplayable.
With this cheat HyperFormance thus manipulates the idea of what a frame really is to show higher scores in benches. How can such practice be justified? In its “white paper”, Lucid explains that increasing the framerate also increases the responsiveness of the application in terms of mouse/keyboard inputs:
Here we’re interested in point A on the left, where the CPU is indeed described as driving the GPU, sending it the rendering tasks (via the draw calls mentioned above). However, Lucid seems to us to be cutting corners by claiming that increasing the framerate increases the responsiveness of IOs. We don’t think this claim holds up, for several reasons.
Firstly, you have to remember that the responsiveness of peripherals like the keyboard and the mouse is limited by the interface to which they’re connected. Over USB, Windows polls the mouse and keyboard 125 times a second by default (the poll rate). Going from 140 to 300 frames per second therefore doesn't increase the responsiveness of IO reads when these are themselves limited to 125 per second.
Next, as gamers who have changed the USB poll rate in Windows know (some gaming mouse drivers allow this, and other utilities exist too), responsiveness isn’t directly tied to the number of frames per second displayed. As the Lucid diagram shows, however, the CPU is what drives the graphics card, and its rendering speed is (in part) related to how fluidly inputs are taken into account.
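The 125 Hz figure puts a hard floor under input latency, which a quick bit of arithmetic (our own illustration, using the default Windows poll rate mentioned above) makes clear:

```python
# At Windows' default USB poll rate of 125 Hz, mouse/keyboard state is only
# sampled every 8 ms. Raising the framerate cannot shrink input latency
# below that sampling interval. Illustrative arithmetic:

POLL_RATE_HZ = 125
poll_interval_ms = 1000.0 / POLL_RATE_HZ  # 8 ms between input samples

for fps in (140, 300):
    frame_time_ms = 1000.0 / fps
    print(f"{fps} fps -> a frame every {frame_time_ms:.1f} ms, "
          f"but inputs only arrive every {poll_interval_ms:.0f} ms")
```

At both 140 and 300 fps the frame time is already shorter than the 8 ms input sampling interval, so the extra frames are waiting on inputs, not the other way round.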
Modern game engines are multithreaded, and inputs such as those from the keyboard and the mouse (as well as the network) are combined to build a simulation model in memory on the CPU side. This model is a representation of the gamer’s world in the form of coordinates (for position), vectors (for direction/movement) and other diverse data (ammunition left and so on).
When a frame is rendered, the CPU thread consults this model to retrieve this information before sending the draw calls to the graphics card, which then does its work. You might therefore think that increasing the framerate from 80 to 120 FPS would potentially reduce the latency between the consultation of the model (which can come just after the previous frame has been processed) and the completion of the rendered frame from 12 to 8 milliseconds (the maximum possible reduction).
However, as the screen can generally only display 60 frames per second (the last full frame if VSync is on, or a mix of frames when the copy into the framebuffer isn’t complete at the moment the screen requests its frame with VSync off, which is what causes tearing), in practice the gain is reduced.
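Putting numbers on these two bounds (following the 80 to 120 FPS example above; the figures are simple arithmetic, not measurements):

```python
# Upper bound on the latency saved by rendering faster, before the display
# is taken into account. Figures follow the 80 -> 120 FPS example above.

def frame_latency_ms(fps):
    """Worst-case time from model consultation to frame completion."""
    return 1000.0 / fps

saved = frame_latency_ms(80) - frame_latency_ms(120)
print(f"80 -> 120 fps saves at most {saved:.1f} ms per frame")

# But a 60 Hz screen only picks up a new frame every ~16.7 ms, so much of
# that theoretical gain never actually reaches the eye.
refresh_ms = 1000.0 / 60
print(f"display refresh interval: {refresh_ms:.1f} ms")
```

The maximum saving is around 4 ms per frame, against a refresh interval of roughly 17 ms, which is why the practical gain is so small.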
But what actually happens with Virtu MVP? If we take the most favourable case and place an ‘empty’ frame between each real frame, you do get a small latency gain in this part of the process, but this isn't the whole story. Sometimes two standard frames are processed one after the next - this was particularly the case in Lost Planet 2 above. When this happens, what we have is a fluctuating latency gain that can either be transformed into a very slight advance or a very slight delay.
We are however only talking two or three milliseconds at best and you have to put this in the perspective of the full game and display pipeline where the total latency between pressing on a key and the frame becoming visible to the eye on the screen is close to 100 milliseconds overall. With I mode, the PCI Express transfer adds an additional latency in the order of a little over one millisecond.
When the framerate is well under 60 fps, which is the case in our 3DMark Vantage example, latency can be somewhat reduced when the frame first shows up, but you have to remember that this frame will actually be displayed twice... Average latency would therefore not be reduced, while fewer frames would actually be displayed. As far as we can see, in this case HyperFormance has one main practical use: increasing benchmark scores artificially. The latency argument seems more a justification for this benchmark cheating than anything else.
Let’s look at the diagram again to see what Lucid is proposing with Virtual V-Sync, focusing here on point B:
Like traditional V-Sync, the idea of Virtual V-Sync is to eliminate the effects of tearing, the incomplete frames in the framebuffer that are sometimes sent to the display. VSync ensures that only complete frames are sent to the screen by the process of double buffering (use of a frame processed in advance). Virtual V-Sync takes advantage of the fact that in I mode, there are indeed two buffers, the graphics card buffer and the IGP buffer.
As we saw with HyperFormance, the Lucid driver can choose which frames are sent to the framebuffer that is actually connected to the display (in the same way as with the mouse and the keyboard, the display has a fixed refresh rate and the framebuffer is sent to the screen to be displayed 60 times a second).
The IGP framebuffer is no longer constantly updated but rather just 60 times a second if the framerate of the game permits it. From here, there are two scenarios to consider:
- Where the rate drops below 60 frames processed per second, if a new frame hasn’t arrived in time, the old frame is displayed again. As you can see, in this case there’s no difference between standard VSync and Virtual V-Sync: the judder due to the repetition of frames is still there, even though Fraps will, for example, report 37 frames processed per second. Only the complete frames are sent and, given identical rendering times, 30 frames, each displayed twice, reach the screen.
- Where the main GPU can process more than 60 frames per second, Virtual V-Sync does result in a difference. Without Virtual V-Sync, the graphics card won’t attempt to process more frames than necessary, which has, among other things, the advantage of keeping energy consumption, heat and noise down. With Virtual V-Sync, the graphics card will continue to process additional frames which won’t in fact be displayed.
So what’s the point of it? According to Lucid, it always comes back to the same thing: increasing the framerate increases the responsiveness of the application. The Virtu algorithm then chooses which of the frames processed in each 16 ms interval (1 second divided by 60) to transfer to the IGP framebuffer. Our analysis here is the same as with HyperFormance: although the increase in framerate can have an impact on part of the latency, enabling vertical synchronisation and, therefore, double (or triple) buffering increases total latency once again. The theoretical gain is insignificant and in practice our games weren’t any more responsive than with standard VSync, though they were slightly more responsive with VSync off.
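Our understanding of the selection mechanism can be sketched as follows (simulated timings; the function and its parameters are illustrative assumptions, not Lucid's implementation):

```python
# Sketch of the frame-selection idea behind Virtual V-Sync as we understand
# it: the GPU keeps producing frames, and at each 60 Hz display tick the
# driver copies the most recent *complete* frame to the IGP framebuffer.
# Timings are simulated; this is our reading, not Lucid's implementation.

def select_displayed_frames(completion_times_ms, refresh_hz=60,
                            duration_ms=100):
    """For each display refresh, pick the latest frame finished before it."""
    tick = 1000.0 / refresh_hz
    n_ticks = int(duration_ms / tick)
    displayed = []
    for k in range(1, n_ticks + 1):
        t = k * tick
        done = [i for i, c in enumerate(completion_times_ms) if c <= t]
        displayed.append(done[-1] if done else None)  # None: repeat old frame
    return displayed

# GPU rendering at ~120 fps: a new frame completes roughly every 8.3 ms
completions = [8.3 * (i + 1) for i in range(12)]
displayed = select_displayed_frames(completions)
print(displayed)  # frames skipped by the selection are rendered but never shown
```

In this sketch roughly every other rendered frame is simply discarded: the GPU does the work, consuming power, for frames nobody ever sees.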
With this new version of Virtu, LucidLogix continues to offer original solutions that do partly respond to certain needs and certain limitations of modern PC platforms equipped with more than one GPU. Thus, although I mode (with the display connected to the motherboard) doesn't seem to us to be particularly worthwhile – you lose performance without reducing energy consumption at idle – D mode allows those who so desire to use Quick Sync for video encoding (in spite of its quality issues, see our report on the subject) without having to unplug their screen and reboot their machine. As such, Virtu gives a solution to a real problem… of Intel’s.
With respect to the new features introduced with MVP however, namely HyperFormance and Virtual V-Sync, we’re much more sceptical. While the marketing may make these two technologies seem attractive, when you look a bit closer, things get a bit cloudy. By reusing technologies developed for its Hydra drivers, which manipulate graphics rendering, HyperFormance simply inflates performance figures without bringing any real gain in responsiveness, either in theory or in practice.
We think it’s a shame that, rather than simply banning these driver manipulations – which, as we have seen, are no more and no less than cheating when it comes to 3DMark – this press release from Futuremark offers a compromise, promising an update that will allow the use of HyperFormance to be detected so that scores published on its site using this technology can be separated from those that don’t. We can’t help thinking that Futuremark wouldn’t have reacted in the same way if AMD and Nvidia had made this type of modification to their drivers (for SLI and Crossfire, for example). Here the marketing argument of responsiveness, for a non-interactive benchmark, carries even less weight. Likewise, we're struggling to see the point of Virtual V-Sync, apart from as a visual display of a higher framerate than you’re actually getting. Lucid indeed emphasized framerates when presenting its technology at last September’s IDF.
Note finally that we also experienced numerous compatibility issues. Beyond the fact that our Radeon was blocked at 2D clocks, there were fairly persistent issues with crashes at launch or at exit of various games, particularly in Batman Arkham City and Lost Planet 2. We often had to reboot our machine during tests. Once again, these aren’t new problems as we also encountered them when previously testing the Hydra solutions.
As far as we're concerned, without the introduction of support for the Radeon HD 7000s in I mode, allowing the use of Zero Core Power, Virtu MVP is purely of interest for D mode for video encoding using Quick Sync. This is an issue that Intel could have found another (free) solution to, rather than investing in LucidLogix for a solution that is then sold to motherboard manufacturers and therefore increases end user costs.
Copyright © 1997-2015 BeHardware. All rights reserved.