H.264 encoding - CPU vs GPU: Nvidia CUDA, AMD Stream, Intel MediaSDK and x264 - BeHardware

Written by Guillaume Louel

Published on April 28, 2011

URL: http://www.behardware.com/art/lire/828/


Page 1

Introduction



Over recent years, graphics card manufacturers have been highlighting the ability of their GPUs to handle more than just gaming graphics. Although there has been some success in the market for High Performance Computing (financial services sector and so on), the concept of GPGPU (General Purpose computing on GPU) is still struggling to find purchase with the general consumer. Standards like OpenCL and DirectCompute are working to make an impact but so far this is still fairly minor.

Video, naturally!
General consumer GPGPU usage is nevertheless quite common when it comes to video encoding solutions. On paper, the idea seems a very good one. GPU power can already be drawn on for video decoding (Blu-rays for example), via a dedicated circuit integrated into the GPU and it would therefore seem logical to use GPUs for encoding too.

After a fairly limited start with the first version of Badaboom in 2008, transcoding is back on the agenda with Windows 7, which can use your graphics card to transcode files automatically for a device (portable video player, smartphone and so on, as long as it is compatible and detected by Windows 7). Many free and paid applications now tackle file transcoding, whether the files are destined for mobile devices, tablets or games consoles.

A common point of all these applications is that they often use libraries designed by the GPU manufacturers themselves. Thus NVIDIA supplies an H.264 encoder developed in CUDA (nvcuvenc), AMD has its own which uses Stream technology (AMD Media Codec package at the bottom of the page), and even Intel has joined in with its MediaSDK (which is itself partly based on the IPP library), as we saw at the beginning of the year at the launch of the Sandy Bridge Core processors.


But what do all these solutions give in practice? Is all the software that uses them equivalent? What about encoding speed and quality? And how are these encoding solutions to be measured against what is the reference when it comes to H.264 encoding, the open source software x264 (that we regularly use in our CPU tests)? We’ll be getting to grips with all these burning questions in this report!

Before going further, we want to thank Nicolas et fils for the loan of hardware used in the tests, as well as Jason Garrett-Glaser (x264 developer) for replying to our questions on H.264. If you wish to go into more depth on this subject, here are the links to some of the resources used in researching this article:

- The H.264 recommendation
- The book The H.264 Advanced Video Compression Standard by Iain Richardson (the site for which makes available some interesting free content here)
- Jason Garrett-Glaser’s blog


Page 2
Container, codec, transcoding



Before really getting to grips with our subject and to help understand exactly what we’re comparing, let’s go back over a few points with respect to the subject of video.

Container, codecs
What is wrongly called a video file is above all else a container. The concept is simple: it interleaves the various contents of the file, namely the video track, the audio track and potentially the subtitle tracks.


Left: view of principle. Right: interleaved tracks.


Interleaving the content facilitates the simultaneous playback of different tracks without requiring too much movement within the file. Related data (the audio that corresponds to an image for example) thus remains in close proximity. This is particularly important for video discs (DVDs, Blu-rays) or when transmitting a digital stream (via digital broadcast for example).
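
To make the idea concrete, here is a minimal sketch of how a container might order packets from two tracks so that related data stays close together. The packet layout and the timestamps (in milliseconds) are hypothetical and do not correspond to any real container format.

```python
import heapq

# Each packet is (timestamp_ms, track, payload). Values are made up for illustration.
video_packets = [(0, "video", "frame 0"), (40, "video", "frame 1"), (80, "video", "frame 2")]
audio_packets = [(0, "audio", "chunk 0"), (21, "audio", "chunk 1"),
                 (43, "audio", "chunk 2"), (64, "audio", "chunk 3")]

# Both lists are already sorted by timestamp, so a merge keeps the packets of
# each track in order while mixing the tracks by presentation time.
muxed = list(heapq.merge(video_packets, audio_packets))

for timestamp, track, payload in muxed:
    print(f"{timestamp:3d} ms  {track:5s}  {payload}")
```

A player reading this stream sequentially finds the audio for a given frame right next to it, without having to jump around in the file.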

There isn’t really any relationship between a container and the format of the tracks it contains. An AVI file can contain video tracks encoded in many formats (MPEG-1, Xvid and so on), with the same going for audio (MP3, AAC and so on). A utility like MediaInfo will tell you exactly which tracks make up a file, and which format they are encoded in. There are also some technical limitations, which can for example prevent a video format such as H.264 from being integrated (straightforwardly) in an AVI file.

Transcoding
Therefore, when we talk about transcoding solutions, we’re actually talking about changing one or several of the formats in the original file. The compression formats can be changed to give a smaller file, converting, say, a DVD (VOB container, MPEG-2 video, Dolby Digital audio) to a video file (AVI, XviD, MP3). Sometimes you might want to retain the same video format but reduce the size of the files, for example by converting a Blu-ray (.m2ts, H.264, DTS-HD) into a file that can be played on a tablet (.mp4, H.264, AAC) or games console. Even if you don’t change the video format (H.264 on each side), you might want to recompress the video to reduce the file size (Blu-rays take up a lot of space), to change the resolution (1280 x 720 instead of 1920 x 1080) or to suit the limitations of the destination device (iPads, for example, can't read all H.264 compression profiles).

There are, then, a large number of stages to manage when it comes to transcoding a file, and not all these stages can be accelerated by the GPU. In the diagram below, from left to right, you can see the stages necessary for the transcoding of a video file:




Of all these stages, only two can currently be accelerated by a GPU: decoding the original video track to a raw video format (using dedicated GPU circuits, the same ones used during accelerated playback of a DVD or a Blu-ray) and encoding (using either the GPU processing units [CUDA for NVIDIA, Stream for AMD] or a part of the GPU dedicated to this task [the Intel method for Sandy Bridge/HD 3000]).

Only these two stages can be accelerated then, but they are by far the most resource hungry. Here is, for example, a breakdown of an encode that we carried out for our tests (file extracted from a Blu-ray to an MKV 720p file):


This is a very high quality encode that we processed to create a source file (we’ll come back to this). Video encoding time is therefore significant. To conclude on the subject of transcoding, note two points it’s important to bear in mind for the rest of this article:
- Decoding and encoding of video are tasks that are carried out in parallel, frame by frame. In theory encoding takes longer than decoding, but this isn’t always so in certain cases with respect to GPU acceleration.
- Although some tools allow you to break down the time taken for each stage, not all the software that we tested does so. When we talk about transcoding times further on, this will therefore consist of the time taken for all stages (demux, video encoding, audio, remux) and not only video encoding time!

Let's now move on to the specificities of H.264.


Page 3
H.264 (1/2)

H.264
Sometimes known as AVC (often when talking about Blu-rays) or MPEG-4 Part 10 (ISO terminology), H.264 (ITU terminology) is a video compression standard often considered to be the most technically advanced compression format. In ISO terminology it follows MPEG-1 (Video CD), MPEG-2 (used on DVDs) and MPEG-4 Part 2 (XviD).

It's a standard with an open specification (downloadable here), the use of which is subject to a certain number of patents belonging to various big names in computing, electronics, telecommunications and various universities (Apple, Cisco, France Telecom, Fraunhofer, Microsoft, Mitsubishi, Panasonic, Philips, Sony and Toshiba to name but a few; the full list of patents is available here). All these companies have come together to create a pool of patents managed by MPEG LA. In the framework we’re looking at (personal use, non-commercial), no license is required.

H.264 is now widely used, whether this be for Blu-rays (where it coexists with a Microsoft format, VC-1, which also comes under the supervision of patents handled by MPEG LA) or because of the fact that H.264 decoding can be accelerated by most current general consumer devices (smartphones, tablets, games consoles and so on). It is also widely used on online video sites, something that we noted in our test of the AMD Brazos platform when the Flash software posed a problem in terms of video acceleration.

Video compression in practice
Without going into too much detail, we’re going to give you a general overview of how H.264 works, which will allow us to put into context any issues we have with the various encoders tested further on in the article.

A video is a stream of frames, compressed one after the next. H.264, like many formats before it, subdivides frames into blocks of pixels known as macroblocks (squares of 16x16 pixels, which can be split into partitions as small as 4x4). To achieve the final result, an encoder must determine the data required for the recreation of a frame (this is what’s known as prediction; there are two types) then compress this data as efficiently as possible (here again, there are two distinct types of compression).

Temporal prediction
Behind this term lies a very simple concept: successive frames in a video are very likely to be related to each other. Whether the sequence is made up of a static shot or moving characters, or a camera pan taking in a landscape, the successive frames will very often have numerous links between them. Starting from this principle, the encoder will attempt to identify blocks in the destination frame that can be taken from one or several previous frames. The displacement can be tracked very finely, down to a quarter of a pixel. This is what’s known as motion estimation and is a type of processing for which GPUs are likely to be very efficient. Henceforth we will refer to temporal prediction as inter prediction.
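
As an illustration, motion estimation can be sketched as an exhaustive block search. This is only a simplified sketch: it works at whole-pixel precision, uses the sum of absolute differences (SAD) as its cost measure and searches a tiny window, all of which are assumptions made for the example rather than a description of any particular encoder. The frames are hypothetical 8x8 grids of luminance values.

```python
def sad(ref, cur, rx, ry, cx, cy, size):
    """Sum of absolute differences between a block of `ref` and a block of `cur`."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(ref[ry + dy][rx + dx] - cur[cy + dy][cx + dx])
    return total


def find_motion_vector(ref, cur, cx, cy, size=4, search=2):
    """Return ((mvx, mvy), cost) of the best match in `ref` for the block at (cx, cy) in `cur`."""
    height, width = len(ref), len(ref[0])
    best = ((0, 0), sad(ref, cur, cx, cy, cx, cy, size))
    for mvy in range(-search, search + 1):
        for mvx in range(-search, search + 1):
            rx, ry = cx + mvx, cy + mvy
            # Skip candidates that would fall outside the reference frame.
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(ref, cur, rx, ry, cx, cy, size)
                if cost < best[1]:
                    best = ((mvx, mvy), cost)
    return best


def make_frame(left):
    """Hypothetical 8x8 frame: a bright 4x4 square on a dark background."""
    return [[200 if left <= x < left + 4 and 2 <= y < 6 else 10
             for x in range(8)] for y in range(8)]


previous_frame = make_frame(1)
current_frame = make_frame(2)   # the square has moved one pixel to the right
mv, cost = find_motion_vector(previous_frame, current_frame, cx=2, cy=2)
print(mv, cost)
```

Every candidate position is independent of the others, which is why this kind of search lends itself well to the massively parallel processing units of a GPU.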


Avatar, Fox Pathé Europa


On this frame taken from a scene in Avatar (using Elecard StreamEye) where the camera moves left to right, while advancing, you can see that the elements of the frame which are at different depths (the creeper on the left, the ferns at the bottom, the lighter coloured ferns at the top and the creepers at the top) all move differently within the frame.

Spatial prediction
Again, this term sounds more complicated than it is. Spatial prediction consists of compressing data from within the current frame. Rather than looking for similarities between current and previous frames, spatial prediction looks for similarities within the current frame. This concept is equivalent to the compression of still images (JPEG and so on) and we will refer to this as intra prediction in the rest of this article.
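
One of the simplest forms of intra prediction is the DC mode, where a block is predicted as the average of the pixels that border it above and to the left. The sketch below assumes a hypothetical 5x5 frame with a smooth gradient and ignores the other prediction directions that H.264 offers.

```python
def dc_predict(frame, x, y, size=4):
    """Predict the block at (x, y) as the rounded mean of its top and left neighbours.

    Assumes x >= 1 and y >= 1 so that both neighbours exist.
    """
    neighbours = [frame[y - 1][x + i] for i in range(size)]    # row just above
    neighbours += [frame[y + i][x - 1] for i in range(size)]   # column just to the left
    dc = (sum(neighbours) + len(neighbours) // 2) // len(neighbours)
    return [[dc] * size for _ in range(size)]


def block_residual(frame, prediction, x, y):
    """Difference between the real block and its prediction."""
    size = len(prediction)
    return [[frame[y + j][x + i] - prediction[j][i] for i in range(size)]
            for j in range(size)]


# Hypothetical frame: brightness rises gently from top left to bottom right.
frame = [[100 + x + y for x in range(5)] for y in range(5)]
prediction = dc_predict(frame, 1, 1)
residual = block_residual(frame, prediction, 1, 1)
print(prediction[0][0], residual)
```

The residual values are small compared with the pixel values themselves, which is what makes them cheap to store.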

Of course, an encoder can use both types of prediction within the same frame. If we take the same frame as before (still using Elecard StreamEye):


The inter predicted blocks (temporal prediction, using a previous frame) are in blue/yellow and the intra predicted blocks (spatial prediction, within the current frame) in red/orange. Avatar, Fox Pathé Europa


Note that even when an encoder does its best to be as precise as possible in its predictions, they aren’t necessarily perfect. In every case of prediction where an attempt is made to find similarities between one block and another (whether within the same frame, intra prediction, or in another, inter prediction), the destination doesn't always exactly resemble the source. This isn’t a problem as the difference (the residual) can simply be stored alongside the prediction. By definition it will be small since, after all, the blocks resemble each other.

Compress what and how?
Our predictions generate two quite distinct types of data. On the one hand there is precise data, such as the motion vectors used in predictions. If the encoder has put in the effort to obtain precision down to a quarter of a pixel, you obviously don’t want to lose accuracy when compressing this data. For such essential data, H.264 uses lossless coding (entropy coding). There are two distinct types, CAVLC and CABAC. The second requires considerably more processing, whether to encode or decode, while CAVLC is less resource hungry but also less efficient (lower compression ratio). You may remember that some time back, the GeForce 8800s offered accelerated decoding of H.264 streams, though only partially: there was no GPU support for CABAC decoding at the time (this was added in the following generations).
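
CAVLC and CABAC are too elaborate to reproduce here, but the principle of lossless variable-length coding can be sketched with Exp-Golomb codes, which H.264 uses for many of its syntax elements. The idea is that small values, which are the most frequent, receive the shortest codes. The function names below are our own.

```python
def ue(value):
    """Unsigned Exp-Golomb code of a non-negative integer, as a string of bits."""
    binary = bin(value + 1)[2:]
    return "0" * (len(binary) - 1) + binary


def se(value):
    """Signed Exp-Golomb code: 0, 1, -1, 2, -2, ... map to code numbers 0, 1, 2, 3, 4, ..."""
    code_number = 2 * value - 1 if value > 0 else -2 * value
    return ue(code_number)


def decode_ue(bits):
    """Decode a single unsigned Exp-Golomb code word."""
    zeros = len(bits) - len(bits.lstrip("0"))
    return int(bits[zeros:2 * zeros + 1], 2) - 1


for component in (0, 1, -1, 2, -7):
    print(component, se(component))
```

A motion vector component of 0 costs a single bit, while a larger one such as -7 costs seven bits, and decoding always returns exactly the value that was encoded.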

For data which can tolerate a (slight) loss, such as the residual image that corresponds to the difference between the source and destination blocks during a prediction (see above), a lossy compression known as quantization is applied.
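
In its simplest form, quantization amounts to dividing each value by a step size and rounding. The sketch below applies it directly to a hypothetical 2x2 residual, whereas a real H.264 encoder first applies a transform to the residual block, so treat this purely as an illustration of where the loss comes from.

```python
def quantize(block, step):
    """Divide each value by the step and round to the nearest integer level."""
    return [[round(value / step) for value in row] for row in block]


def dequantize(levels, step):
    """Rebuild approximate values from the stored levels."""
    return [[level * step for level in row] for row in levels]


residual = [[-1, 3], [7, -5]]   # hypothetical residual values
step = 4                        # a larger step means more compression and more loss

levels = quantize(residual, step)
reconstructed = dequantize(levels, step)
print(levels, reconstructed)
```

The levels are smaller numbers than the original residual and therefore cheaper to encode, but the reconstructed values no longer match the source exactly: the error can reach half a step.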

The combination of this compressed data (lossless and lossy) makes up your H.264 video file.

Now that we’ve set down a general overview of H.264, we’re going to look in detail at a few of the points that came to the fore during our tests.


Page 4
H.264 (2/2)

B-frames
As we have seen, there are two possible types of prediction: inter prediction – the prediction of a frame from earlier frames – or intra prediction – prediction from the current frame. In practice, you’ll find different types of frame within a video stream:
- Some frames result only from intra prediction. Such a frame can be decoded independently of any other and is what’s known as a key frame. Key frames serve as reference points and are vital for features such as fast forward and rewind. In practice a key frame should be placed on average at least once every ten seconds. These frames, which result entirely from intra prediction, are known as I-frames.
- Other frames result from a combination of intra prediction and blocks which refer to earlier frames (inter prediction). These are known as P-frames. The frame presented on the previous page is a P-frame.
- H.264 also offers a third (optional) type of frame, the B-frame. Rather than referring only to earlier frames, a B-frame may also refer to a later frame! This does of course make encoding somewhat more complex, but being able to refer to both previous and forthcoming frames limits the number of intra predicted blocks (which take up more space) that have to be added. This therefore improves compression!
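
One practical consequence of B-frames is that frames can no longer be stored in the order in which they are displayed: a later frame that a B-frame refers to has to be decoded first. The sketch below assumes a simplified structure in which each B-frame refers to the nearest I-frame or P-frame on either side of it (H.264 allows more flexible arrangements).

```python
def coding_order(display_types):
    """Return frame indices in the order they must be encoded and decoded.

    `display_types` is a string such as "IBBPBBP" listing frame types in
    display order. B-frames are held back until the next I- or P-frame,
    which they may refer to, has been emitted.
    """
    order, pending_b = [], []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)
        else:
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)   # trailing B-frames with no later reference
    return order


print(coding_order("IBBPBBP"))
```

For the display order I B B P B B P, the P-frame at position 3 is stored before the two B-frames that precede it on screen.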

Here’s a B-frame which combines three types of blocks:


In blue, the inter predicted blocks (temporal prediction, referring to a previous frame), in orange/red, the intra predicted blocks (spatial prediction, within the current frame). In green, the blocks referring to later frames! Avatar, Fox Pathé Europa


The following diagram summarises the different frame possibilities:


Deblocking filter
One of the recurrent problems of formats that use block coding concerns the sharp edges that can appear between blocks. If the blocks are overly compressed using approximate similarities, the edges of the macroblocks can sometimes be made out. You have probably noticed this on overly compressed JPEG frames:


Examples of overly compressed blocks


H.264 has a deblocking filter that can be applied to each macroblock. The encoder defines whether and how this filter is to be applied by the decoder. This filter can require significant processing and may be put to one side by some encoders with a view to saving time, as we will see.

Adaptive quantization matrices
H.264 quantization (a lossy compression technique) can be adapted by applying different values to each block. These are known as independent quantization matrices, or simply adaptive quantization. Instead of applying a uniform compression across all the residual data (subtraction of the prediction from the macroblock), some blocks are given priority over others, which means more detail can be retained in complex parts of the frame.

Profiles
Some features such as those previously described, can be particularly resource hungry, whether in terms of encoding or decoding. As H.264 aims to be a universal format, its authors have defined multiple usage profiles which either allow or disallow the use of certain features of the format. A mobile phone, for example, can only support the most basic profile, while a Blu-ray will use the most advanced mode. We’re interested in the three modes used by the various encoders: Baseline, Main and High. Here’s a little summary of the advanced features they support:


In practice we’ll always go for the highest possible profile supported by the destination device. A games console like the PlayStation 3 supports all three profiles, whereas the iPad only officially supports the first two. Before starting the encoding process, it's important to bear in mind the capabilities of the device you'll be playing your file on: when it comes to decoding H.264 on a specific machine, the format isn’t as universal as you might think!
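
This selection rule is easy to express in code. The table below only contains the two devices mentioned above, with the support described in this article; any other device would need to be checked against its own documentation.

```python
# Profiles ordered from the most basic to the most advanced.
PROFILES = ["Baseline", "Main", "High"]

# Support as described in this article (illustrative table, not exhaustive).
DEVICE_PROFILES = {
    "PlayStation 3": {"Baseline", "Main", "High"},
    "iPad": {"Baseline", "Main"},
}


def best_profile(device):
    """Return the most advanced profile that the device can decode."""
    supported = DEVICE_PROFILES[device]
    return max(supported, key=PROFILES.index)


for device in DEVICE_PROFILES:
    print(device, "->", best_profile(device))
```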


Page 5
Measuring quality: PSNR, SSIM and the pitfalls


Measuring quality: PSNR, SSIM and their drawbacks
Here at Hardware.fr / Behardware.com, we try to design our tests to be as objective as possible. It's fairly easy to determine performance when the result can be easily measured (processing time for a given task for example). When it comes to video, the question of objective judgement of quality is unfortunately very… subjective. The only true criterion is the visual quality of the video as perceived by the human eye. This is something that unfortunately can’t be measured or quantified other than by human tests.

Over the years, several tools have been developed to try and compare the quality of one video to another. The basic concept remains the same: frames from the compressed video are compared one by one to the source and this gives us a series of values for each of the frames that make up the video.

The standard measure (or metric) used to compare two frames is PSNR (peak signal-to-noise ratio), which attempts to determine the level of corrupting noise in a compressed image in comparison to the source. Used above all to judge still image compression formats, PSNR is considered to be a purely statistical measure that depends largely on the compression format chosen or the particularities of the encoder. The other metric used is called SSIM and attempts to determine the structural similarity between frames, with the aim of being a bit more realistic than PSNR.
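
For reference, PSNR is derived from the mean squared error between the two frames. Here is a minimal sketch for 8-bit greyscale frames stored as lists of rows; the tiny 2x2 frames are hypothetical, and measurement tools typically work on the luminance plane of full-size frames.

```python
import math


def mean_squared_error(source, compressed):
    """Average of the squared pixel differences between two frames."""
    diffs = [(s - c) ** 2
             for row_s, row_c in zip(source, compressed)
             for s, c in zip(row_s, row_c)]
    return sum(diffs) / len(diffs)


def psnr(source, compressed, peak=255):
    """Peak signal-to-noise ratio in dB (infinite for identical frames)."""
    error = mean_squared_error(source, compressed)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)


source = [[100, 100], [100, 100]]
compressed = [[101, 99], [100, 100]]   # two pixels are off by one
print(round(psnr(source, compressed), 2), "dB")
```

Every pixel counts equally in this calculation, wherever it is in the frame. This is why PSNR says nothing about whether the errors fall in an area the viewer actually looks at.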

While in practice SSIM is a better indicator of visual quality, the problem remains relatively complex as human perception is difficult for an algorithm to measure. Our eyes are for example instinctively attracted to faces. This means that the human eye will prefer an image on which the face is sharp but which may be otherwise inaccurate (and which has a low PSNR), to an image with a more even quality across the areas but in which the face isn't as well defined (but ironically a higher PSNR!).

Added to this problem is the fact that, as with any metric for which the algorithm is known, it's very easy to optimise an encoder for one metric or another to the detriment of the overall video quality. The x264 encoder illustrates this issue quite well: as well as having options that allow you to optimise the encoder for a film or cartoon, you can also optimise it to get the highest possible PSNR and SSIM values! Here’s a little example of what this can give in a scene from the film Inception. We have calculated the average PSNR and SSIM values with four different x264 optimisations (none, Film, PSNR, SSIM). Note that the PSNR values are expressed in dB (the higher the value the better the signal to noise ratio), while the SSIM is a value between 0 and 1 which indicates correlation to the source image, with 1 showing perfect correlation.
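
The SSIM formula compares the means, variances and covariance of the two images. The sketch below applies it once over a whole (hypothetical, tiny) image, which is a simplification: actual SSIM implementations compute it over small local windows and average the results. The constants follow the usual convention of (0.01 x peak)² and (0.03 x peak)².

```python
def ssim_global(source, compressed, peak=255):
    """Single-window SSIM over two greyscale images given as lists of rows."""
    xs = [v for row in source for v in row]
    ys = [v for row in compressed for v in row]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    c1 = (0.01 * peak) ** 2   # stabilising constants
    c2 = (0.03 * peak) ** 2
    return (((2 * mean_x * mean_y + c1) * (2 * covariance + c2))
            / ((mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)))


sharp = [[10, 200], [200, 10]]        # strong contrast
washed_out = [[100, 110], [110, 100]]  # same pattern, contrast almost gone
print(ssim_global(sharp, sharp), ssim_global(sharp, washed_out))
```

Both images have the same average brightness, so the loss of contrast is what drags the second score down, something the structural terms of the formula are designed to pick up.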


Attempting to optimise for a metric doesn’t always work, depending on the scene. The scene chosen here is full of explosions and PSNR optimisation gives the lowest scores. The SSIM optimisation seems to be best, obtaining the best scores for both SSIM and PSNR. Is this borne out when we look at the frames? Let’s see what we get on a static frame taken from each video:



Frames compared: No Tune, Film, PSNR, SSIM. Inception, Warner Bros


Let’s start with the most obvious. On the PSNR frame, not much is right. Entire parts of the face are blurry and the shading on the right is blocky. Is the SSIM version better than the Film version? No. The eyelashes are sharper on the Film version and there's more detail in the face. So why does it have a lower score when the SSIM version is slightly more blurred? This is because of a parameter which makes PSNR and SSIM comparisons even more complex: psychovisual optimisations, which try to retain a maximum of detail in interesting areas. This optimisation can be made out in a static frame but becomes even clearer in motion: the video quite simply retains more detail, and what look like artefacts on a static frame (for the PSNR and SSIM metrics) actually give improved quality, though with lower metric scores.

For these reasons, while we will give the SSIM/PSNR scores for the various encoders, these scores cannot be considered to be of absolute value!


Page 6
Number of passes, dynamic GOP



In addition to the format specificities discussed above, encoders can also be configured with certain specific settings. Here are some explanations of the role of these options.

One or two passes?
When you want to compress a video, you're generally limited to a given file size that you don’t want to exceed, for example for storage reasons (fitting the video onto a DVD or a maximum of videos in a smartphone and so on).

The usual method is to specify the bitrate – an average amount of data per second of encoding – so as to guide the encoder. Transcoding software generally handles this issue back to front – you indicate the desired file size and the encoder calculates the bitrate. However the bitrate serves simply as a guide and limit not to be exceeded. The encoder will then have to decide for itself where it can economise and where it can go over its ‘budget’.
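
The calculation that the software performs on your behalf is straightforward. Here is a sketch, using decimal units (1 MB = 1,000,000 bytes) and a hypothetical target of 700 MB with a 128 kbit/s audio track:

```python
def video_bitrate_kbps(target_size_mb, duration_s, audio_kbps=0):
    """Video bitrate (kbit/s) that fills `target_size_mb` over `duration_s` seconds."""
    total_kbits = target_size_mb * 8 * 1000   # megabytes to kilobits
    return total_kbits / duration_s - audio_kbps


# A 9 minute 20 second clip (560 s) squeezed into 700 MB.
print(video_bitrate_kbps(700, 9 * 60 + 20, audio_kbps=128), "kbit/s")
```

The container itself adds a small overhead that this sketch ignores.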

The problem of this approach is that an encoder doesn’t know how your video is constituted. Are all the scenes equally complex? Can the end of the video be compressed more than the beginning? It’s difficult to say without viewing it first. This is why when you want to obtain a good level of quality, you use two passes. This means that the encoder carries out two passes over the video and can better manage its bitrate.
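
The principle can be sketched as follows: the first pass measures how complex each scene is, and the second pass shares out the bit budget accordingly. Real rate control is far more sophisticated than this proportional split, and the complexity figures below are hypothetical.

```python
def single_pass_budget(num_scenes, total_bits):
    """Without advance knowledge, the safest choice is an even split."""
    return [total_bits / num_scenes] * num_scenes


def two_pass_budget(complexities, total_bits):
    """Share the same total in proportion to the complexity measured in pass one."""
    total_complexity = sum(complexities)
    return [total_bits * c / total_complexity for c in complexities]


complexities = [4, 4, 1, 1]   # hypothetical first-pass measurements per scene
print(single_pass_budget(len(complexities), 1000))
print(two_pass_budget(complexities, 1000))
```

Both approaches spend exactly the same total, but the second gives the two demanding scenes four times the budget of the two easy ones.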

We have compressed two videos with identical quality settings, using first one and then two passes, to illustrate the problem. Every 100 frames (around 4 seconds), we calculated the average bitrate, as shown in the graph below.
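
For those who want to reproduce this kind of measurement, the windowed average can be sketched as below. It assumes you already have the size of each encoded frame in bytes (obtained from a stream analyser, for example); the frame sizes and the 25 frames per second rate used here are made up.

```python
def windowed_bitrate_kbps(frame_sizes_bytes, fps, window=100):
    """Average bitrate in kbit/s for each consecutive group of `window` frames."""
    bitrates = []
    for start in range(0, len(frame_sizes_bytes), window):
        chunk = frame_sizes_bytes[start:start + window]
        seconds = len(chunk) / fps
        bitrates.append(sum(chunk) * 8 / seconds / 1000)
    return bitrates


# Hypothetical stream: 100 large frames followed by 100 smaller ones.
frame_sizes = [5000] * 100 + [2500] * 100
rates = windowed_bitrate_kbps(frame_sizes, fps=25)
print(rates)
```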



As you can see in green, with a single pass the encoder is conservative and doesn’t go either too high or too low at any point. In red, with two passes, the encoder chooses to budget higher at the beginning of the video and lower at the end. Comparing the visual quality via the SSIM metric shows why: the last minutes of the video can be more highly compressed without compromising what is a very high level of similarity to the source, higher than on the rest of the video. By using a second pass, quality remains more constant throughout the video, which is exactly what we’re looking for: jumps in quality make viewing uncomfortable.

While the advantage of using two passes has been recognised for several years now, all the GPU encoders that we have tested here make do with a single pass. This is a shame as implementing a second pass would be a very simple way of improving quality significantly.

Dynamic GOP
As we have seen, encoders have three types of frame at their disposal (I-frame, B-frame, P-frame), which can be used as they see fit. A good encoder will place a key frame (I-frame) when, for example, there’s a change in scene. What about poor encoders? They use a fixed frame structure (known as GOP, Group of Pictures)! They simply place an I-frame every 29 frames. Here’s an illustration of what two encoders give in practice:

Encoder 1: IPPPPPPPPPPPPPPPPPPPPPPPPPPPPIPPPPPPPPP
Encoder 2: IPPPPPPPPBBBPBBBPBBBPBBBPBBBPBBBPPPPPPI


Using a static GOP is among the worst ideas and has disastrous consequences in terms of quality!
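
The difference between the two strategies can be sketched in a few lines. To keep things short, the sketch only chooses between I-frames and P-frames, and it assumes that a measure of how much each frame differs from the previous one is already available; the threshold and the difference values are hypothetical.

```python
def fixed_gop(num_frames, gop=29):
    """Static structure: an I-frame every `gop` frames, whatever the content."""
    return "".join("I" if i % gop == 0 else "P" for i in range(num_frames))


def dynamic_gop(frame_differences, threshold, max_gop=250):
    """Place an I-frame at the start, on scene changes and when a GOP gets too long.

    frame_differences[i] measures how much frame i differs from frame i - 1
    (the value at index 0 is ignored).
    """
    types = []
    frames_in_gop = 0
    for index, difference in enumerate(frame_differences):
        if index == 0 or difference > threshold or frames_in_gop >= max_gop:
            types.append("I")
            frames_in_gop = 1
        else:
            types.append("P")
            frames_in_gop += 1
    return "".join(types)


print(fixed_gop(31))
print(dynamic_gop([0, 1, 1, 9, 1, 1], threshold=5))   # scene change at frame 3
```

The static version knows nothing about the content, so a scene change only coincides with an I-frame by chance.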

If you look at the graph of one of our quality metrics, you can already guess what visual defects are going to appear:


The first problem with this 29-frame packet structure is the jumps in quality. Each I-frame restores similarity but quality drops with each P-frame that follows! Look at what happens at frame 48, however. Here there’s a scene change. The dynamic encoder inserts a key frame to mark the change and give a good level of quality. The static encoder, on the other hand, waits for its cycle to come round and tries to restore quality with each P-frame, gradually improving until things return to normal with the next I-frame!

The difference in sharpness is there for all to see:


Avatar, Fox Pathé Europa

Now that we have gone over the potential pitfalls, let’s move on to the tests.


Page 7
Test scenes, configuration

Test scenes
We used files from three Blu-rays:

Avatar has the advantage of combining filmed and computer generated scenes. As it was originally made for 3D, there isn’t too much fast movement, which makes things easier for encoders. The scene lasts 9 minutes and 20 seconds.


Avatar, Fox Pathé Europa


The scene from Inception is almost frenetic. The camera is static but the scene is full of explosions. This presents many interesting challenges from the point of view of the encoders. Another challenge here is that the Inception Blu-ray is encoded in VC-1.


Inception, Warner Bros


For K-On!! we encoded the first episode of the disc. Animation uses large areas of flat colour, which pose very different problems from those of standard films.


K-On!!, Kyoto Animation, Pony Canyon


We used these raw files, without recompression, for our 1080p encoding tests. For software other than x264, we tried to set quality to the maximum as far as possible (we’ll come back to this for each encoder as quality options often vary a great deal). A high bitrate was chosen for this compression: 10 Mb/s (4.5 GB/hour), which is a little higher than digital HD broadcasts (around 7 to 8 Mb/s). A high bitrate makes the encoder's task easier. This is still quite some way off a Blu-ray, as the bitrates of our test files were 24.2, 25.8 and 35.4 Mb/s respectively. This scenario corresponds to re-encoding for a console.
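
The conversion between a bitrate and the space it takes up is simple arithmetic. Here is a quick sketch using decimal units (1 GB = 1,000,000,000 bytes):

```python
def gigabytes_per_hour(bitrate_mbps):
    """Storage used by one hour of video at the given bitrate in Mb/s."""
    bits_per_hour = bitrate_mbps * 1_000_000 * 3600
    return bits_per_hour / 8 / 1_000_000_000


for bitrate in (10, 24.2, 25.8, 35.4):
    print(f"{bitrate} Mb/s -> {gigabytes_per_hour(bitrate):.2f} GB/hour")
```

At 10 Mb/s an hour of video takes 4.5 GB, compared with almost 16 GB for an hour at the 35.4 Mb/s of our most demanding source.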

We also wanted to test a second encoding scenario at the lower resolution of 720p. So as not to disadvantage any of the encoders, we reduced and recompressed our 1080p sources beforehand so as to obtain new 720p source files. We carried out this operation with x264 in its slowest mode at a bitrate of 25 Mb/s. Each piece of software may indeed use very different resizing methods, which would have prevented accurate comparison of frames from an identical source. By removing this resizing element (which in any case doesn’t take up much time in terms of the encoding process) and starting with a very high quality source, we reduced as far as possible any false differences that may exist.

Note that we deactivated any options which try to retouch colours to make the videos brighter (increase of saturation and so on), whether this be in the software or, where the source image is decoded by the GPU, in the GPU control panels. In addition to being useless, they would have disadvantaged the encoders in the SSIM and PSNR tests.

Test configuration
We used the following configuration:
  • Intel Core i7 2600K
  • MSI H67MA-E45 (1155)
  • 4 GB Crucial DDR3 1333 MHz
  • Windows 7 64-bit

In addition we used three AMD and three NVIDIA graphics cards (equivalent in price terms) to see if the most powerful models gave any advantage in terms of quality, or a reduction in encoding time:
  • Radeon HD 5750
  • Radeon HD 6850
  • Radeon HD 6970
  • GeForce GTS 450
  • GeForce GTX 460
  • GeForce GTX 570

For the Intel QuickSync solution, we tested the HD 3000 included in the 2600K.

A final word on the presentation of the results. So as to make the results legible, the graphs on the following pages, as well as the frame comparisons, require the use of an HTML5 compatible Internet browser. Most of you will already be using one.


Page 8
ArcSoft Media Converter, Avatar 720p (1/4)


Developed by ArcSoft (authors of the Total Media Theater Blu-ray playback software), Media Converter 7 (version 7.1.15.55, abbreviated as AMC in our graphs) transcodes sources in the form of either video files or DVDs. The software has profiles for consoles (Xbox, PS3, Wii, PSP), tablets (iPad), smartphones and other devices designed for TV playback (Apple TV, WD TV, etc).


We created two profiles manually for the 720p and 1080p resolutions, maximizing quality options. The H.264 High profile is not supported and ArcSoft seems to have custom encoders for the CUDA and Stream versions. Quality is almost constant whichever GeForce is used, with a very slight advantage for the GeForce GTX 460. In practice, this advantage makes no difference. Note also that the GeForce GTX 570 (too recent?) wasn't detected by our version of Media Converter. Quality was strictly identical across the various Radeons.

Let’s start with the results at 720p for Avatar. To recap, our 720p encodes were carried out with a requested video bitrate of 4 Mbit/s.

Avatar 720p


The first surprise was that using a Radeon drastically changes the encoding options. It uses the Baseline profile and there are no B-frames or CABAC. You do however get a virtually dynamic GOP here, inasmuch as it uses an I-frame on scene changes. The others are more rigid: one I-frame every 80 frames, with a repeated group of 1 P-frame followed by 2 B-frames in between. If you compare the scores of the ArcSoft encoders with each other, the CPU version is quite some way ahead for SSIM. The Intel MediaSDK seems to be optimised for PSNR (and therefore for benchmarks rather than visual quality). The CUDA version gives an extremely low score, which requires some explanation.


In terms of processing time, the pure CPU version is devilishly fast, while the Radeon version, in spite of the Baseline profile, takes twice as long. The Radeon HD 5750 is faster than the bigger Radeon models, for no obvious reason.

The average SSIM values (or PSNR) do not however mean a great deal. As with any relationship, it’s how the difficult times are managed that provides us with the parameters for judging encoder quality. To highlight these difficult situations more clearly, we isolated two series of 500 distinct frames, taken from our extract.



Note that for this graph and for those that follow, we had to go for a dynamic scale in accordance with the content. So as to help you to visualise the differences between the various graphs, an orange line shows an SSIM of 0.9.

Our first extract of 500 frames (scene 1) can be broken down into three parts. From around frame 0 to around 157, we have a relatively static scene. The main character is filmed in the pod and only the head moves. This is very easy to compress and all the encoders do a good job here. The scene ends with a fade to black (which gives us a score of 1 as all encoders know how to encode an entirely black image!). Up to frame 362 a new scene is introduced, more complex than the previous one as the camera moves in a slow tracking shot from left to right while advancing slowly. At frame 363, a new scene appears, again a relatively slow moving scene.

The Radeons use the least advanced encoding technique (baseline profile), but use dynamic GOP. As a result, while the quality of the encoding is systematically down on the others, dynamic GOP means the Radeon gives better quality than the HD 3000 when there’s a change in scene. The GeForces are however completely lost when there’s a change in scene, with a notable loss in quality. Much more seriously, when the image changes a little too much (and remember this is a slow-moving scene…), the Arcsoft CUDA encoder loses its footing between two I-frames (the peaks you can see every 80 frames) with a significant loss in quality. These jumps in quality are very visible. To show you what this gives in practice, we have isolated an identical frame from each encoded sequence, around half a second after the change in scene. The frames used for our comparison below are stored in PNG format so as to avoid any additional compression loss. They may take a few seconds to load depending on the speed of your connection:

Click here to display the frame comparison in a new tab.


While, as we said, the GTX 460 CUDA encoding has a very slight advantage when it comes to conservation of textures over the GTX 450 version, in practice neither is usable because of the excessive artefacts. The Arcsoft CUDA encoder is quite simply not up to the task. Our other encoders show differences in sharpness. The HD 3000 encoder lags behind from a visual point of view, while the AMD versions and CPU encoder are comparable, with the AMD encoder even seeming slightly better when you look closely at the texture of the skin. In practice these two frames are both a long way off the source frame unfortunately, with a significant loss of detail.
Our second scene includes several quite rapid movements and uses motion blur heavily, which facilitates the task of the encoders. What does this give in practice?

Click here to see the PSNR graph for this scene.


The Arcsoft encoders function well with motion blur. The CUDA version is too poor to speak of, but our three other encoders are neck and neck. The minute SSIM nuances are however visible. Look at the second frame that we have extracted from a motion blur sequence:

Click here to display the frame comparison in a new tab.


If we only take faces into account, the AMD encoder does pretty well and conserves a maximum of sharpness on the faces. There are however artefacts in the shaded areas between the dark and light areas on the faces on the CPU and HD 3000 encoder versions (left part of Jake’s forehead, on the right of the screen). Let’s see if this level of quality is confirmed in the slightly more complex scenes!


Page 9
ArcSoft Media Converter, Inception/K-On!! 720p (2/4)


ArcSoft Media Converter, Inception 720p
Now let’s move on to the film Inception, still at 720p, and the short extract we have chosen of just 40 seconds, which does however include some particularly interesting explosions.


When you look at the SSIMs obtained across the whole length of the scene, the Arcsoft CPU encoder seems to stand out a bit more from the Radeon/HD 3000 versions, which are at a similar level. Given the limited length of the extract, we won’t give encoding times here.

We extracted a scene of 500 frames. During the first 400, shots with explosions alternate with shots of characters (51, 192, 352) before the scene finishes with a much calmer sequence.

Click here to view the PSNR graph of this scene.


The advantage of the Arcsoft CUDA encoder is that it allows you to detect scene changes easily: at each trough in quality, there’s a scene change! The Radeon encoder is a good way down on the Intel encoder in the explosion scenes, while the Intel is behind in the static scenes. Let’s check out what this translates to visually, firstly in an explosion frame:

Click here to display the frame comparison in a new tab.


We're going to put discussion of the quality of the CUDA version to one side and concentrate on the Radeon. While in the slow-moving Avatar scenes, the AMD encoder held its own better, here it's a long way behind. This isn’t really a surprise as the baseline profile is a lot less efficient than the others in terms of compression! Although the CPU version of Arcsoft preserves more detail, the result is far from great.

What about the end of the sequence where the scene is more static?

Click here to display the frame comparison in a new tab.


None of the encoders maintain the grain in the background. The Intel encoder is the least precise on the face and the cheek and the contour of the eye are particularly blurred. This is difficult to forgive when you think that your eye is drawn to this part of the image first.

Now for our last 720p test sequence.


K-On!! 720p
We encoded an entire episode (24 minutes 11 seconds) of this animation, with a relatively static image. At 4 Mbit/sec, this shouldn’t be too much of a challenge for our encoders.
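For a sense of scale, the short sketch below works out the size of the video stream for an episode of this length at this bitrate. It assumes decimal units (1 Mbit = 1,000,000 bits), an encoder that hits the target average exactly, and counts the video track only, with no audio or container overhead; the function name is ours.

```python
def video_size_megabytes(bitrate_mbit_s, minutes, seconds):
    """Approximate video stream size in MB (decimal) for an average bitrate."""
    duration_s = minutes * 60 + seconds
    total_bits = bitrate_mbit_s * 1_000_000 * duration_s
    return total_bits / 8 / 1_000_000


# 24 minutes 11 seconds at 4 Mbit/s
print(video_size_megabytes(4, 24, 11))
```

Under these assumptions the episode comes to roughly 725 MB of video data.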


The first surprise is that the Arcsoft Intel encoder didn't work here (the software crashed). The second surprise was that the score with the Arcsoft CUDA encoder, while still behind the Radeons, seemed to indicate that the result wouldn’t be the usual soup!


In terms of processing time, the CPU version was once again very fast, even if the Arcsoft CUDA encoders were faster.

Click here to view the PSNR graph of this scene.


The GeForce encoder still struggles with scene changes but the result still seems excellent. What does this give in practice?

Click here to display the frame comparison in a new tab.


This isn’t a mistake, the GeForce version is actually the one that conserves most detail! With the CPU and Radeon versions of the encoding, you lose all the grain. As this frame is static for several seconds, these two encoders have deliberately chosen to blur the scene, which is to be regretted! This is a tendency that we have already noticed and which also shows, once again, the pitfalls of relying on SSIM and PSNR type readings!

Click here to display the frame comparison in a new tab.


In a scene with fewer textures and more solid colours, the differences are minimal, with just a few slight artefacts. Nothing too concerning.

Let’s now see if moving up to 1080p and a higher bitrate changes things for the Arcsoft application.


Page 10
ArcSoft Media Converter, Avatar 1080p (3/4)


ArcSoft Media Converter, Avatar 1080p
Here we use the same scenes as before, the source files being the Blu-ray video files.


The SSIM average for the CPU version of the Arcsoft encoder is excessively low. This is because this encoder uses a variable frame rate for 1080p encoding. While our source uses a constant frame rate of 23.976 frames/second, the Arcsoft encoder varies the frame rate depending on the complexity of the scene. While this is often a matter of a few milliseconds, here the differences are more significant (via MediaInfo):

Original frame rate: 23.976 fps
Minimum frame rate: 8.108 fps
Maximum frame rate: 29.132 fps

The result is that after around 1500 frames, the encoder applies a frame rate that is almost a quarter of the original frame rate on a scene it sees as being slow-moving and static. SSIM/PSNR comparisons become impossible!

The use of VFR is a valid technique as it's included in the H.264 spec. It is however generally reserved for very specific uses, such as streaming or surveillance cameras. In our eyes, it doesn’t make any sense to use it to encode films as the playback of VFR files can be problematic on some software/peripherals, even more so at 1080p where the file is destined for playback on a TV. The CUDA/Stream/MediaSDK encoders are not confronted with this problem and use a constant frame rate.
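The minimum and maximum frame rates reported above follow directly from the gaps between frame timestamps. The sketch below shows one way such figures could be derived from a list of presentation times; it is not a description of how MediaInfo works internally, and the function names and the 1% tolerance are assumptions.

```python
def frame_rate_stats(timestamps):
    """Derive frame rate figures from presentation times given in seconds."""
    durations = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not durations or min(durations) <= 0:
        raise ValueError("need at least two strictly increasing timestamps")
    return {
        "min_fps": 1 / max(durations),  # the longest gap gives the lowest rate
        "max_fps": 1 / min(durations),  # the shortest gap gives the highest rate
        "avg_fps": len(durations) / (timestamps[-1] - timestamps[0]),
    }


def is_variable(stats, tolerance=0.01):
    """Flag a stream as VFR if min and max differ by more than the tolerance."""
    return (stats["max_fps"] - stats["min_fps"]) > tolerance * stats["avg_fps"]


# Constant 23.976 fps source: each frame lasts 1001/24000 of a second.
cfr = [i * 1001 / 24000 for i in range(10)]
# Same stream with one frame held for 1/8.108 of a second.
vfr = cfr + [cfr[-1] + 1 / 8.108]

print(is_variable(frame_rate_stats(cfr)))
print(is_variable(frame_rate_stats(vfr)))
```

A single long-held frame is enough to pull the minimum down to 8.108 fps, which is why an SSIM comparison made frame index against frame index stops being meaningful.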


In terms of encoding time, MediaConverter seems to stumble when it comes to the audio track, which is encoded in parallel to the video track. This takes an abnormally long time.

Nevertheless let’s look at the changes in the SSIM over the first 500 frames of the scene:

Click here to view the PSNR graph of this scene.


During the first 500 frames, the frame rate doesn’t change for the Arcsoft CPU encoder. The Intel HD 3000 encoder is hot on its heels. What does this give visually?

Click here to display the frame comparison in a new tab.


In spite of its good average score, the MediaSDK version isn’t very good, with a lack of sharpness in the faces and skin textures in comparison to the CPU and Radeon versions.

Click here to view the PSNR graph of this scene.


In this second scene the CPU version is already 7 frames behind, falsifying any comparison and making the scores obtained on the CPU version useless. For the other encoders, the results with the MediaSDK and Stream versions seem very good, with a small advantage for the MediaSDK encoder. Let’s check on the frames:

Click here to display the frame comparison in a new tab.


Again, quite ironically, the Radeon version conserves more detail. You can see this clearly by looking at the section in the middle of the screen (completely blurred by the other encoders).


Page 11
ArcSoft Media Converter, Inception/K-On!! 1080p (4/4)


ArcSoft Media Converter, Inception 1080p

While the Arcsoft encoder still uses a variable frame rate, in practice it’s as if it didn’t (via MediaInfo):

Original frame rate: 23.976 fps
Minimum frame rate: 23.976 fps
Maximum frame rate: 23.976 fps

Let’s look at the results of the various encoders included in ArcSoft Media Converter:

Click here to view the PSNR graph of this scene.


The CPU version seems to do best whether in the explosion scenes or the calmer ones. Is this borne out in practice?

Click here to display the frame comparison in a new tab.


Even if the result is far from perfect, the CPU encoder does best in terms of conservation of detail. The Radeon encoder is literally nowhere, the baseline profile having exceeded itself!

Click here to display the frame comparison in a new tab.


On more static frames, you can once again see the loss of sharpness on the face with the Intel MediaSDK encoder. This is particularly visible on the forehead. There’s a loss in grain in the background texture and increased artefacts. The Radeon encoding is of comparatively better quality, quite close to the quality with the CPU encoder.


K-On !! 1080p

Once again the Arcsoft CPU encoder uses a variable frame rate (via MediaInfo):

Frame rate mode Variable
Frame rate: 23.976 fps
Minimum frame rate: 7.992 fps
Maximum frame rate: 47.952 fps

The final 11 frames from the original video are missing which makes the scores for this encoder invalid. Let’s nevertheless look at the results on the scene we've isolated:
Click here to view the PSNR graph of this scene.


The scores are relatively high, but as we saw in practice on the 720p version, this doesn’t mean very much. Will this result in a blurred image once again with our encoders?


In terms of encoding time, there’s an enormous difference from one card to another here, and this time the Radeon HD 6970 is fastest.

Click here to display the frame comparison in a new tab.


You can find better! Once again the Arcsoft CUDA encoder does well on this scene, in which it conserves the grain best. The Radeon Stream encoder renders an artistic blur while the CPU and MediaSDK versions are slightly better, without being perfect.

Click here to display the frame comparison in a new tab.


The reason for the weaker scores for the CUDA encoder becomes clearer when you look carefully at the lines. The line above our point of interest disappears slightly, which is what makes the difference. There are still some small artefacts on the solid colours, though we’d be happy with such a result in all of our tests!


Page 12
Cyberlink MediaEspresso, Avatar 720p (1/4)


From the developers of PowerDVD, Cyberlink MediaEspresso (in version 6.5.1229, abbreviated to CME in our graphs) is also part of the video transcoding brotherhood. This application has quite a friendly interface. At the top of the screen, you’ll find the access to the presets for smartphones, media players, games consoles and Internet sites.


For our tests we created two profiles for 720p and 1080p, maximising all available quality settings. The good news is that the Cyberlink application allows you to choose to turn GPU acceleration on for any of the decoding or encoding phases, or both (see our explanation on page 2).

Avatar 720p


The good news stops pretty quickly however when you look at how the software handles encoding. Firstly, it only supports the baseline profile and as we saw with the Arcsoft software, this isn’t likely to improve quality. The baseline profile is highly disadvantaged at an equal bitrate in action scenes. Next, the bitrates are surprising. The Cyberlink CUDA and MediaSDK encoders use a constant bitrate, namely exactly 4 Mb/second all the time. Such a strategy doesn’t favour quality (remember why using two passes improves quality!). Next, the ‘Quick’ mode, available for encoding with the HD 3000 (abbreviated by the letter Q in our graphs below), reduces the bitrate automatically. Lastly, the GOP is static and, in performance terms, the CPU encoder is once again ahead of the rest! Note, the Radeons crashed systematically when we tried to encode our test scene with GPU acceleration. We reproduced this problem with the AMD AVIVO encoder that is used here.


In terms of encoding time, we note that graphics card decoding accelerates CPU encoders but slows GPU encoders down! The encode versions are faster than the Full versions…

Let’s check all this in practice in our first scene in Avatar.

Click here to view the PSNR graph of this scene.


As you can see, the scale of our graph has been drastically reduced. This is because the Intel encoders fall off quite abnormally in the Full version. The reason for these troughs is fairly easy to understand:


It seems likely that it’s a result of a bug, especially as the artefacts don’t appear either in the purely decode version or the purely encode version with the HD3000! If you click on both these encodes in our graphs, the scale returns to normal. As we might have expected, if you only use the decode part of the GPU acceleration, all the encoders perform at an identical level overall, with however a very small variation between the GeForces and the rest. The three encoders (CPU, CUDA encode, MediaSDK encode) suffer from the lack of dynamic GOP when there’s a change in scene around frame 363. Let’s check this visually:

Click here to display the frame comparison in a new tab.


Everything is very blurred here, especially with the HD 3000 encoder when you look at the details of the skin, the vegetation or quite simply the faces. Ten frames after a change in scene, the lack of dynamic GOP and detection of new scenes creates unbearable jumps in quality. The CUDA version of the Cyberlink encoder doesn’t have the same soup-like aspect as the Arcsoft one, which is a bonus!

Now let’s move on to our second scene with all the motion blur.

Click here to view the PSNR graph of this scene.


There are no unusual troughs here and the results seem homogenous, even if the CUDA encoder is a small notch down on the others. Let’s check this visually:

Click here to display the frame comparison in a new tab.


The results are quite poor in terms of sharpness (and therefore conservation of detail) for all cases. Note the artefacts above the grille, particularly on the GeForce/HD 3000 versions. Now let’s see how Cyberlink does in the other scenes.


Page 13
Cyberlink MediaEspresso, Inception/K-On!! 720p (2/4)

Cyberlink MediaEspresso, Inception 720p
Now for the film Inception, still at 720p, from which we have selected a short extract of just 40 seconds that nevertheless includes some particularly interesting explosions.


It was a nice surprise to see that the AMD encoder didn’t crash here and we were able to judge how good it is. If we take the averages for the whole of the extract, the classification seems to be as follows: CPU encoder, HD 3000 encoder, Radeon encoder, GeForce encoder.

Let’s see how the encoders do with the multiple scene changes in our extract:

Click here to view the PSNR graph of this scene.


The 'Full' encode peaks for the HD 3000 are no surprise as the scene is affected by the same bug as in Avatar:


Otherwise, it’s the explosions in the scene (it alternates between shots of characters and shots of massive explosions) that are making an impact between frames 250 and 350. What effect does this have visually?

Click here to display the frame comparison in a new tab.


Whatever the version, the results aren’t good. The Radeon encoder gives a particularly blurred result and all the textures are lost. This is quite noticeable on the rattan table on the left. The NVIDIA encoder adds black marks to the red menu on the far left but conserves the textures on the rattan table better. Let’s see if these encoders can make up any ground in a more static scene:

Click here to display the frame comparison in a new tab.


The CUDA encoder does a bit better on this frame with better conservation of details on the face in comparison to the others. In any case the grain on the right is lost whatever the encoder used.


K-On!! 720p
We finish with our animation, which theoretically should be the easiest scene for our encoders to deal with.


The Radeon encoder crashed again here.


Again the GPU decode slows down the GPU encode.

Click here to view the PSNR graph of this scene.


Let’s look at the problems on a case by case basis. First of all, the HD 3000 encoder continues its love affair with black squares:


Next, a variable frame rate is used in the CUDA encoding, which puts frame comparison out of synch and renders it useless (via MediaInfo):

Original frame rate: 23.976 fps
Minimum frame rate: 0.237 fps
Maximum frame rate: 23.981 fps

The Radeon encoder doesn’t work, but the decoder doesn’t really work either as it produces this type of artefact (ironically more pronounced than with the 6970!):


This is already pretty damning, but there’s more. Whatever the encoder used, the violent deterioration shows up on the fades to black:






Nevertheless, let’s take a look at the visuals…

Click here to display the frame comparison in a new tab


You can see the Radeon HD 6970 artefacts in this scene. Enough said.

Click here to display the frame comparison in a new tab.


Just look at the results obtained following the Radeon ‘decodes’. MediaEspresso does not do well with animation.


Page 14
Cyberlink MediaEspresso, Avatar 1080p (3/4)


Cyberlink MediaEspresso, Avatar 1080p
Will the Cyberlink software do any better at 1080p? To recap, we use a bitrate of 10 Mbit/s on the following extracts!

As at 720p, the GeForce and MediaSDK encoders have a constant bitrate. The Radeon encoder worked and the SSIM scores lead us to think that the frames are out of synch.

To be precise, the discrepancies are +3 frames in CPU mode, decode and HD 3000 encode and +1 frame in Radeon encode/full mode.


Note for once, GPU decoding doesn’t slow down the encoding. We are giving you the links to our graphs below, in the interests of thoroughness, but the fact that the frames are out of synch makes the results impossible to compare:

Click here to see the SSIM graph for this scene.

Click here to see the PSNR graph for this scene.


We compensated for the discrepancies manually to produce the visual comparison of the frames. Let’s see what this gives us!
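A compensation of this kind can also be estimated automatically. The sketch below is one possible approach, not the method used for this article: it slides the encoded sequence against the source and keeps the offset with the lowest mean squared error. Frames are represented as flat lists of luma samples, and all names and the search window of 30 frames are assumptions.

```python
def frame_mse(a, b):
    """Mean squared error between two equally sized frames (flat sample lists)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def best_offset(source, encoded, max_offset=30):
    """Return k such that encoded[i] best matches source[i + k].

    A positive k means frames are missing at the start of the encode,
    a negative k means the encode has extra leading frames.
    """
    best = None
    for k in range(-max_offset, max_offset + 1):
        pairs = [(source[i + k], encoded[i])
                 for i in range(len(encoded))
                 if 0 <= i + k < len(source)]
        if not pairs:
            continue
        score = sum(frame_mse(s, e) for s, e in pairs) / len(pairs)
        if best is None or score < best[0]:
            best = (score, k)
    if best is None:
        raise ValueError("sequences do not overlap within max_offset")
    return best[1]


# Toy data: 50 tiny 'frames' with distinct values, encode missing its first 3.
source = [[(i * 7) % 256] * 4 for i in range(50)]
encoded = source[3:]
print(best_offset(source, encoded))
```

On real footage, long static shots make neighbouring offsets score almost identically, so a check by eye of the kind described above is still worthwhile.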

Click here to display the frame comparison in a new tab.


The results are rather blurred all in all. The CPU encoder version gives the best sharpness.

If you wish to see the graphs of our motion blur scene in spite of the frame discrepancies, click here for the SSIM readings and here for the PSNR. Looking at the visuals:

Click here to display the frame comparison in a new tab.


The HD 3000 version struggles, but all the versions give a measured result. The pure CPU encoding still gives the best result.


Page 15
Cyberlink MediaEspresso, Inception/K-On!! 1080p (4/4)


Cyberlink MediaEspresso, Inception 1080p
The Blu-ray of Inception was encoded with VC-1. MediaEspresso won’t open .m2ts files using this video codec.


K-On !! 1080p

Can the crushing domination of the GeForces in the encode for K-On !! be verified in practice?


GPU decoding slows the GPU encoding down once again…

Click here to view the PSNR graph of this scene.


Once again, variable frame rates spoil any objective comparison. To illustrate what impact the fact that the frames are out of synch has, we haven't, for once, compensated for the differences so as to show you what our frame comparison software shows:

Click here to display the frame comparison in a new tab.

Here the differences due to frame discrepancy are minimal, with just a few flowers fluttering about differently.

The GeForce and CPU encoders are neck and neck in terms of conservation of details, with the Radeon version blurred and the black squares on the MediaSDK version being replaced by white squares!

Click here to display the frame comparison in a new tab.


Here the results are almost identical. It matters little however, as the problems with the fades to black that we noted at 720p are still there and the encodes are still unusable.


Summary
In only offering baseline profile H.264 encoding, MediaEspresso was already at a disadvantage in comparison to the competition software. The fact that it uses constant bitrates and variable frame rates doesn’t help matters either, as the opposite (constant frame rate and variable bitrate!) is advisable when doing anything other than streaming. The absence of dynamic GOP tops things off by introducing constant jumps in quality whenever there are scene changes, as we showed previously:


Cyberlink MediaEspresso CPU mode, Avatar, Fox Pathé Europa


Cyberlink could however easily correct most of these problems by configuring its encoders differently. Perhaps in a future version!


Page 16
MediaCoder, Avatar 720p (1/4)


On offer from an independent developer, MediaCoder is a sort of ultimate video transcoding toolkit. Its author includes a multitude of encoding software in one package, including the official NVIDIA (CUDA) and Intel (MediaSDK) encoders. In practice MediaCoder comes with OpenCandy (adware) and, each time you start it up, the software opens your browser on a web page filled with ads (by launching the software minimised to the system tray!). On top of all this MediaCoder is listed in the ffmpeg hall of shame for violations of the GPL license.


Not really the sort of software you’d recommend on paper except that it does seem to have a couple of advantages: multithreaded CPU decoding and the ability to configure the NVIDIA/Intel encoders quite precisely.

Avatar 720p


In theory, MediaCoder allows you to turn on dynamic GOP for the GeForce encoder, as well as giving you the choice between the three main H.264 profiles (baseline, main, high). In practice, CABAC and B-frames are supported but we saw no dynamic GOP in the resulting encodes. The SSIM/PSNR scores are very low and for once, variable frame rate can’t be blamed.

The version of MediaCoder that we used also tends to take out the first few frames at the beginning of the scene. In the case of Avatar, six frames have been taken out at the beginning on the NVIDIA encodes and four on the Intel encodes.


When it comes to encoding time, MediaCoder is unbeatable thanks to the use of a multithreaded CPU decoder in parallel with GPU encoding.

Click here to view the PSNR graph of this scene.


As the frames are out of synch, let’s move on to the visual comparisons:

Click here to display the frame comparison in a new tab.


The results aren’t necessarily that bad when you compare the higher encoding settings. Most of the textures disappear in both cases but the Intel encoder remains more blurred on faces than the rest of the encoders tested. As this tendency has now come up in three different applications, it looks as if it must be a fault of the Intel MediaSDK encoder.

Click here to view the PSNR graph of this scene.


The second graph isn’t of much interest because of the frame discrepancies, so we’ll move on to the images resulting from motion blur:

Click here to display the frame comparison in a new tab.


In this scene the Intel encoder retains a small advantage with respect to the character on the left.


Page 17
MediaCoder, Inception/K-On!! 720p (2/4)


MediaCoder, Inception 720p

Once again there’s a discrepancy of 6 frames on the GeForce version and 4 frames on the HD 3000 version.

Click here to view the PSNR graph of this scene.


We’ll leave the graphs to one side and compare the images:

Click here to display the frame comparison in a new tab.


By comparing the baseline and high GeForce modes, you quickly see how Cyberlink would benefit from better configuration of its encoders! The Intel encoder is a bit more blurred but does better on complex textures such as the table on the left. The GeForce version tends to retain more detail but creates artefacts.

Click here to display the frame comparison in a new tab.


In our static scene, the difference in sharpness between the baseline and high modes is marked in both cases, proving once again that it is best to use the highest possible profile. The GeForce encoder is still a little sharper.


K-On!! 720p

The frame discrepancies are smaller here, just 1 or 2 frames, which is the reason for the higher scores.


MediaCoder is still the fastest of the encoders in our panel.

Click here to view the PSNR graph of this scene.


It’s very easy to see each change in scene or each movement by looking at the peaks and troughs! Let's move on to the much more useful image comparisons:

Click here to display the frame comparison in a new tab.


The CUDA encoder does well with this scene, whatever the software used, and the result is sharp here as of baseline mode. High gives very good results indeed. The HD 3000 version is a notch down here.

Click here to display the frame comparison in a new tab.


The differences are minimal here and slight artefacts continue to appear in solid colours. Note that the problems in the fades to black that we observed with MediaEspresso persist here… but only in the baseline profiles:






Page 18
MediaCoder, Avatar 1080p (3/4)


MediaCoder, Avatar 1080p
The 1080p encoding was carried out at a bitrate of 10 Mbit/s.

The good news is that, while using an MKV file as the source caused frame discrepancy at 720p, using an m2ts file as the source means the frames are not out of synch! Phew!


In confirmation of what we were saying earlier, using a CPU decoder in parallel with a GPU encoder makes MediaCoder extremely fast.

Click here to view the PSNR graph of this scene.


Intriguingly the quality of the Intel encoder on the tracking shot is slightly better in base mode than high. However the results remain very consistent throughout the extract. Let’s check out the quality in practice after a change in scene.

Click here to display the frame comparison in a new tab.


Without dynamic GOP, the loss in sharpness is very significant on the Intel encoder, a little less on the CUDA encoder which gives a very decent result in high mode.

Click here to view the PSNR graph of this scene.


In our second scene the quality levels seem to be very comparable once again, with an advantage going to the Intel encoder. What about the visual image?

Click here to display the frame comparison in a new tab.


Intriguingly, a green square is at first very visible on the NVIDIA encoder results. Showing up most on the character on the left, squares also appear on the shaded areas of the middle/upper part of the pod. The Intel version is more blurred but doesn’t suffer from these issues.


Page 19
MediaCoder, Inception/K-On!! 1080p (4/4)


MediaCoder, Inception 1080p

MediaCoder redefines the notion of frame discrepancy here as 24 frames are missing from the beginning of the scene! A full second, no less…

Click here to view the PSNR graph of this scene.


This one-second delay makes the graphs useless. Let’s move on to the visual comparisons:

Click here to display the frame comparison in a new tab.


The advantage of the Intel encoder in terms of conservation of textures is particularly visible on the rattan table. On the other hand, the NVIDIA version creates colour nuances which don’t exist in the source image and which translate into visual artefacts when the frames succeed one after the next.

Click here to display the frame comparison in a new tab.


If we forget the grain in the scene on the right, the NVIDIA encoder is once again sharper, when it comes to the face, than the Intel encoder.


K-On !! 1080p
So then, frame discrepancy or not? Listen to the drum roll!

The answer is… no. This time our frames are correctly aligned.


GPU decoding is again slower here! Let’s look at the results:

Click here to view the PSNR graph of this scene.


Quality is pretty homogenous throughout the scene, with just the baseline modes struggling a bit during the fade in/fade out.

Click here to display the frame comparison in a new tab.


From baseline mode upwards, none of the encoders has any problem displaying a static image for several seconds.

Click here to display the frame comparison in a new tab.


There are very few problems on this second image.


Summary
Overall, the results obtained with MediaCoder were good and being able to use the high H.264 profile gives a boost to quality, particularly in the CUDA version of the encoder which was held back by poor implementation in the Arcsoft application and by the baseline profile in Cyberlink. In spite of this, while the results are good, we do need to put them into context: the results are good for GPU encoding. However, missing frames, too much advertising and the unfriendly interface mean this doesn’t come without a cost, even if the software is itself free.


Page 20
StaxRip/x264, Avatar 720p (1/4)


As we mentioned earlier, we wanted to compare GPU-accelerated H.264 encoding with what exists on the CPU side. While a great deal of commercial software exists, there’s also x264, an open source implementation of H.264 started by some of the VLC authors, just as XviD was in its time for MPEG-4 Part 2. x264 is generally considered to be the best in terms of quality, to the detriment of encoding speed of course.


In itself x264 handles only the encoding of raw frames to the H.264 format. In order to carry out transcoding, you need to use a frontend which includes the various other necessary bits and pieces (decoders, audio encoders and so on). We used StaxRip, which is once again an open source application. While it is a little more complex to get used to than Arcsoft or Cyberlink, its interface isn’t too bad. It has the advantage of offering plenty of control in terms of x264 configuration, which is why we opted for it over other tools such as Handbrake which we recommend if you find StaxRip too hard to get a handle on.

We used version 1.1.7.0 of StaxRip, to which we added the latest x264 build available when we began our tests, r1913.
In contrast to the other applications, x264 literally allows you to configure all the encoding parameters. This is very practical, though it can sometimes seem a bit intimidating:

cabac=1 / ref=16 / deblock=1:-1:-1 / analyse=0x3:0x133 / me=umh / subme=10 / psy=1 / psy_rd=1.00:0.15 / mixed_ref=1 / me_range=24 / chroma_me=1 / trellis=2 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=1 / chroma_qp_offset=-3 / threads=12 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / constrained_intra=0 / bframes=8 / b_pyramid=2 / b_adapt=2 / b_bias=0 / direct=3 / weightb=1 / open_gop=0 / weightp=2 / keyint=250 / keyint_min=23 / scenecut=40 / intra_refresh=0 / rc_lookahead=60 / rc=2pass / mbtree=1 / bitrate=4029 / ratetol=1.0 / qcomp=0.60 / qpmin=0 / qpmax=69 / qpstep=4 / cplxblur=20.0 / qblur=0.5 / ip_ratio=1.40 / aq=1:1.00
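To make a settings string like the one above easier to read, it can be split into key/value pairs. The following is a minimal sketch in Python; the function name `parse_x264_settings` and the shortened excerpt are our own illustration, not part of x264 or StaxRip.

```python
def parse_x264_settings(settings: str) -> dict:
    """Split an x264 'key=value / key=value' settings string into a dict.

    Values are kept as strings because some of them are not plain numbers
    (for example 'deblock=1:-1:-1' or 'rc=2pass').
    """
    options = {}
    for field in settings.split(" / "):
        key, _, value = field.strip().partition("=")
        if key:
            options[key] = value
    return options


# Shortened excerpt of the settings string quoted in the article.
excerpt = "cabac=1 / ref=16 / me=umh / subme=10 / bframes=8 / rc=2pass / bitrate=4029"
options = parse_x264_settings(excerpt)

for name in ("ref", "bframes", "rc", "bitrate"):
    print(f"{name:8s} {options[name]}")
```

Read this way, the ‘veryslow’ encode above uses 16 reference frames, up to 8 B-frames and a two-pass rate control targeting 4029 kbit/s.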

In order to simplify the task a bit, the authors have also added a number of presets, which offer successively higher quality levels. For example, in the above encode, we’ve simply gone for ‘veryslow’ mode. There are 10 different modes, going from ‘ultrafast’ to ‘placebo’, an extremely slow mode whose gains over ‘veryslow’ are barely visible.

We tested nine of the modes (all except ‘placebo’) both with two passes (the standard and recommended mode) and with one pass, to provide a comparison with our other encoders. It’s time to show you our findings!
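For readers who want to reproduce a one-pass or two-pass encode outside a frontend, here is a rough sketch of the x264 command lines involved. StaxRip generated the actual commands in our tests, so what follows is an assumption about an equivalent manual invocation: the helper `x264_commands`, the file names and the stats file name are hypothetical, while `--preset`, `--tune`, `--bitrate`, `--pass`, `--stats` and `-o` are standard x264 command-line options.

```python
import os

# The ten x264 presets, from fastest to slowest.
PRESETS = ["ultrafast", "superfast", "veryfast", "faster", "fast",
           "medium", "slow", "slower", "veryslow", "placebo"]


def x264_commands(source, output, preset, bitrate_kbps, passes=2, tune=None):
    """Return the x264 command line(s) for a 1-pass or 2-pass bitrate encode."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    if passes not in (1, 2):
        raise ValueError("passes must be 1 or 2")

    base = ["x264", "--preset", preset, "--bitrate", str(bitrate_kbps)]
    if tune:
        base += ["--tune", tune]

    if passes == 1:
        return [base + ["-o", output, source]]

    stats = "x264_2pass.stats"  # hypothetical stats file name
    # Pass 1 only gathers statistics, so its video output is discarded.
    first = base + ["--pass", "1", "--stats", stats, "-o", os.devnull, source]
    # Pass 2 reuses those statistics to distribute the bitrate.
    second = base + ["--pass", "2", "--stats", stats, "-o", output, source]
    return [first, second]


# Hypothetical file names; 4029 kbit/s is the bitrate from the settings above.
for command in x264_commands("input.y4m", "avatar_720p.264", "veryfast", 4029, tune="film"):
    print(" ".join(command))
```

The two commands of a two-pass encode differ only in the pass number and in where the output goes, which is why adding a second pass is so simple from the user's point of view.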

Avatar 720p


With the film tune, ‘ultrafast’ mode implies a baseline H.264 profile.


As you can see with the processing times, the addition of a second pass doesn’t double encoding time. By definition, the first pass is rapid. We’ve pulled out the processing times for each pass for information:


The ‘faster’ 2p mode ran in real time on our Core i7.

Click here to view the PSNR graph of this scene.


As you can see, the ‘ultrafast’ preset stands out from the rest. This is hardly surprising as it’s the only preset to use the baseline profile: here x264 is configured to give the fastest possible result, to compete with the GPU encoders. In practice this mode should be avoided as the results are awful to say the least, as we’ll see. Note that all the other modes use the High profile (with CABAC, B-frames, dynamic GOP and so on).

If you compare one preset in single and double pass (2p) mode, you can straightaway see which is best in quality terms. How does all this translate visually?

Click here to display the frame comparison in a new tab.


While ‘ultrafast’ mode is truly bad for your eyes, quality rises quickly. In ‘faster’ mode, the quality is already higher than the best of what we’ve seen up to now. What is much more interesting, however, is that two passes bring a significant improvement in quality, with ‘veryfast’ 2p already significantly better.

Now let’s see what we get with motion blur:

Click here to view the PSNR graph of this scene.


Note that x264 encoding isn’t optimised for SSIM measurements, but rather for visual quality according to psychovisual optimisations. In spite of everything, as of 'veryfast' (1 pass), quality is up on everything we’ve seen up until now. What about the visual image?

Click here to display the frame comparison in a new tab.


As of ‘superfast’ mode, the visual quality is higher, especially when you look at the grille in the middle. Once again, the 2p modes give significantly more sharpness.
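The graphs on this page plot PSNR for each scene. As a reminder of what that figure measures, here is a minimal sketch of the standard PSNR formula for 8-bit samples. The function name and the four sample values are purely illustrative, and this is not the tool chain used to produce our graphs.

```python
import math


def psnr(reference, encoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists.

    'peak' is the maximum sample value (255 for 8-bit video).
    """
    if len(reference) != len(encoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the source and the encoded samples.
    mse = sum((r - e) ** 2 for r, e in zip(reference, encoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)


# Illustrative luma samples only, not data from our tests.
source_pixels = [52, 55, 61, 66]
encoded_pixels = [54, 55, 60, 63]
print(f"PSNR: {psnr(source_pixels, encoded_pixels):.2f} dB")
```

A higher value means the encoded frame is numerically closer to the source, which, as noted above, does not always match what the eye prefers.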


Page 21
StaxRip/x264, Inception/K-On!! 720p (2/4)


StaxRip/x264, Inception 720p

Let’s move on to our choice morsel, our explosion scenes!

Click here to view the PSNR graph of this scene.


Once again, it is particularly interesting to compare the scenes showing characters’ faces with the explosion scenes. In single pass mode, you can see very clearly that quality dips a little in the difficult passages. In double pass mode, however, the encoder recognises the difficulty of these scenes and budgets a higher bitrate for them. Visually, the difference is notable:

Click here to display the frame comparison in a new tab.


Look how easily a mode such as ‘veryfast’ 2p outdoes ‘veryslow’ 1p, which is nevertheless 3.6x slower. As of ‘veryfast’ mode, double pass quality is almost perfect!

Click here to display the frame comparison in a new tab.


The results in single pass mode aren’t that great. Note that even ‘veryslow’ 2p doesn’t conserve the original grain of our scene on the right. The bitrate simply doesn’t allow this.


K-On!! 720p

For K-On!!, we used the ‘animation’ tune in x264.


The ‘faster’ 2p mode is encoded in real time on our processor.

Click here to view the PSNR graph of this scene.


The scores are very high here, even in ‘ultrafast’ mode. The advantage given by 2p mode isn’t as clear here, at least not in this sequence. And how does it look?

Click here to display the frame comparison in a new tab.


As of ‘superfast’ the quality is already very close to our source image! ‘Ultrafast’ is sharper than most GPU baseline encodes.

Click here to display the frame comparison in a new tab.


In ‘ultrafast’ mode an artefact appears in the solid blue area isolated on screen. The other modes are perfect visually speaking.


Page 22
StaxRip/x264, Avatar 1080p (3/4)


StaxRip/x264, Avatar 1080p
Moving on to 1080p and a bitrate of 10 Mbit/s…


If you've got sharp eyes you'll have noticed that the ‘ultrafast’ and ‘superfast’ modes are faster here at 1080p than at 720p! Our 720p source file has an absurdly high bitrate (for the quantity of pixels) and the encoding was slowed down… by the decoding. The StaxRip decoder isn’t multithreaded.

Click here to view the PSNR graph of this scene.


Once again the superiority of the 2p modes is obvious on the graph, but what about the image?

Click here to display the frame comparison in a new tab.


You have to look closely to see the amount of detail and the grain in the skin. The superiority of the double pass modes, including ‘veryfast’ 2p, is there for all to see.

Click here to view the PSNR graph of this scene.


In the motion blur scene the results from one preset to another are very close. And how does it look?

Click here to display the frame comparison in a new tab.


As of ‘superfast’ it does particularly well with very good conservation of detail.


Page 23
StaxRip/x264, Inception/K-On!! 1080p (4/4)


StaxRip/x264, Inception 1080p

Now we move on to our most complex scene.

Click here to view the PSNR graph of this scene.


Once again the difference between single pass and double pass in terms of how uniform the quality is across the scene is very clear. Is this as obvious visually?

Click here to display the frame comparison in a new tab.


The ‘veryfast’ 2p mode already offers excellent quality, a good deal above any of the modes with one pass.

Click here to display the frame comparison in a new tab.


Once again the addition of the second pass is the most important element in maximising the quality of this scene.


K-On!! 1080p


Encoding times are particularly long here; ‘faster’ 1p ran in real time.

Click here to view the PSNR graph of this scene.


The scores are once again very high across all modes. Is ‘superfast’ 1p enough here?

Click here to display the frame comparison in a new tab.


As of ‘superfast’ 1p the quality is excellent.

Click here to display the frame comparison in a new tab.


All modes do a good job here.


Summary
The StaxRip/x264 combination isn’t perfect. As the single-threaded decoder held back ‘ultrafast’ and ‘superfast’ at 720p, the addition of a multithreaded decoder would probably improve the speeds of the fastest encode modes. Apart from the competition element, however, this probably wouldn't make all that much difference in the end. If you’re looking for quality, x264 does the job, and if you’re willing to add the time required for a second pass, it outdoes all the competition, even when you compare a fast double pass mode such as ‘veryfast’ with one of the slower single pass modes.


Page 24
720p Performance Recap


If you want to compare our encoders one against another, we have grouped their results in the following graphs.


Avatar 720p
Click here to view the PSNR graph of this scene.

Click here to display the frame comparison in a new tab.


Click here to view the PSNR graph of this scene.

Click here to display the frame comparison in a new tab.



Inception 720p
Click here to view the PSNR graph of this scene.


Click here to display the frame comparison in a new tab.


Click here to display the frame comparison in a new tab.


K-On!! 720p
Click here to view the PSNR graph of this scene.


Click here to display the frame comparison in a new tab.


Click here to display the frame comparison in a new tab.


Page 25
1080p Performance Recap

If you want to compare our encoders one against another, we have grouped their results in the following graphs.


Avatar 1080p
Click here to view the PSNR graph of this scene.

Click here to display the frame comparison in a new tab.


Click here to view the PSNR graph of this scene.

Click here to display the frame comparison in a new tab.


Inception 1080p
Click here to view the PSNR graph of this scene.


Click here to display the frame comparison in a new tab.


Click here to display the frame comparison in a new tab.


K-On!! 1080p
Click here to view the PSNR graph of this scene.


Click here to display the frame comparison in a new tab.


Click here to display the frame comparison in a new tab.


Page 26
Energy consumption/Time Recap


Here are the transcoding times for each of the encoders tested in this report:


To recap, the Arcsoft encoders are particularly slow in this scene, probably because of the transcoding of the audio track, which seems to be a problem in this application. In terms of pure speed, the HD 3000 encoder is well placed, as are the NVIDIA encoders via MediaCoder. x264 can be fast, but the quality of its ‘ultrafast’ mode is unusable for our taste.



We calculated the energy actually used by each encode by multiplying the encode time by the power draw. The results are quite interesting and bring a few things to mind:
- If you look at power draw alone, you'll see that the NVIDIA graphics cards draw the most when running. This is no surprise as these cards go into 3D mode during encoding (to what end?).
- The HD 3000 solutions are by far the most efficient and take most of the top spots. x264 ‘superfast’ is also pretty well placed. To our mind, ‘veryfast’ 2p is the best compromise between quality, encoding time and energy consumption.
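The calculation behind this ranking is simple enough to sketch. In the Python snippet below, the encoder names, times and wattages are hypothetical placeholders chosen for illustration, not our measurements; only the formula (encode time multiplied by average power draw) comes from the method described above.

```python
def energy_wh(encode_seconds, average_watts):
    """Energy used by one encode, in watt-hours (power x time)."""
    return average_watts * encode_seconds / 3600


# Hypothetical figures for illustration only: (encode time in s, average draw in W).
runs = {
    "fast encoder, high draw": (300, 180.0),
    "slow encoder, low draw": (600, 75.0),
}

# Rank the runs from least to most energy used.
ranked = sorted(runs, key=lambda name: energy_wh(*runs[name]))
for name in ranked:
    seconds, watts = runs[name]
    print(f"{name}: {energy_wh(seconds, watts):.1f} Wh")
```

With these made-up numbers the slower run still uses less energy overall, which is the kind of trade-off the ranking above is meant to expose: a short encode at a high power draw is not automatically the most economical.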


Page 27
Conclusion


It’s difficult to provide a succinct conclusion to so many tests, but one point does stand out: GPU acceleration of H.264 transcoding isn’t on a par with encoding carried out by CPUs.


What these solutions bring most of all is frustration. Whether with the NVIDIA, AMD or Intel solutions, speed has been prioritised to the detriment of quality. It's rather surprising to see that with software such as MediaConverter from Arcsoft or MediaEspresso from Cyberlink, CPU encoders systematically give a better result in terms of quality than the integrated GPU encoders!

It's also extremely annoying to note that in the case of GeForce and Radeon encoding, there’s no difference in speed between graphics cards costing 100, 170 or 330 euros. Quality is strictly identical from one card to another – except in the case of bugs – and encoding times are no different from one card to another either. In no case was the GPGPU power of our graphics cards fully used.

The use of graphics cards in such tasks still requires the help of a CPU. Even when GPU decoding and encoding are both used, with Cyberlink for example, CPU core occupation is 100% for one core with the Radeons or the Intel HD 3000, 100% for two cores with the GeForces and as many as four cores at 100% occupation with Arcsoft.

Another important particularity is that the decoding carried out by a GPU often puts a brake on the performance of GPU encoders. Once again, this is logical: H.264 decoding isn’t carried out by the GPU’s processing units but by a dedicated ASIC which serves to decode videos such as Blu-rays. This decoding doesn’t need to be done extremely fast as the playback of such media is in real time. In spite of its faults, MediaCoder proves that to get the most out of GPU encoders in terms of speed (without, for all that, creating any variation between our differently priced cards…), you have to use a multithreaded CPU decoder.


It's difficult to recommend any one of the three solutions using GPUs for transcoding over the others. The Arcsoft application offers the encoder which, apart from x264, gets the best scores and is also extremely fast. Visually, results with the Arcsoft encoders are blurred but they may be sufficient for mobile peripherals if you’re not too fussy. These results were however obtained solely with the Media Converter CPU encoder, its CUDA version literally being a nightmare and its Radeon version, even though producing higher quality, being limited (like the rest) to a baseline profile which can’t compete with the highest H.264 profiles. Moving on to the Intel/MediaSDK version, although it obtained high SSIM and PSNR scores, it doesn’t measure up to the naked eye and important parts of the image (faces and so on) are very blurred.

The Cyberlink application rules itself out in only offering baseline type H.264 encoding, which isn’t up to handling action scenes. The numerous implementation bugs in the Cyberlink and MediaSDK software don’t help either (white and black squares make the encoded files unusable) and the fact that the encoders are unable to insert an I-frame when a new scene starts creates disagreeable flickering. The quality of the NVIDIA and AMD renderings is okay, which does at least represent some progress over the Arcsoft application for the GeForces. In practice however, the use of baseline H.264 profiles remains a handicap that it is impossible to compensate for.

MediaCoder is the fastest, the most configurable and the most efficient of the GPU encoding applications. If you can stand the advertising however, the fact that the files it produces lack frames is nonetheless a bit of an issue. At least it’s free. From a qualitative point of view, the NVIDIA encoder has the advantage here over the Intel, which tends to blur frames a little too much.

The StaxRip/x264 combination wins hands down for quality. With an equal number of passes, the ‘faster’/‘fast’ modes generally do as well as, if not better than, the rest of the encoders tested here. If you only retain one thing from this article however, make sure you remember that the simplest way of increasing quality is to add a second pass. It is all the more annoying that NVIDIA, AMD and Intel could easily offer this second pass in their development kits and thus miraculously homogenise quality throughout their encoded videos.

Of course, you pay for this superior quality with higher encoding times, equivalent to real time at 1080p (one hour of encoding for around one hour of film in ‘veryfast’ 2p mode with Avatar for example). You can reduce quality too. Some users will find the ‘faster’ 1p modes give a more attractive compromise by more or less halving encoding time, but below this quality really does start to suffer.


In trying to sum up what AMD, NVIDIA and Intel are offering via these third party applications, we do owe it to ourselves to make a few remarks. Firstly, the AMD encoder is anything but stable. The applications that it was running in crashed on numerous occasions, something that we were able to reproduce when launching the AMD transcoding interface manually (you can access it via the CCC control panel). In limiting the encoder to the baseline profile here again, AMD isn’t giving itself any real opportunity in terms of decent quality overall. This is particularly regrettable as in static scenes the quality is often pretty good. Implementing a high H.264 profile would be a good idea.

NVIDIA possibly has the most advanced SDK and its results are often the best when it comes to GPU acceleration. Nevertheless, the visual quality remains poor, to the point where the pure CPU solutions are often able to give an equivalent quality/encoding time ratio. What’s more, power consumption of these graphics cards is very high.

The Intel offer is the most surprising of the lot. The video encoding is managed by units added to the GPU which seem to accelerate a large part of the decoding. With very low CPU occupation, energy consumption is by far the lowest. In terms of issues with the Intel solution, you need to have an H67 or Z68 motherboard to run it, which greatly reduces the potential user base. Even if you do have one of these however, the visual quality of the encoded files frankly leaves too much to be desired for it to be really usable.

At the end of the day, the marketing promises in terms of GPGPU transcoding haven’t been kept. The manufacturers highlight the speed of their offerings as the answer to the very real problem of the excessive amount of time CPUs need to encode video on their own. Fast, but with quality that leaves too much to be desired, H.264 encoding via GPGPU remains, as yet, a poor solution to what is a real problem.


Copyright © 1997-2014 BeHardware. All rights reserved.