Is the Radeon HD 7870 reliable? A small number of users of this very popular model have been affected by stability problems, which we have been able to reproduce and raise with AMD with a view to finding a solution. If your card is affected, it should be replaced.
Identifying the problem…
Since spring, a small number of the gamers who have bought the Radeon HD 7870 have noted a recurrent but more or less random problem: the system crashes and they get a black screen (loss of video signal). This is rather annoying, especially as after-sales services have often struggled to reproduce the problem and in some cases have simply sent the card back as it was. That has further annoyed customers, some of whom have had no option but to get a competitor card instead.
When a crash is random it can be difficult to reproduce, especially as it can be difficult to say if the crash has resulted from an issue in the rest of the system or because the user has been overclocking the card but forgotten to say so. This is part of the reason why we didn’t worry particularly when the first complaints started coming out, especially as there are always a few defective cards in any run anyway.
Although the complaints only represented a small number of the gamers who have bought this model, the fact that this specific issue was mentioned so often, particularly for Sapphire cards, which make up the majority of sales, was becoming worrying.
While working on a recent report, we took the opportunity to get a Sapphire HD 7870 OC card from a store and, as luck would have it, we came across the problem on our main test system: a system crash and loss of the video signal in some games, notably Crysis Warhead. We tried to isolate the problem by reducing the GPU clock, switching to PCI Express 2.0 rather than 3.0, changing the driver, changing the power supply and so on, but drew a blank. The problem continued to appear from time to time, although we did notice that the higher the GPU clock, the more likely it was to appear.
We then got hold of several samples of the same model and observed the problem on just one of them. Not all cards were affected, then, and moreover the problem didn't show up on another test system we use for noise and heat readings. Next, we were able to observe the same crashes on the reference Radeon HD 7850 press sample as well as on the XFX Radeon HD 7870 Black Edition that we had been sent at the beginning of March on launch of these models.
We contacted both Sapphire and AMD about this issue affecting some of their cards. Sapphire certainly didn’t show itself to be over-enthusiastic about looking further into the problem. At the end of September, we were however in a position to send a problematic Sapphire sample to AMD Toronto, along with as many details as we could so that the AMD engineers would be able to confirm the problem and perhaps provide a solution. This was good timing as AMD, concerned as it was about its brand image on what was one of the most popular cards in its range, had been trying in vain to reproduce the problem for weeks.
After tearing their hair out looking for a specific link between certain hardware combinations, the AMD engineers worked out what the problem was a few days ago. It was a problem to which they had already found a solution… six months ago, on launch of the Radeon HD 7870!
During pre-production of the Radeon HD 7870, and probably on the first lots, AMD noted a minor reliability issue on some cards, less than 1% of them according to the company. The GPU electrical signal could at times, depending on the environment, be rather noisy on these samples, resulting in a crash.
The problem was linked to small ceramic capacitors from certain manufacturers, some samples of which were probably of insufficient quality. AMD therefore informed all card manufacturers and specified which component lots were causing the problem. The problem was thus stopped in time. At least that’s what AMD thought. The company was therefore somewhat surprised to find the issue appearing on the recent store sample that we sent them.
Some of these small capacitors at the back of the GPU were the source of the problem. Above you can see our card that was corrected thanks to AMD.
How was it that the issue was once again appearing? In contrast to what AMD initially thought when correcting our sample manually, our sample wasn’t a pre-production press card that dated from launch but rather a recent sample! Moreover, we recently noted a returns rate of over 6% on the Sapphire Radeon HD 7870 OC, which is relatively high for a card in this range, the usual level being 2 to 3%. While AMD has refused to make any comment on its partners, it has become clear that, for an unknown reason, Sapphire failed to integrate the fix requested by AMD when it started mass production. This is what has caused these crashes on some cards when they are in an environment that accentuates electrical signal noise.
… and problem solved
As the problem has now been clearly identified, where does this put customers? Sapphire, more responsive now, tells us that it has now rectified the production issue, has recalled stock from French etailers and smoothed the returns procedure even when after sales doesn’t manage to reproduce the problem. No precise timing has however been given on these points.
If your Sapphire card has been affected by black screens with loss of signal in games, you will thus be entitled to a replacement. New purchases will gradually be made up solely of modified cards, so the problem should cease to exist. As things stand, we recommend that you contact your reseller before purchase to make sure that you are ordering a newer model which cannot be affected by this issue.
What about other makes? Could they also be affected? We don’t have a clear answer on this. Some users have observed an apparently similar issue with other brands, but we haven’t been able to confirm this ourselves and we don’t know whether these cards were produced recently or not, as other brands don’t sell the same volumes as Sapphire in the AMD range. Although we did see the problem on launch-era press samples from AMD and XFX, we can in principle be hopeful that other manufacturers haven’t also neglected to apply AMD’s corrective measures. It may also simply be that they don’t get their capacitors from the supplier of the problematic components.
Asus and VTX3D, for example, don't use the reference PCB, but this doesn’t necessarily mean their PCBs are automatically immune to the problem. We did see that the final stage of the GPU's electronic filter is indeed the same as the one on the reference PCB. In other words, everything depends on the type of capacitor used on this circuit and, as they can’t be identified visually, there’s no way of knowing for sure.
We have contacted all the AMD partners to find out what the situation is. We're still waiting for answers from Asus, HIS, MSI and XFX. Gigabyte told us that it wasn’t affected by the problem, though it didn’t confirm that the AMD corrective had been applied. The same goes for VTX3D, which however says that it has put an internal procedure into place (via firstname.lastname@example.org) so as to be able to assist customers who have problems with black screens, if only to get information on what might result from a separate problem. Of course, answers from manufacturers should be taken with a grain of salt.
For our part, we are of course delighted to see a resolution of a problem some of our readers have been suffering from and we hope that the other manufacturers, who are perhaps not affected at all, will rapidly make clear their position on the subject. While anyone can make a mistake of course, we certainly hope that waiting for AMD to reproduce an issue experienced on Sapphire cards and give Sapphire the solution isn't the usual Sapphire support procedure!
Finally, we note that while Nvidia is often criticised for keeping its partners on too short a leash, it may well be that AMD isn’t checking up on its partners enough. To safeguard its brand image, it would probably be useful to make sure in the future that such a corrective measure is applied by its biggest partners.
Update 30/10/2012:
Asus and MSI have now contacted us regarding this problem. Asus says that it was aware of the latest BOM (Bill of Materials) sent by AMD and that its products are not affected. MSI tells us that there is no clear indication that its products are affected but that it is investigating to determine if one of its runs could have been. If that were to be the case, MSI says, there would be no issue about it footing the bill.
Update 2/11/2012:
HIS has just communicated to us its position with respect to the issue with black screens: the modification submitted by AMD was applied to production as of June. More exactly, the problem won't appear on any cards with a serial number beginning with H1206 or later. HIS says however that no problems have been confirmed on the small volume of cards imported to France before this date, but if users do come across it, its after sales service (email@example.com) will of course be at their disposal to find a solution.
Update 5/11/2012:
MSI has now also replied regarding the problem. It explains that it was alerted to the component change in May, though without receiving from AMD any detailed reason for the change, and that production as of June was modified. Some product lots from before June may therefore be affected and MSI is currently informing stockists and etailers so that their after sales services can deal with any issues resulting from the problem.
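For readers with an HIS card, the serial-number rule quoted in the 2/11/2012 update can be checked mechanically. The sketch below is only illustrative: it assumes that an HIS serial starts with "H" followed by four digits that sort chronologically (H1206 read as year 12, month 06), which is our guess and not something HIS has confirmed. The function name and the sample serials are hypothetical.

```python
import re
from typing import Optional

# Assumption: the four digits after "H" sort chronologically (e.g. YYMM).
# HIS only stated the "H1206 and later" cutoff, not the serial format.
FIRST_CORRECTED_PREFIX = 1206


def his_serial_has_fix(serial: str) -> Optional[bool]:
    """Return True if the serial prefix is H1206 or later, False if it is
    earlier, and None if the serial does not match the assumed format."""
    match = re.match(r"H(\d{4})", serial.strip().upper())
    if match is None:
        return None  # unknown format: ask HIS after sales instead of guessing
    return int(match.group(1)) >= FIRST_CORRECTED_PREFIX


if __name__ == "__main__":
    # Hypothetical serials, for illustration only.
    for sample in ("H1205XXXX", "H1206XXXX", "H1301XXXX", "ABC123"):
        print(sample, his_serial_has_fix(sample))
```

A False result only means the card predates the component change. According to HIS, no problems have been confirmed on those earlier cards, so it is not proof of a defect.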