RAAAAIIIIID? (disk failures)

I’m sitting here with two dead hard drives on my desk. One 250GB SATA drive, one 160GB, both Seagate. The 250GB drive is a little over two years old, the 160GB a bit over a year old. The point I’d like to make isn’t that “Seagate sucks” – drives fail. I buy Seagate drives for the long warranties, which give me the longest possible window to get them replaced for free when they DO fail. No, the point is that I didn’t lose any data. Not one byte.

RAID is a technology that can, in most configurations, survive at least one drive failure without losing data. Some implementations, such as RAID 6, can survive any two failures, and RAID 1+0 (“RAID 10”) can survive multiple failures as long as they don’t take out both halves of the same mirror. Then there is RAID 0, which offers no redundancy at all and is used solely by idiots or people concerned with performance only. RAID 0 is even worse for reliability than having just a single drive, because the failure of any drive in the array causes a complete loss of data. It is basically a given that any server racked in a data center is going to be using RAID to protect its data and availability. What many may not realize is that fairly recently this technology has become commonly available in end-user systems.
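To put rough numbers on those trade-offs, here is a minimal Python sketch. It is an illustration, not any vendor's tool: the names `RaidSummary`, `summarize`, and `raid0_loss_probability` are made up for this post, RAID 10 is assumed to be a stripe of two-way mirrors, and the probability calculation assumes drives fail independently, which real drives from the same batch often don't.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaidSummary:
    level: str
    usable_drives: int        # capacity, in units of one member drive
    guaranteed_failures: int  # failures survived no matter which drives die
    best_case_failures: int   # failures survived if the "lucky" drives die


def summarize(level: str, n: int) -> RaidSummary:
    """Summarize capacity and fault tolerance for n equal-sized drives."""
    if level == "0" and n >= 2:
        # Pure striping: all the capacity, none of the protection.
        return RaidSummary(level, n, 0, 0)
    if level == "1" and n >= 2:
        # n-way mirror: one drive of capacity, survives all but one failing.
        return RaidSummary(level, 1, n - 1, n - 1)
    if level == "5" and n >= 3:
        # One drive's worth of parity.
        return RaidSummary(level, n - 1, 1, 1)
    if level == "6" and n >= 4:
        # Two drives' worth of parity.
        return RaidSummary(level, n - 2, 2, 2)
    if level == "10" and n >= 4 and n % 2 == 0:
        # Assumed layout: a stripe across two-way mirrors. Any single failure
        # is survivable; more only if no mirror pair loses both members.
        return RaidSummary(level, n // 2, 1, n // 2)
    raise ValueError(f"unsupported level/drive count: RAID {level}, {n} drives")


def raid0_loss_probability(p: float, n: int) -> float:
    """Chance a RAID 0 array is lost, given each drive fails with probability p.

    Assumes independent failures; the array dies if ANY member dies.
    """
    return 1 - (1 - p) ** n


if __name__ == "__main__":
    for level, n in [("0", 2), ("1", 2), ("5", 4), ("6", 4), ("10", 4)]:
        print(summarize(level, n))
    # Hypothetical 5% per-drive failure chance over some period:
    print(raid0_loss_probability(0.05, 1), raid0_loss_probability(0.05, 2))
```

With a hypothetical 5% chance of any one drive dying over some period, a two-drive RAID 0 comes out at roughly a 9.75% chance of total loss, nearly double the single drive. The four-drive RAID 10 row shows the catch mentioned above: one failure is always fine, a second is fine only if it lands in the other mirror.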

Linux users probably already know that Linux has a really nice software implementation of RAID covering even RAID 6. If not, I would point them at this document for starters. Most people, though, use Windows XP. XP doesn’t support redundant software RAID (mirroring or RAID 5), at least not without some dubious hacks to enable the included (but locked out by Microsoft) support. You’ll need an add-on card or a motherboard with an onboard chipset supporting SATA RAID. The inexpensive options usually only support RAID 1, RAID 0, and maybe RAID 1+0. For example, the AMD 790FX platform, which uses the ATI SB600 southbridge, provides these.

Many, especially in the Linux camp, will argue that these implementations are really just software RAID handled by the driver, and not a true hardware implementation. This may be true, but have you looked at the price of a “true” hardware RAID controller, supporting RAID 5+ and having all that wizard cache and everything? For end-users who only want to run Windows, and let’s face it, this is the majority of the current population of users, this “fake RAID” is fine. For example, the 160GB drive I recently (yesterday!) lost was a member of a 4 drive RAID 1+0. I was actively using the machine, which runs Windows XP 64-bit, when the drive failed. A nice pop-up alerted me to the fact. Nothing crashed, nothing was lost, it was all handled gracefully. I know that the old ATA (not SATA) implementations often crashed the machine when a drive failed, although even then the array protected nearly all of the data; the only loss was whatever the crash itself took with it.
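For the Linux camp, the rough equivalent of that pop-up can be built from `/proc/mdstat`, where the kernel's software RAID driver reports array health. Each array has a status string such as `[UU]`, with an underscore marking a missing or failed member. The Python below is a minimal sketch of the idea, not a replacement for proper monitoring: `degraded_arrays` is a name invented here, and `SAMPLE_MDSTAT` is hand-written illustrative text, not output captured from my machines.

```python
import re

# Illustrative text in the general shape of /proc/mdstat (made up for this
# example). md0 is a 4-drive RAID 10 with one failed member.
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid10]
md0 : active raid10 sdd1[3] sdc1[2] sdb1[1](F) sda1[0]
      312576000 blocks 64K chunks 2 near-copies [4/3] [U_UU]

md1 : active raid1 sdb2[1] sda2[0]
      1048512 blocks [2/2] [UU]

unused devices: <none>
"""

# An array entry starts with its name, e.g. "md0 : active raid10 ...".
ARRAY_LINE = re.compile(r"^(md\w+)\s*:")
# The status line carries "[total/working] [UU_U]"; "_" marks a missing member.
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text: str) -> dict:
    """Return {array_name: status_string} for arrays with a missing member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_LINE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS.search(line)
        if status and current is not None and "_" in status.group(3):
            degraded[current] = status.group(3)
    return degraded


if __name__ == "__main__":
    for name, state in degraded_arrays(SAMPLE_MDSTAT).items():
        print(f"WARNING: {name} is degraded: [{state}]")
```

On a real Linux box you would pass in the contents of `/proc/mdstat` instead of the sample text and run this from cron, or simply let mdadm's own monitor mode mail you, which is what it is there for.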

The variety of offerings in this inexpensive, end-user suitable RAID arena make doing a HOWTO really kind of pointless. If you’re not familiar with building PCs or don’t know somebody who does, Dell and HP may offer desktop systems built with this already.

Users looking to convert existing single-drive systems to RAID may find that a reinstall is necessary, as many implementations do not preserve data when creating the initial array, so converting may not make sense for many. However, everyone should consider any new machine that will hold personal data to be a candidate for RAID.

In closing, just let me say this. Take a look at the computer you’re using right now. Go ahead and fire up Outlook, look in your My Documents folder. Hell, look at your bookmarks. Go to your banking site and log in with the saved password.

Now, for a moment, let’s imagine that the drive in your machine has failed. Not that it might fail, but that it HAS failed. As a rule, drives always fail – it’s only a matter of time. Now you’ve lost all your email, all your documents, all your bookmarks, all your passwords if you save them. You’ve lost that desktop background of your kids. You’ve lost your savegames, that pirated copy of Photoshop, the VPN connection to the office that you spent all that time getting to work correctly. Maybe you use Quicken, that’s gone too along with all the data you haven’t backed up about your finances. It’s all gone. Try to imagine that for a minute or two.

RAID can keep this from happening. No, it won’t protect you from viruses, or an exploding power supply that takes out two drives at once, or accidentally deleting your My Documents folder. Yes, you could (and should) keep a lot of this data backed up. Yes, a lot of this stuff can be kept online (gmail for example). Yes, you could conceivably, for hundreds of dollars, recover most or all of the data on the drive with a data recovery service – for the record, we have done this a couple times with data center drives and half the time the drive is too damaged to recover anything affordably. But how nice would it be to get that pop-up notifying you the drive failed while you continue to work? Swap in the replacement drive, and all is well again.

I don’t know about you, but with how many drives I have seen fail over the years, I’m convinced a few extra dollars is worth it.

April 16, 2009 · agw · 4 Comments
Posted in: Linux, Technical

4 Responses

  1. awhitlock.net » Self-repairing hardware - April 28, 2009

    […] this post I rambled on and on about the benefits of redundant storage. Well, somewhat dubiously the […]

  2. Elliott - October 7, 2009

    x2 about seagate sucking. I buy black Western Digital at this point.

  3. Elliott - October 7, 2009

    And yes I realize that seagate sucking wasn’t your main point, my previous comment stands ;)

    RAID is a must, but not a panacea. Even arrays can fail, but it’s rare. The rule I live by is that important data lives in multiple places.

  4. agw - October 12, 2009

    Indeed. My important stuff is on a few different machines, stuff I don’t care much about (media files mostly) I handle with monthly CRC checks of the arrays.
