Do you need access to your data 24/7? Do you need fast access to your data? If the answer to either of these questions is yes, you should use RAID for your data storage.
What is RAID? It’s an acronym meaning either “Redundant Array of Independent Disks” or “Redundant Array of Inexpensive Disks”. The two meanings are inherited from the past, and often both apply. The basic idea is that you store the data on more than one disk, and by using a few pretty cool ideas you can protect your data from problems like a failing disk and/or increase the performance of the storage system. An important thing to remember is that RAID is not a substitute for backup. While RAID can keep all your data accessible even if some of the disks are not working, it won’t help if you accidentally delete the wrong file, your data gets wiped out by a virus, a power supply failure in your computer fries all your disks, a disgruntled employee deletes files on purpose or formats a disk, a hacker gets into your system, your disks get stolen, the building burns or gets flooded, a hostile country starts dropping bombs on your location, a meteorite strikes your home … do you get the point? To protect yourself from these situations you need a backup, preferably stored off-site. If the meteorite is very big you’d better keep your backups very far away, like on another planet. After all, we know what happened to the dinosaurs, right? They obviously had no backups…
RAID can be set up in many different ways; these configurations are described as RAID levels. The most common RAID levels are RAID-0, RAID-1, RAID-5 and RAID-6. There are also nested RAID levels like RAID-10 or RAID-50, where multiple RAID setups are placed on top of each other to provide more functionality than any of the individual levels. Each RAID level can be implemented either in specialized hardware or in software. Today I will talk about RAID levels 0, 1, 5 and 6. Nested RAID levels will be the topic of the next post.
Let’s start with RAID-0, which is not really RAID in the sense that it doesn’t provide any redundancy. What RAID-0 does provide is higher performance. In RAID-0 data is spread over multiple disks in so-called stripes. Basically each file (unless it’s very small) is stored in pieces on all the disks in the array. The advantage is that when you want to read the file the system can read each piece from a different disk at the same time, so it can do this up to N times faster, where N is the number of disks in your RAID. The same happens when writing. The effective size of your storage is equal to N times the size of the smallest disk in the RAID. The price for this performance is that RAID-0 actually makes it more likely that you’re going to lose data. If any disk in RAID-0 fails you’re going to lose all your files, since each file had a piece on the failed disk. And since there are multiple disks in RAID-0, it is more likely that one of them is going to fail than if you had just one disk. After any disk in RAID-0 fails you’ll have to restore all your data from the backups. You do have backups, don’t you?
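To make the striping idea a bit more concrete, here is a minimal Python sketch. It is only an illustration, not how any real RAID controller or driver is written: the "disks" are plain Python lists, and the names `stripe`, `unstripe` and `chunk_size` are made up for this example.

```python
def stripe(data: bytes, num_disks: int, chunk_size: int) -> list[list[bytes]]:
    """Cut data into chunks and deal them round-robin across the disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        # Chunk 0 goes to disk 0, chunk 1 to disk 1, ... then wrap around.
        disks[chunk_index % num_disks].append(data[offset:offset + chunk_size])
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Read the chunks back in the same round-robin order."""
    pieces = []
    rows = max(len(disk) for disk in disks)
    for row in range(rows):
        for disk in disks:
            if row < len(disk):
                pieces.append(disk[row])
    return b"".join(pieces)


# Toy example: 10 bytes, 3 "disks", 2 bytes per chunk.
disks = stripe(b"ABCDEFGHIJ", num_disks=3, chunk_size=2)
restored = unstripe(disks)
```

In this toy run every disk ends up holding part of the data, which is exactly why losing any single disk in RAID-0 ruins the whole file.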
A different approach is taken with RAID-1. This scheme fully deserves to be called RAID since it does provide redundancy for the data. In RAID-1 all data is stored on every disk in the array. RAID-1 is usually used with 2 disks since this is the most economical setup. You can use 3 or more disks in RAID-1, but if you really need this level of availability for your data then maybe you should look into other solutions like clustered storage. The biggest disadvantage of RAID-1 is that the effective size of your storage is only equal to the size of the smallest disk in the RAID. In terms of performance, RAID-1 can provide an improvement similar to RAID-0 when reading files (depending on the implementation, different fragments of a file can be read from different disks since they all contain the same data), but the write performance is the same as for a single disk (all data has to be written to each disk). When a disk fails in RAID-1 the data is still available, but you need to replace the failed disk as soon as possible, otherwise when the other disk fails you’ll have to restore all your data from the backups. You do have backups, don’t you?
Another common RAID scheme is called RAID-5. RAID-5 requires at least 3 disks. It provides redundancy and also increased performance. In RAID-5 each data block is stored on one disk only, but additional information (called parity), calculated across each stripe of data blocks, is stored on a different disk from the blocks it protects. The parity is a checksum which the system can use to recover data in case of disk failure. The parity information is spread over all the disks. This is done in such a way that the system can tolerate the failure of any one disk and not lose any data. The effective size of storage with RAID-5 is equal to (N-1) times the size of the smallest disk. Read performance of RAID-5 can approach (N-1) times that of a single disk. Write performance is rather poor since each write requires updating not only the data but also the parity information. The usual method to improve the write performance is to include cache memory in the RAID controller. RAID-5 is best used for systems where on average the number of reads greatly exceeds the number of writes. When a disk in RAID-5 fails the data is still available, but you need to replace the failed disk as soon as possible since from that point any further disk loss will mean you’ll have to restore all your data from the backups. You do have backups, don’t you?
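If you wonder how a single parity block can bring back a lost data block, the following Python sketch may help. I am assuming here that the parity is a plain XOR of the data blocks in a stripe, which is the usual textbook description of RAID-5; the helper name `xor_blocks` and the sample bytes are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a 4-disk array: three data blocks plus one parity block.
data_blocks = [b"\x0f\x10", b"\xf0\x01", b"\x55\xaa"]
parity = xor_blocks(data_blocks)

# Pretend the disk holding the second data block has died.
survivors = [data_blocks[0], data_blocks[2], parity]

# XOR-ing everything that is left gives back the missing block.
recovered = xor_blocks(survivors)
```

The same trick shows why writes are expensive: changing any data block means the parity block for that stripe has to be recalculated and written as well.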
An improvement on RAID-5 in terms of data redundancy is RAID-6. The difference here is that in RAID-6 two independent sets of parity are calculated and stored on two different disks. RAID-6 requires at least 4 disks. As with RAID-5, the parity information is spread over all the disks. The advantage is that the system can now survive the failure of any two disks without losing any data. The disadvantage is that the effective size of the storage is reduced to (N-2) times the size of the smallest disk. Performance of RAID-6 is a bit lower than that of RAID-5 due to the additional parity data. RAID-6 makes more sense with larger arrays (N>=12), because the more disks you have, the more likely it is that two of them fail at around the same time. When two disks in RAID-6 fail the data is still available, but you need to replace at least one of the failed disks as soon as possible since any further disk failure will mean you’ll have to restore all your data from the backups. You do have backups, don’t you?
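To sum up the capacity rules for the four levels covered today, here is a small Python helper. It is just a sketch of the arithmetic from this post. The function name `usable_capacity` is made up, and the minimum of 2 disks for RAID-0 and RAID-1 is my assumption (the 3 and 4 disk minimums for RAID-5 and RAID-6 are the ones given above).

```python
def usable_capacity(level: int, disk_sizes: list[float]) -> float:
    """Effective storage size for RAID-0, 1, 5 or 6, in the units of disk_sizes."""
    # Minimum disk counts; the values for RAID-0 and RAID-1 are assumed.
    min_disks = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in min_disks:
        raise ValueError(f"unsupported RAID level: {level}")

    n = len(disk_sizes)
    if n < min_disks[level]:
        raise ValueError(f"RAID-{level} needs at least {min_disks[level]} disks")

    # Every disk only contributes as much space as the smallest one.
    smallest = min(disk_sizes)
    data_disks = {0: n, 1: 1, 5: n - 1, 6: n - 2}[level]
    return data_disks * smallest


# Four 4 TB disks under each level.
sizes = [4, 4, 4, 4]
capacities = {level: usable_capacity(level, sizes) for level in (0, 1, 5, 6)}
```

With four 4 TB disks this gives 16 TB for RAID-0, 4 TB for RAID-1, 12 TB for RAID-5 and 8 TB for RAID-6, so you can see directly what each extra level of protection costs you in space.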
Next week I’ll talk about nested RAID levels: RAID-01, RAID-10, RAID-50 and RAID-51 in Do you RAID? (part 2).
Any questions about RAID? Write a comment and I’ll get back to you.


Gosh! And here I thought you were talking about cockroaches. Around here they say that Raid kills them dead.
I was wondering if this would be the first association … he he
yeah!!!! that was my first association too!!!! but mami beat me to the post!!!! like mother like daughter… awesome