Tutorial: RAID

RAID stands for Redundant Array of Independent (or Inexpensive) Disks. If you need 12 terabytes of storage but can only get 2TB drives, what do you do? RAID is the answer: it can join those drives together into a single logical volume. It comes in a number of flavors, each with its own advantages and disadvantages. Because a single volume is made up of multiple drives, the odds that one of them fails are higher than for a single drive, so you need to decide how important your data is and know what you can do to keep a drive failure from becoming a disaster.


RAID 0 is the simplest form and offers no redundancy. It uses two or more drives to increase the available storage. The only complication is that the data is split between the drives: with two drives, every second number (to keep things simple) goes to the alternate drive. For example, the list 1,7,6,3,5,4,3,5 is stored as shown in the table below. The reason is that a single drive can only read or write so fast; splitting the data this way lets the drives work in parallel. Drive 1 can read 1,6,5,3 while Drive 2 reads 7,3,4,5. RAID 0 can use all the space available on the drives, but if one drive fails the data on the other drive is useless.

Drive 1   Drive 2
1         7
6         3
5         4
3         5
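
The striping described above can be sketched in a few lines of Python. This is only an illustration using the same list of numbers; the function names `stripe` and `unstripe` are made up for this example, and a real RAID 0 splits data into much larger blocks rather than single numbers.

```python
def stripe(data, num_drives=2):
    """Deal the items out to the drives in turn, as RAID 0 does with blocks."""
    return [data[i::num_drives] for i in range(num_drives)]


def unstripe(drives):
    """Interleave the drives again to get the original order back.

    Assumes every drive holds the same number of items.
    """
    data = []
    for row in zip(*drives):
        data.extend(row)
    return data


data = [1, 7, 6, 3, 5, 4, 3, 5]
drive1, drive2 = stripe(data)
print(drive1)  # [1, 6, 5, 3]
print(drive2)  # [7, 3, 4, 5]
print(unstripe([drive1, drive2]) == data)  # True
```

Note that `unstripe` needs every drive: with only `drive1` there is no way to fill in the missing numbers, which is why a single failure ruins a RAID 0.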

RAID 1 is a mirror. It doesn't offer any more space than a single drive, but it offers a lot of redundancy. If either drive fails, the entirety of the data is available on the remaining drive.

Drive 1   Drive 2
1         1
7         7
6         6
3         3

RAID 1+0, also known as RAID 10, is a combination of RAID 1 and RAID 0: the data is striped as in RAID 0, and every stripe is mirrored as in RAID 1. Here we can chain together a number of drives and still be protected in case of failure, but half of the total space goes to redundancy.

Drive 1   Drive 2
1         7
6         3
5         4
3         5

Drive 3   Drive 4
1         7
6         3
5         4
3         5
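
As a rough sketch of the idea, the Python below stripes the list across two pairs of drives and keeps two copies of each stripe. The names `raid10_write` and `raid10_read` and the `(pair, copy)` way of naming a failed drive are assumptions made for this example only. In terms of the tables above, pair 0 holds Drive 1 and Drive 3, and pair 1 holds Drive 2 and Drive 4.

```python
def raid10_write(data, pairs=2):
    """Stripe the data across the pairs, then store each stripe twice."""
    stripes = [data[i::pairs] for i in range(pairs)]
    return [[list(s), list(s)] for s in stripes]


def raid10_read(array, failed=()):
    """Read the data back, skipping any failed (pair, copy) drives."""
    stripes = []
    for pair, copies in enumerate(array):
        alive = [c for i, c in enumerate(copies) if (pair, i) not in failed]
        if not alive:
            raise RuntimeError(f"both copies in pair {pair} are gone")
        stripes.append(alive[0])
    # Interleave the surviving stripes back into the original order.
    data = []
    for row in zip(*stripes):
        data.extend(row)
    return data


data = [1, 7, 6, 3, 5, 4, 3, 5]
array = raid10_write(data)
# Lose Drive 1 (pair 0, copy 0) and Drive 4 (pair 1, copy 1).
print(raid10_read(array, failed={(0, 0), (1, 1)}) == data)  # True
```

The array survives as long as at least one copy of every stripe is left; losing both copies of the same stripe loses the data.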

RAID 5 offers a great compromise. It uses the equivalent of one extra drive (a slight oversimplification) to hold the information needed to reconstruct the data, so it can survive one complete drive failure. So assuming all your drives are 1TB, 3 drives will give you 2TB, sacrificing 1TB, and 6 drives will give you 5TB, again sacrificing 1TB.
Here's how it works. To keep things simple I won't use binary or hex but normal decimal numbers. The data stored on the drives is essentially a list of numbers. Here you can see two drives, two lists of numbers. If we lose drive 1 or 2, that's it: the data is lost. What's more, because every second number is stored on the other drive (i.e. the real list is 1,7,6,3,5,4,3,5), the data on the remaining drive is useless. That's not good.

Drive 1   Drive 2
1         7
6         3
5         4
3         5

What RAID 5 does is use an extra drive to help reconstruct any lost data. It doesn't really use addition (it uses XOR), but to keep our examples simple we'll pretend it does. It adds the number on drive 1 to the number on drive 2 and puts the answer on drive 3. So 1+7=8, 6+3=9, etc.

Drive 1   Drive 2   Drive 3
1         7         8
6         3         9
5         4         9
3         5         8

Now suppose we lose a drive, drive 2 in this example. We can easily figure out what it held: 1 + something = 8, 6 + something = 9, etc.

Drive 1   Drive 2   Drive 3
1         ?         8
6         ?         9
5         ?         9
3         ?         8

Or if we lose the answers, we can just recalculate them: 1 + 7 = what?

Drive 1   Drive 2   Drive 3
1         7         ?
6         3         ?
5         4         ?
3         5         ?

As long as we only lose one drive we can always reconstruct the data. Brilliant in its simplicity.

Drive 1   Drive 2   Drive 3
1         7         8
6         3         9
5         4         9
3         5         8
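
The pretend "addition" scheme from these tables can be written out in Python as a small sketch. The helper names `make_parity` and `recover` are invented for this example, and this is the simplified decimal version from the tables, not what a real RAID 5 does.

```python
drive1 = [1, 6, 5, 3]
drive2 = [7, 3, 4, 5]


def make_parity(a, b):
    """Add the two data drives row by row to get the 'answer' drive."""
    return [x + y for x, y in zip(a, b)]


def recover(surviving, parity):
    """Work out a lost data drive: answer minus the surviving number."""
    return [p - s for s, p in zip(surviving, parity)]


drive3 = make_parity(drive1, drive2)
print(drive3)                   # [8, 9, 9, 8]
print(recover(drive1, drive3))  # lost drive 2: [7, 3, 4, 5]
print(recover(drive2, drive3))  # lost drive 1: [1, 6, 5, 3]
```

If drive 3 itself is the one that fails, calling `make_parity(drive1, drive2)` again simply recalculates it.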

In reality this is all done in binary. The answer is known as a parity bit and is calculated using a very simple operation called exclusive-or, or XOR. The parity bits are not stored on a dedicated drive, although you do lose one drive's worth of space; instead they are placed on each of the drives in turn.

Drive 1   Drive 2   Drive 3
1         1         0
0         0         0
1         0         1
0         0         0

But that's just to spread the loss and speed things up, it doesn't change the theory.
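
Here is a minimal Python sketch of XOR parity with the parity moving from drive to drive. The function names and the particular rotation order are assumptions chosen for illustration; real implementations use their own layouts and work on large blocks rather than single numbers.

```python
from functools import reduce
from operator import xor


def xor_all(values):
    """XOR a sequence of integers together."""
    return reduce(xor, values)


def write_stripes(data_rows, num_drives):
    """Build one row per stripe, inserting the parity at a rotating position.

    Each entry of data_rows holds num_drives - 1 data values.
    """
    stripes = []
    for i, row in enumerate(data_rows):
        parity_pos = (num_drives - 1 - i) % num_drives  # assumed rotation
        stripe = list(row)
        stripe.insert(parity_pos, xor_all(row))
        stripes.append(stripe)
    return stripes


def rebuild(stripes, lost):
    """Recompute everything on the lost drive by XOR-ing the survivors."""
    return [xor_all([v for d, v in enumerate(s) if d != lost])
            for s in stripes]


rows = [(1, 7), (6, 3), (5, 4), (3, 5)]
stripes = write_stripes(rows, num_drives=3)
for lost in range(3):
    original = [s[lost] for s in stripes]
    print(rebuild(stripes, lost) == original)  # True for every drive
```

Because XOR-ing all the values in a row always gives zero, it does not matter whether the missing value was data or parity: XOR-ing the survivors brings it back either way.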

RAID 6 uses two drives' worth of parity. It is capable of recovering from two drives failing, using more complicated maths. The important thing to know here is that you lose two drives' worth of space, so with three 1TB drives we only get 1TB of space, with 5 drives we get 3TB, with 8 drives we get 6TB, etc.
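
The space arithmetic for the levels covered in this tutorial can be summed up in one small Python function. This is a sketch only: `usable_tb` is a name made up for this example, it assumes all drives are the same size, and it does not enforce the minimum drive counts that real RAID implementations require.

```python
def usable_tb(level, num_drives, drive_tb=1):
    """Return the usable space in TB for an array of equal-sized drives."""
    if num_drives < 2:
        raise ValueError("an array needs at least two drives")
    if level == 0:
        data_drives = num_drives        # no redundancy, all space usable
    elif level == 1:
        data_drives = 1                 # every drive holds the same data
    elif level == 10:
        if num_drives % 2:
            raise ValueError("RAID 10 needs an even number of drives")
        data_drives = num_drives // 2   # half the space is mirror copies
    elif level == 5:
        data_drives = num_drives - 1    # one drive's worth of parity
    elif level == 6:
        data_drives = num_drives - 2    # two drives' worth of parity
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    if data_drives < 1:
        raise ValueError("not enough drives for this RAID level")
    return data_drives * drive_tb


print(usable_tb(0, 6, drive_tb=2))  # 12
print(usable_tb(5, 3))              # 2
print(usable_tb(5, 6))              # 5
print(usable_tb(6, 8))              # 6
```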