Storage Board: What is RAID? [video]
Hi, I’m Mikey from Ontrack data recovery and welcome to Storage Board! In this episode we’re going to be looking at RAID systems (what they are and how they work), but before we go any further, let’s define exactly what RAID stands for.
RAID stands for Redundant Array of Independent Disks. It is a method of data storage in which data is distributed across multiple disks, with the aim of increasing performance, reliability, or both.
Given that increase in reliability, it is no surprise that organizations are the top users of RAID systems. The concept has been around for 30 years, and you don’t have to be a company to use it: you could be an individual, perhaps working on video editing or music production. RAID is used not only to increase performance and reliability over a single drive, but also to create volumes larger than any one drive, which is another reason organizations favor it so much.
Hardware vs. software RAID
There are two main ways of setting up your RAID system. First, there is a hardware setup: a RAID controller sits between your host computer (or it could be a server) and the drives, and controls the RAID system itself. In a hardware setup, the RAID controller is responsible for everything to do with RAID: the reading and writing of data, plus where it is stored and which drives it gets written to. The host operating system has no knowledge of the fact that there are multiple drives within the RAID system and sees it all as one logical unit. A software RAID array is slightly different: the RAID logic is implemented within the operating system, which can mean a bit of a decrease in performance, as the host is then doing the RAID work on top of everything else (there’s no separate hardware RAID controller).
Before we look at RAID levels and exactly how they work in practice, we’re going to take a look at some key terms to find out what these RAID systems are.
First, there is ‘parity’. Parity is a very important concept within RAID. It is extra information calculated from the data (typically with an XOR operation) and stored on the drives alongside it, so that if one drive fails, the missing data can be recalculated from the data and parity that remain.
Next, we have ‘redundancy’, which in a computer science sense is the duplication of critical components, so that if one were to fail, the whole system does not go down with it. In the case of RAID systems, these components are the drives. We’ll go into more detail on that shortly.
The other two really key concepts in RAID are ‘mirroring’ and ‘striping’. Mirroring does what it says on the tin: data written to one drive is copied to another, so the exact same information exists twice and can be recovered if something were to go wrong.
Then we have striping, which is when data is split into chunks that are written across multiple disks in turn. We’ll check out how that works in a moment within a ‘RAID 0’ setup.
Before we do that, it is worth mentioning that there are many different RAID levels out there. We’ve just picked four for the purpose of this video. The standard levels run from RAID 0 to RAID 6, and nested levels such as RAID 10, RAID 50, RAID 60 and RAID 61 combine them. There are also vendor-specific and custom levels, so if you are a company with custom applications or databases, you may want a configuration tailored to your exact needs. The ones we will look at here are four of the most basic levels.
A RAID 0 setup uses the concept of striping and needs at least two drives. As you can see, the data is striped across the two disks, which is fantastic in terms of read and write performance compared with a single disk. However, it offers no redundancy. If one of those drives were to fail (let’s say drive 1 in this case), its share of the data is not replicated anywhere else, which will cause some headaches as there’s nowhere to get it back from.
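To make striping a little more concrete, here is a minimal Python sketch. It is only an illustration, not how any particular controller works: the function names `stripe` and `unstripe`, the in-memory lists standing in for drives, and the tiny chunk size are all assumptions made for the example.

```python
def stripe(data: bytes, num_drives: int, chunk_size: int) -> list[list[bytes]]:
    """Split data into fixed-size chunks and deal them out to the drives in turn."""
    if num_drives < 2:
        raise ValueError("RAID 0 needs at least two drives")
    # Each "drive" is just a list of chunks in this sketch.
    drives: list[list[bytes]] = [[] for _ in range(num_drives)]
    for chunk_index, offset in enumerate(range(0, len(data), chunk_size)):
        drives[chunk_index % num_drives].append(data[offset:offset + chunk_size])
    return drives


def unstripe(drives: list[list[bytes]]) -> bytes:
    """Read the chunks back in the same round-robin order they were written."""
    out = bytearray()
    rows = max(len(drive) for drive in drives)
    for row in range(rows):
        for drive in drives:
            if row < len(drive):
                out += drive[row]
    return bytes(out)


if __name__ == "__main__":
    drives = stripe(b"ABCDEFGH", num_drives=2, chunk_size=2)
    print(drives)            # chunks alternate between the two drives
    print(unstripe(drives))  # reading both drives gives the original data back
```

Notice that each drive holds only every other chunk, so losing either list leaves gaps throughout the data, which is the lack of redundancy described above.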
RAID 1 is our next level, which uses the concept of mirroring that we looked at earlier. Again, there are two drives in this setup, and the data written to the first drive is mirrored onto the second drive. This means that if drive 1 were to fail in this RAID configuration, you would still have the data (as the same data is on drive 2). It adds redundancy and data security, and it is the simplest form of redundancy available within RAID.
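A rough sketch of mirroring in Python might look like the following. The `Mirror` class, its method names, and the use of lists as stand-ins for drives are hypothetical and chosen just for illustration; a real RAID 1 implementation works at the block-device level.

```python
class Mirror:
    """Toy RAID 1: every write goes to both drives, reads use any healthy drive."""

    def __init__(self, num_blocks: int) -> None:
        # Two in-memory "drives" with the same number of blocks.
        self.drives = [[None] * num_blocks, [None] * num_blocks]
        self.failed: set[int] = set()

    def write(self, block: int, data: bytes) -> None:
        # The same data is written to every drive that is still working.
        for index, drive in enumerate(self.drives):
            if index not in self.failed:
                drive[block] = data

    def read(self, block: int) -> bytes:
        # Return the block from the first healthy drive.
        for index, drive in enumerate(self.drives):
            if index not in self.failed:
                return drive[block]
        raise IOError("both mirrored drives have failed")

    def fail(self, drive_index: int) -> None:
        # Simulate a drive failure.
        self.failed.add(drive_index)


if __name__ == "__main__":
    array = Mirror(num_blocks=4)
    array.write(0, b"holiday photos")
    array.fail(0)          # drive 1 in the video's numbering
    print(array.read(0))   # still readable from the second drive
```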
Now let’s move on to the more complex levels and take a look at RAID 5. This introduces the concept of parity: extra information calculated from the data and distributed across the drives to aid recovery. You can see here that we have four drives in this setup (RAID 5 requires at least three drives), and you can see the parity highlighted here in red. If one drive were to fail, let’s say drive 4, its contents can be rebuilt from the data and parity on the other drives. The parity in this RAID 5 takes up the space of one drive in total, and the array can therefore tolerate one drive failure.
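Here is a small Python sketch of how single parity can recover a lost block. It assumes XOR parity, which is the usual choice for RAID 5, and it ignores how parity is rotated across drives; the helper name `xor_blocks` and the sample bytes are made up for the example.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


if __name__ == "__main__":
    # One stripe on a four-drive array: three data blocks plus one parity block.
    data_blocks = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
    parity = xor_blocks(data_blocks)

    # Simulate losing the drive that held the second data block.
    survivors = [data_blocks[0], data_blocks[2], parity]

    # XOR-ing everything that survived gives the missing block back.
    rebuilt = xor_blocks(survivors)
    assert rebuilt == data_blocks[1]
```

The same calculation works whichever single block is missing, including the parity block itself, which is why one parity block per stripe is enough to survive one drive failure but not two.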
With RAID 5, you can go one step further and configure a fifth drive: a ‘hot spare’. This is an idle drive that sits within the system with no data written to it. If one drive were to fail (let’s take drive 4 again), the hot spare (drive 5) would take the place of failed drive 4, its contents would be rebuilt from the data and parity on the other drives, and no data gets lost. You can then take out the failed drive and insert a new one into the array, which becomes your new hot spare. This is another good way of adding redundancy to prevent data loss.
Lastly, we have RAID 6, which takes the concept of parity one step further to ‘dual parity’. You can see here in the RAID 6 array that we have five drives (RAID 6 requires at least four drives), and you can also see the dual parity spread across them, taking up the space of two drives in total. This allows two drives to fail within the array without losing data: if two drives were to go down, the data can be rebuilt from the data and dual parity on the remaining drives.
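To compare the four levels side by side, the arithmetic can be sketched in a few lines of Python. This assumes every drive in the array is the same size and ignores any formatting overhead; the function name `raid_summary` and the example drive sizes are illustrative only.

```python
# Minimum drive counts for the four levels covered in this episode.
MIN_DRIVES = {0: 2, 1: 2, 5: 3, 6: 4}


def raid_summary(level: int, num_drives: int, drive_size_tb: float) -> tuple[float, int]:
    """Return (usable capacity in TB, number of drive failures tolerated)."""
    if level not in MIN_DRIVES:
        raise ValueError("this sketch only covers RAID 0, 1, 5 and 6")
    if num_drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == 0:
        # Striping only: all space is usable, nothing is redundant.
        return num_drives * drive_size_tb, 0
    if level == 1:
        # Mirroring: every drive holds the same data.
        return drive_size_tb, num_drives - 1
    if level == 5:
        # Single parity takes the space of one drive.
        return (num_drives - 1) * drive_size_tb, 1
    # RAID 6: dual parity takes the space of two drives.
    return (num_drives - 2) * drive_size_tb, 2


if __name__ == "__main__":
    for level, drives in [(0, 2), (1, 2), (5, 4), (6, 5)]:
        capacity, tolerated = raid_summary(level, drives, drive_size_tb=2)
        print(f"RAID {level} with {drives} x 2 TB: {capacity} TB usable, "
              f"survives {tolerated} drive failure(s)")
```

The drive counts in the loop match the examples in this episode: two drives for RAID 0 and RAID 1, four for RAID 5 and five for RAID 6.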
It’s worth mentioning that redundancy and parity are not the same as having backups; always remember to keep separate backups of your RAID system.
Do you store data on a RAID system? What level do you use and why? Let us know by commenting below.