Hi, I’m Mikey from Kroll Ontrack data recovery and welcome to Storage Board!
In this episode we’re going to look at RAID systems: what they are and how they work. But before we go any further, let’s define exactly what RAID stands for.
RAID stands for Redundant Array of Independent Disks (originally ‘Inexpensive’ Disks), and it is a method of data storage that distributes, or spreads, data across multiple disks, with the aim of increasing performance, reliability, or both.
It’s no surprise, then, that organisations are the biggest users of RAID systems, given the extra reliability. The concept has been around for about 30 years, and you don’t have to be a company to use it: you could be an individual working on video editing or music production, for example. Compared with a single drive, RAID can improve performance and reliability, and it also lets you build volumes larger than any single disk, which is why organisations favour it so much.
Hardware vs. software RAID
There are two main ways of setting up a RAID system. The first is a hardware setup: your host computer (or server) connects to a dedicated RAID controller, which sits between the host and the drives and manages the array itself. In a hardware setup, the RAID controller is responsible for everything to do with RAID: reading and writing data, and deciding which drives each piece of data is stored on. The host operating system has no knowledge that there are multiple drives in the array and sees it all as one logical unit. A software RAID array works slightly differently: the RAID logic is implemented by the operating system itself rather than by a separate controller, which can reduce performance somewhat, because the host CPU has to handle the RAID work on top of everything else it is doing.
Before we go on to RAID levels and how they work in practice, let’s look at some key terms that help explain how these RAID systems work.
First, there’s ‘parity’. Parity is a very important concept within RAID: it is extra information calculated from the data (typically with an XOR operation) and stored on the drives alongside it, so that if a drive fails, its contents can be reconstructed from the remaining data and parity. Spreading the parity across several drives, rather than keeping it all on one, also helps distribute the load.
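To make parity concrete, here is a minimal Python sketch of XOR parity, the technique commonly used for single parity. It is an illustration, not any controller’s implementation: the block contents and the `xor_blocks` helper are hypothetical, and real arrays work on far larger blocks.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks, one per drive, all the same length.
data = [b"RAID", b"demo", b"1234"]

# The parity block is the XOR of all the data blocks.
parity = xor_blocks(data)

# Simulate losing the second drive: XOR-ing the surviving data blocks with
# the parity block cancels them out and leaves the missing block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'demo'
```

The trick works because XOR-ing any value with itself gives zero, so the data blocks plus their parity always XOR to zero, and any single missing block is simply the XOR of all the others.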
Next we have ‘redundancy’, which in computer science means duplicating critical components so that if one fails, the whole system does not go down with it. In the case of RAID systems, these components are the drives. We’ll go into more detail on that shortly.
The other two key concepts in RAID are ‘mirroring’ and ‘striping’. Mirroring does exactly what it says on the tin: it copies data from one drive to another, so both hold exactly the same information and the data can be recovered if one of them fails.
Then we have striping, where data is split into blocks that are written in turn across multiple disks; we’ll see how that works in a moment in a ‘RAID 0’ setup.
Before we do that, it’s worth mentioning that there are many different RAID levels out there! We’ve picked four of the most common for this video, but if you’re a company with bespoke applications or databases, for example, you might create your own RAID configuration to suit your exact needs. Levels go from RAID 0 all the way up to RAID 61 and beyond, and there are also many other nested or bespoke levels.
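For the four levels covered here, the trade-off between usable space and fault tolerance comes down to simple arithmetic. This Python sketch assumes every drive is the same size and ignores formatting and metadata overhead; `raid_summary` and the 4 TB drive size are hypothetical, and the drive counts in the loop match the examples that follow.

```python
from typing import Dict


def raid_summary(level: int, drives: int, drive_tb: float) -> Dict[str, float]:
    """Usable capacity (TB) and number of drive failures survived, assuming equal drives."""
    min_drives = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in min_drives:
        raise ValueError(f"RAID {level} is not covered by this sketch")
    if drives < min_drives[level]:
        raise ValueError(f"RAID {level} needs at least {min_drives[level]} drives")
    if level == 0:
        usable, tolerated = drives * drive_tb, 0          # striping only, no redundancy
    elif level == 1:
        usable, tolerated = drive_tb, drives - 1          # every drive holds a full copy
    elif level == 5:
        usable, tolerated = (drives - 1) * drive_tb, 1    # one drive's worth of parity
    else:
        usable, tolerated = (drives - 2) * drive_tb, 2    # two drives' worth of parity
    return {"usable_tb": usable, "failures_tolerated": tolerated}


# Drive counts matching the examples in this video, with hypothetical 4 TB drives.
for level, count in [(0, 2), (1, 2), (5, 4), (6, 5)]:
    print(f"RAID {level}, {count} x 4 TB:", raid_summary(level, count, 4.0))
```

Note that the four-drive RAID 5 and the five-drive RAID 6 end up with the same usable space; the extra drive in RAID 6 buys the ability to survive a second failure.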
A RAID 0 setup needs at least two drives and uses the concept of striping. As you can see, the data is striped across the two disks, which is fantastic for read/write performance compared with a single disk; however, it provides no redundancy at all. If one of those drives fails (let’s say drive 1 in this case), its data is not replicated anywhere else, and because files are split across both drives, most of the data on the array becomes unrecoverable. That will cause some headaches, as there’s nowhere to get it back from!
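Striping in a RAID 0 array can be sketched in a few lines of Python. This is a simplified model rather than how any particular controller works: the 2-byte chunk size is purely illustrative (real arrays use much larger stripe units), and `stripe` and `unstripe` are hypothetical helper names.

```python
from typing import List


def stripe(data: bytes, num_drives: int, chunk_size: int) -> List[List[bytes]]:
    """Split data into fixed-size chunks and deal them out round-robin (RAID 0)."""
    drives: List[List[bytes]] = [[] for _ in range(num_drives)]
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for index, chunk in enumerate(chunks):
        drives[index % num_drives].append(chunk)  # chunk i goes to drive i mod num_drives
    return drives


def unstripe(drives: List[List[bytes]]) -> bytes:
    """Read the chunks back in the same round-robin order to reassemble the data."""
    out = []
    row = 0
    while any(row < len(d) for d in drives):
        for d in drives:
            if row < len(d):
                out.append(d[row])
        row += 1
    return b"".join(out)


drives = stripe(b"ABCDEFGH", num_drives=2, chunk_size=2)
print(drives)            # [[b'AB', b'EF'], [b'CD', b'GH']]
print(unstripe(drives))  # b'ABCDEFGH'

# If drive 1 fails, only the chunks on drive 2 remain, and RAID 0 keeps no
# copy or parity from which the missing chunks could be rebuilt.
print(drives[1])         # [b'CD', b'GH']
```

Because consecutive chunks land on different drives, both drives can be read or written at the same time, which is where the performance gain comes from; it is also why losing either drive leaves holes throughout the data.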
RAID 1 is our next level, and it uses the concept of mirroring we looked at earlier. Again there are two drives in this setup, and the data on the first drive is mirrored onto the second. This means that if drive 1 fails, you can still restore the data with no issues, because the same data is on drive 2. It adds redundancy and data security, and it’s the simplest form of redundancy available in RAID.
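A toy model of RAID 1 mirroring, assuming a two-drive array as described: every write goes to both drives, and a read can be served by whichever drive still works. The `Mirror` class and its methods are hypothetical, and each drive is modelled simply as a dictionary of block numbers to contents.

```python
from typing import Dict, List, Optional


class Mirror:
    """Toy RAID 1: each write goes to every drive; reads use any working drive."""

    def __init__(self, num_drives: int = 2) -> None:
        # Each drive maps block number -> block contents; None marks a failed drive.
        self.drives: List[Optional[Dict[int, bytes]]] = [{} for _ in range(num_drives)]

    def write(self, block: int, data: bytes) -> None:
        for drive in self.drives:
            if drive is not None:
                drive[block] = data  # identical copy on every working drive

    def read(self, block: int) -> bytes:
        for drive in self.drives:
            if drive is not None:
                return drive[block]
        raise OSError("all mirrored drives have failed")

    def fail(self, index: int) -> None:
        self.drives[index] = None


array = Mirror()
array.write(0, b"invoice.pdf")
array.fail(0)          # drive 1 dies
print(array.read(0))   # b'invoice.pdf', served from drive 2
```

The cost of this simplicity is capacity: two drives store one drive’s worth of data.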
Now let’s move on to the more complex levels, starting with RAID 5. This introduces parity: extra information, calculated from the data and distributed across the drives, that makes recovery possible. Here we have four drives (RAID 5 requires at least three), with the parity highlighted in red. If one drive fails (let’s say drive 4), its data can be rebuilt from the data and parity on the other drives. In total, the parity in RAID 5 takes up the space of one drive, and the array can therefore tolerate one drive failure.
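Here is a Python sketch of RAID 5-style distributed parity under simplifying assumptions: equal-size blocks, whole stripes only, and one simple parity rotation chosen for illustration (real controllers use a variety of layouts). `raid5_layout` and `rebuild` are hypothetical names.

```python
from functools import reduce
from typing import List, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def raid5_layout(data: List[bytes], num_drives: int) -> List[List[bytes]]:
    """Place (num_drives - 1) data blocks plus one parity block in each stripe,
    moving the parity block to a different drive on every stripe."""
    per_stripe = num_drives - 1
    if len(data) % per_stripe:
        raise ValueError("this sketch expects whole stripes of data")
    drives: List[List[bytes]] = [[] for _ in range(num_drives)]
    for stripe_no in range(len(data) // per_stripe):
        chunk = data[stripe_no * per_stripe:(stripe_no + 1) * per_stripe]
        parity_drive = (num_drives - 1 - stripe_no) % num_drives  # rotate parity
        blocks = iter(chunk)
        for d in range(num_drives):
            drives[d].append(xor_blocks(chunk) if d == parity_drive else next(blocks))
    return drives


def rebuild(drives: List[List[bytes]], failed: int) -> List[bytes]:
    """Recreate every block of the failed drive from the same stripe on the others."""
    survivors = [d for i, d in enumerate(drives) if i != failed]
    return [xor_blocks(row) for row in zip(*survivors)]


blocks = [bytes([i]) * 4 for i in range(12)]   # 12 hypothetical 4-byte data blocks
drives = raid5_layout(blocks, num_drives=4)     # 4 stripes of 3 data + 1 parity
print(rebuild(drives, failed=3) == drives[3])   # True: drive 4 fully recovered
```

With 12 data blocks on four drives, each drive holds four blocks, exactly one of which is parity, so parity occupies one drive’s worth of space in total. The same `rebuild` works whichever drive fails, because every stripe, data and parity together, XORs to zero.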
With RAID 5 you can go one step further and configure a fifth drive as what’s called a ‘hot spare’. This is an idle drive that sits in the system with no data written to it. If a drive fails (let’s take drive 4 again), the hot spare (drive 5) takes the place of the failed drive and is rebuilt from the data and parity on the other drives, so no data is lost. You can then remove the failed drive and insert a new one into the array, which becomes your new hot spare. It’s another good way of adding redundancy to prevent data loss.
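The hot-spare sequence (failure, automatic rebuild onto the spare, replacement drive becoming the new spare) can be sketched in Python too. This assumes a single-parity array whose stripes XOR to zero; for brevity parity sits on the last drive rather than rotating, which makes no difference to the rebuild. The `ParityArrayWithSpare` class and its methods are hypothetical, and a real controller would also keep serving reads and writes during the rebuild.

```python
from functools import reduce
from typing import List, Optional, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


class ParityArrayWithSpare:
    """Toy single-parity array with one idle hot spare (hypothetical model)."""

    def __init__(self, drives: List[List[bytes]]) -> None:
        self.drives: List[Optional[List[bytes]]] = [list(d) for d in drives]
        self.spare: Optional[List[bytes]] = []  # idle: no data written to it

    def fail(self, index: int) -> None:
        """Drop a failed drive and rebuild its contents onto the hot spare."""
        if self.spare is None:
            raise RuntimeError("no hot spare available")
        self.drives[index] = None
        survivors = [d for d in self.drives if d is not None]
        # Each lost block is the XOR of the surviving blocks in the same stripe.
        self.spare.extend(xor_blocks(row) for row in zip(*survivors))
        # The rebuilt spare takes the failed drive's place in the array.
        self.drives[index], self.spare = self.spare, None

    def insert_new_drive(self) -> None:
        """A blank replacement for the removed drive becomes the new hot spare."""
        self.spare = []


# Two stripes of three hypothetical data blocks, each followed by its parity block.
stripes = [[b"ab", b"cd", b"ef"], [b"gh", b"ij", b"kl"]]
stripes = [s + [xor_blocks(s)] for s in stripes]
drives = [list(col) for col in zip(*stripes)]  # per-drive view: drives 1 to 4

array = ParityArrayWithSpare(drives)
array.fail(3)                         # drive 4 fails; the spare (drive 5) is rebuilt
print(array.drives[3] == drives[3])   # True: contents fully restored
array.insert_new_drive()              # the replacement becomes the new hot spare
```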
Lastly we have RAID 6, which takes parity one step further with what’s known as ‘dual parity’. Here the RAID 6 array has five drives (RAID 6 requires at least four), and the dual parity is spread across them, taking up the space of two drives in total. This means two drives can fail before there’s a problem getting the data back: using the dual parity on the remaining drives, you can still rebuild the data in the array, which adds an extra layer of reliability and data security.
It’s worth stressing that redundancy and parity are not the same as having backups: always keep separate backups of the data on your RAID system.
That sums up RAID in a nutshell. I hope you found this video useful; if you would like more information, please check out your local Kroll Ontrack office online. Thanks for watching!
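Dual parity needs more than plain XOR, because recovering two unknown blocks takes two independent equations. One well-known approach, the P+Q scheme used by Linux software RAID, stores an ordinary XOR parity P plus a second parity Q in which each data block is first multiplied by a distinct constant in the finite field GF(2^8). The Python sketch below recovers two lost data blocks from P and Q. It assumes three data drives plus P and Q (five drives in total), keeps P and Q in fixed positions rather than rotating them, and does not cover other failure combinations such as a data drive plus P. All function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional, Tuple

# Log/antilog tables for GF(2^8) with polynomial x^8+x^4+x^3+x^2+1 (0x11D)
# and generator g = 2. EXP is doubled so LOG[a] + LOG[b] never overflows it.
EXP = [0] * 512
LOG = [0] * 256
_value = 1
for _power in range(255):
    EXP[_power] = _value
    LOG[_value] = _power
    _value <<= 1
    if _value & 0x100:
        _value ^= 0x11D
for _power in range(255, 512):
    EXP[_power] = EXP[_power - 255]


def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements (bytes) in GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a: int, b: int) -> int:
    """Divide a by a non-zero b in GF(2^8)."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]


def pq_parity(data: List[bytes]) -> Tuple[bytes, bytes]:
    """P = XOR of all blocks; Q = XOR of g**i * block i, byte by byte."""
    p = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*data))
    q = bytes(
        reduce(lambda a, b: a ^ b, (gf_mul(EXP[i], byte) for i, byte in enumerate(col)))
        for col in zip(*data)
    )
    return p, q


def recover_two(data: List[Optional[bytes]], x: int, y: int,
                p: bytes, q: bytes) -> Tuple[bytes, bytes]:
    """Rebuild lost data blocks x and y (x != y) from the survivors plus P and Q."""
    gx, gy = EXP[x], EXP[y]
    lost_x, lost_y = bytearray(), bytearray()
    for k in range(len(p)):
        pxy, qxy = p[k], q[k]
        for i, block in enumerate(data):
            if i not in (x, y):
                pxy ^= block[k]                  # leaves Dx ^ Dy
                qxy ^= gf_mul(EXP[i], block[k])  # leaves g^x*Dx ^ g^y*Dy
        # Substituting Dy = pxy ^ Dx gives (g^x ^ g^y) * Dx = qxy ^ g^y * pxy.
        dx = gf_div(qxy ^ gf_mul(gy, pxy), gx ^ gy)
        lost_x.append(dx)
        lost_y.append(pxy ^ dx)
    return bytes(lost_x), bytes(lost_y)


data = [b"RAI", b"D-6", b"ok!"]          # three hypothetical data blocks
p, q = pq_parity(data)                   # the two parity blocks
damaged = [None, data[1], None]          # drives holding blocks 0 and 2 have failed
print(recover_two(damaged, 0, 2, p, q))  # (b'RAI', b'ok!')
```

P alone can only tell you the XOR of the two missing blocks; Q weights each drive differently, which is what makes the two unknowns separable.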
Mikey has been with Ontrack for 6 years and is the Partner Program Manager, based out of the Epsom, UK office. He is a regular contributor to the Ontrack blog, usually writing about how different types of data storage technologies work. Outside of work you’ll find him either with a guitar in his hands or learning about rocket science.