Redundant Array of Independent (originally Inexpensive) Disks (RAID) storage has revolutionised enterprise data storage technology, building in the peace of mind of redundancy (from RAID 1 & above) which can greatly minimise downtime suffered due to individual drive failures.
Unfortunately though, RAID storage isn't a perfect technology and as a result data loss can still occur.
What is a RAID storage system?
The definition of RAID is in the name of the system itself. Standing for Redundant Array of Independent Disks, RAID literally means a redundant bundling of independent disks: a cluster of multiple hard drives presented as a single logical volume. Depending on the configuration, RAID speeds up reading and writing data, improves the protection of that data against drive failure, or both.
RAID is a technology that supports the use of two or more hard drives in various configurations to achieve greater performance, reliability and larger volume sizes by consolidating disk resources and using parity calculations.
A number of standard configurations were designed which are referred to as levels. There were five RAID levels originally created, but many more variations have evolved, notably several nested levels and many non-standard levels (mostly proprietary).
So, what do the RAID levels mean? The numbers simply refer to the configuration of the RAID. Each level stores data in a different way, so the choice of level depends on your own needs. RAID 1, for example, mirrors data and suits those who need reliability and good read performance. RAID 5 is a good choice if you are looking for both performance and fault tolerance.
RAID system history
RAID is the acronym for Redundant Array of Inexpensive Disks (Redundant Array of Independent Disks). The concept was born at the University of California, Berkeley, where David A. Patterson, Garth Gibson and Randy H. Katz were collaborating to produce operational prototypes of five levels of RAID storage systems. The result of their research has formed the basis of the complex RAID storage systems that exist today. Today IBM holds the intellectual property rights on RAID 5.
The design of the RAID storage system was aimed at improving the performance, recovery, reliability, and scalability of storage. The result was a unique redundancy concept that offers data recovery capabilities in the event that a drive fails in the system. In fact, RAID controller cards have acquired the ability to continue reading and writing data even if a disk is "offline".
RAID system overview
RAID is a term used for computer data storage schemes that spread and/or replicate data among multiple hard disk drives. RAID was designed with two key goals: increased data reliability and increased I/O (input/output) performance.
A RAID combines physical hard disks into a single logical unit by using either special hardware or software. Hardware RAID solutions come in a variety of styles, from controllers built onto the motherboard or supplied as add-in cards, up to large enterprise NAS or SAN servers. With these setups the operating system is unaware of the technical workings of the RAID. Software solutions are typically implemented in the operating system.
There are three key concepts in RAID:
- mirroring, the copying of data to more than one disk
- striping, the splitting of data across more than one disk
- error correction, where redundant data is stored to allow problems to be detected and possibly fixed (known as fault tolerance)
Different RAID levels use one or more of these techniques, depending on the system requirements.
RAID is traditionally used on servers, but can also be used on workstations. The latter is especially true of storage-intensive computers such as those used for video and audio editing.
The logical system configuration splits the data into stripes across all physical disks. This makes it possible to have a balanced data rate on all the disks; instead of one disk doing all the work of reading and writing the data, all the disks work together. The data is therefore distributed over all the physical disks.
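As a rough illustration of the distribution described above, the following Python sketch splits a byte string into fixed-size blocks and deals them out across the disks in turn. The function names, the in-memory lists standing in for disks and the 4-byte block size are assumptions made for illustration only; a real controller works on far larger blocks and on physical devices.

```python
def stripe(data: bytes, num_disks: int, block_size: int) -> list[list[bytes]]:
    """Deal fixed-size blocks of data across num_disks in round-robin order."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for start in range(0, len(data), block_size):
        block_index = start // block_size
        disks[block_index % num_disks].append(data[start:start + block_size])
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the data by reading one block per disk, stripe by stripe."""
    out = []
    depth = max(len(blocks) for blocks in disks)
    for row in range(depth):
        for blocks in disks:
            if row < len(blocks):
                out.append(blocks[row])
    return b"".join(out)


disks = stripe(b"ABCDEFGHIJKLMNOPQRSTUVWX", num_disks=3, block_size=4)
for n, blocks in enumerate(disks):
    print(f"disk {n}: {blocks}")
```

Each disk ends up holding every third block, so a large sequential read or write is shared between all three disks rather than queued on one.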
Redundant or fault-tolerant operations
Redundancy in a current RAID 5 configuration is the result of using a Boolean mathematical function called "Exclusive OR" (XOR). This is commonly referred to as parity. The XOR function is a logical binary operation. It is best to think of the parity as a combination of the data blocks on the other disks. Each byte of the parity block is calculated from the corresponding bytes of the data blocks in the same stripe, and the result is written to the parity block for that stripe. Any single missing block in a stripe can then be recalculated from the blocks that remain. However, the calculation does not work if two blocks are missing. This is one of the limitations of RAID 5: it does not provide adequate redundancy if two or more disks fail.
As previously indicated, the controller board splits the data into stripes and also performs the XOR function on that data. The number of logic computations performed by the controller every second is impressive. Current RAID controllers are highly sophisticated hardware systems, including processors and SDRAM memory banks specifically designed for performance and redundancy.
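The parity calculation can be sketched in a few lines of Python. This is a minimal illustration, not how any particular controller is implemented: the helper name xor_blocks and the three-byte blocks are assumptions chosen to keep the example short.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks belonging to one stripe (illustrative values).
data_blocks = [b"\x0f\xa0\x55", b"\xf0\x0a\x33", b"\x3c\xc3\x0f"]
parity = xor_blocks(data_blocks)

# Simulate losing the second block and rebuild it from the survivors plus parity.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_blocks[1])  # prints True
```

If two blocks are lost, XORing what remains yields only the XOR of the two missing blocks, not either block on its own, which is why a single parity block cannot cover a double failure.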
RAID data recovery
Although RAID storage systems are designed to be fault-tolerant, some hardware or other failures can make your data inaccessible. If your system encounters such problems with its RAID array, Ontrack Data Recovery is an expert in RAID data recovery procedures and techniques.
Our engineers agree that recovering RAID arrays is one of the most difficult technical aspects of data recovery. The evaluation of a RAID system recovery is actually the combination of two very important steps:
- The first, and the longest, is the reassembly of the array. When the Ontrack Data Recovery engineers reassemble the logical system, they thoroughly research how the data is organised across all disks. They then know the order of the disks and the arrangement of data blocks and parity blocks. This investment in time is necessary to determine the original configuration and obtain a quality recovery.
- The second step is to work on the logical file system. Current enterprise journaling file systems are extremely complex. If the RAID array has failed, there can be thousands of errors within the file system. Ontrack Data Recovery engineers will verify and confirm that the array is structured correctly before copying any data. This extra step guarantees a quality recovery.
Commonly used RAID vocabulary
RAID: RAID is a technology that supports the use of two or more hard drives in various configurations to achieve greater performance, reliability and larger volume sizes by consolidating disk resources and using parity calculations.
Parity: A mathematical calculation which allows drives within a RAID array to fail without the loss of data. The simplest way to show this is the equation A + B = C. You can remove any one of the letters and work out its value from the two that remain. For example, if B is removed so that the equation reads A + ? = C, then B's value can be worked out by moving the A across, so B = C - A. This is obviously a simplistic description; to understand it fully in a RAID sense, knowledge of binary and the logical XOR operation is required.
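In RAID the addition in this analogy is replaced by XOR, written ⊕ below. A small worked example, using arbitrary 4-bit values chosen purely for illustration:

```latex
\[
\begin{aligned}
A \oplus B &= C, & 1010 \oplus 0110 &= 1100 \\
B &= C \oplus A, & 1100 \oplus 1010 &= 0110
\end{aligned}
\]
```

Because XOR is its own inverse, the same operation both creates the parity and recovers a missing value; no separate "subtraction" step is needed.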
Mirroring: The data from 1 or more hard drives is duplicated onto another physical disk(s).
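A toy model of mirroring, assuming in-memory dictionaries in place of real disks; the class name Mirror and its methods are hypothetical and exist only for this sketch.

```python
class Mirror:
    """Toy mirror set: every write goes to all healthy disks, reads use any healthy copy."""

    def __init__(self, num_disks: int = 2) -> None:
        self.disks: list[dict[int, bytes]] = [{} for _ in range(num_disks)]
        self.failed: set[int] = set()

    def write(self, block_no: int, data: bytes) -> None:
        for n, disk in enumerate(self.disks):
            if n not in self.failed:
                disk[block_no] = data

    def read(self, block_no: int) -> bytes:
        for n, disk in enumerate(self.disks):
            if n not in self.failed:
                return disk[block_no]
        raise IOError("all mirrored copies have failed")


mirror = Mirror(num_disks=2)
mirror.write(0, b"payroll")
mirror.failed.add(0)       # simulate losing the first disk
print(mirror.read(0))      # still served from the second copy
```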
Striping: The method by which data and parity are written across multiple disks. The data is written across the drives in sequential order until the last drive is reached; it then jumps back to the first drive and starts a second stripe.
Block: A block is the logical space on each disk where the data is written. The amount of space is set by the RAID controller and is most commonly 16 KB to 256 KB in size. The data fills up the space until the limit is reached and then moves on to the next drive; after the last drive it jumps to the start of the next stripe.
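The way data fills one block and then moves to the next drive can be expressed as simple arithmetic. The sketch below assumes plain striping with no parity blocks (RAID 0 style); the function name locate and the 64 KB block size are illustrative choices, not values taken from any specific controller.

```python
def locate(offset: int, num_disks: int, block_size: int) -> tuple[int, int, int]:
    """Map a logical byte offset to (disk, stripe, offset inside the block).

    Assumes plain striping with no parity blocks (RAID 0 style).
    """
    block_no, within = divmod(offset, block_size)
    stripe, disk = divmod(block_no, num_disks)
    return disk, stripe, within


BLOCK = 64 * 1024  # 64 KB, inside the common 16 KB to 256 KB range
for offset in (0, BLOCK, 3 * BLOCK, 4 * BLOCK + 100):
    print(offset, locate(offset, num_disks=4, block_size=BLOCK))
```

With four disks, the fifth block (offset 4 × 64 KB) wraps back to the first disk and opens the second stripe.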
Left / Right Symmetry: The symmetry in a RAID controls how the data and parity are distributed across the drives. There are four main styles of symmetry (left-symmetric, left-asymmetric, right-symmetric and right-asymmetric); which one is used depends on the RAID vendor. Some companies also create proprietary styles to suit their business needs.
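To make the idea of symmetry concrete, the sketch below prints where data (D) and parity (P) blocks land for a small RAID 5 set. It assumes the four styles are the left/right, symmetric/asymmetric layouts as conventionally defined for Linux software RAID; a given hardware vendor's layout may differ, and the function name raid5_layout is hypothetical.

```python
def raid5_layout(stripe: int, num_disks: int, style: str) -> list[str]:
    """Return one stripe as a list of labels, one per disk ("P" marks parity)."""
    data_disks = num_disks - 1
    if style.startswith("left"):
        parity = data_disks - stripe % num_disks   # starts on the last disk, moves left
    else:
        parity = stripe % num_disks                # starts on the first disk, moves right
    row = ["P" if d == parity else "" for d in range(num_disks)]
    for i in range(data_disks):
        if style.endswith("asymmetric"):
            disk = i if i < parity else i + 1      # fill left to right, skipping parity
        else:
            disk = (parity + 1 + i) % num_disks    # start just after parity and wrap
        row[disk] = f"D{stripe * data_disks + i}"
    return row


for style in ("left-asymmetric", "left-symmetric",
              "right-asymmetric", "right-symmetric"):
    print(style)
    for stripe in range(4):
        print("  " + " ".join(f"{label:>3}" for label in raid5_layout(stripe, 4, style)))
```

In the symmetric styles consecutive data blocks always fall on consecutive disks, which is one reason a recovery engineer has to identify the style before the data can be read back in the right order.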
Hot Spare: There are a few different methods for dealing with drive failures within a RAID; one is the use of a Hot Spare. This is a spare disk which can be used in place of the failed one.
Degraded mode: This happens when a drive in the RAID becomes unreadable. The drive is then considered bad and is withdrawn from the RAID. New data and parity are then written to the remaining drives within the RAID, and if any data is requested from the failed drive it is worked out from the data and parity on the others. This degrades the performance of the RAID, hence degraded mode.
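A short sketch of why degraded mode is slower, assuming a single stripe held in memory with the parity block in the last position. The names xor_blocks and read_block are illustrative only; the second value returned counts how many disks had to be read to satisfy the request.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def read_block(stripe, wanted, failed=None):
    """Read block `wanted` from one stripe; return (data, number of disk reads).

    `stripe` holds one block per disk, parity included. If the wanted disk has
    failed, its contents are recomputed by XORing every surviving block.
    """
    if wanted != failed:
        return stripe[wanted], 1
    survivors = [blk for n, blk in enumerate(stripe) if n != failed]
    return xor_blocks(survivors), len(survivors)


data = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
stripe = data + [xor_blocks(data)]              # last position holds the parity block

print(read_block(stripe, wanted=1))             # healthy array: one read
print(read_block(stripe, wanted=1, failed=1))   # degraded: three reads, same data
```

The data returned is identical in both cases, but the degraded read touches every surviving disk in the stripe, which is where the performance penalty comes from.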