RAID stands for Redundant Array of Inexpensive Disks. It is a set of independent hard drives that work together to provide redundancy and keep the system online, and it is mainly used in business server environments.
With the continued fall in hard drive prices, it is also becoming more common to find RAID configurations at home. Though these systems are not comparable with enterprise-class RAID systems, they are often used in inexpensive Network Attached Storage (NAS) systems connected to a home network for storing and sharing videos, photos, music and other entertainment content between multiple devices.
On the other hand, enterprise RAID storage architectures perform tasks that are usually business-critical, so their efficiency and uptime are paramount. These RAID systems support the operation of virtual environments (e.g. VMware and Microsoft Hyper-V), databases (e.g. Microsoft SQL Server and Oracle), email systems like Microsoft Exchange Server and all the applications that need performance, reliability and scalability.
In these situations, the failure of a RAID system paired with a data loss is a real disaster to be avoided.
To trust is good, not to trust is better
RAID systems are more reliable than individual spinning hard drives or individual SSDs. Because they use multiple disks and provide redundancy, they can tolerate the failure of one drive in the array whilst staying online. When a drive fails, the missing information can be reconstructed from the parity calculated and stored during normal write operations.
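As a rough illustration of how parity makes reconstruction possible, here is a minimal Python sketch of single (XOR) parity. The block contents and the xor_blocks helper are made up for the example, and real controllers work on much larger stripes in hardware or in the kernel, so treat this as a toy model rather than how any particular product does it.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks, as they might be spread across three drives (toy data).
data_blocks = [b"RAID", b"data", b"demo"]

# Parity is calculated during the write and stored on another drive.
parity = xor_blocks(data_blocks)

# Simulate losing the second drive: XOR the survivors with the parity block
# to get the missing block back.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_blocks[1])
```

The same property explains the limit of single parity: with two blocks missing there is only one parity equation for two unknowns, so the data cannot be recovered this way.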
Since to trust is good but sometimes not to trust is better, below follows a collection of comments and suggestions from our experts to help you to work better and be more comfortable with your RAID system.
Some tips to work better with RAID systems
1. It’s not true that RAID systems do not lose data.
RAID systems limit the risk of data loss but do not eliminate the problem completely: no storage system is immune to data loss.
2. Take care, not all RAID systems provide the same level of redundancy!
Some RAID configurations do not provide fault tolerance. RAID 0 (striping), for example, provides neither redundancy nor parity calculation. Data in these systems is split across two or more disks, and the failure of one drive will result in data loss.
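To see why a single failure is fatal for striping, the following Python sketch models a two-disk stripe set. The stripe and read_back helpers, the chunk size and the payload are all hypothetical and chosen only to keep the example small.

```python
def stripe(data, num_disks, chunk_size):
    """Distribute fixed-size chunks of data across disks in round-robin order."""
    disks = [[] for _ in range(num_disks)]
    for index in range(0, len(data), chunk_size):
        chunk = data[index:index + chunk_size]
        disks[(index // chunk_size) % num_disks].append(chunk)
    return disks

def read_back(disks, failed=None):
    """Reassemble the data; chunks that lived on a failed disk become '?' filler."""
    num_chunks = sum(len(disk) for disk in disks)
    out = []
    for n in range(num_chunks):
        disk_no, position = n % len(disks), n // len(disks)
        chunk = disks[disk_no][position]
        out.append(b"?" * len(chunk) if disk_no == failed else chunk)
    return b"".join(out)

payload = b"ABCDEFGHIJKLMNOP"
disks = stripe(payload, num_disks=2, chunk_size=4)

intact = read_back(disks)             # both disks healthy
damaged = read_back(disks, failed=1)  # second disk lost, no parity to rebuild from

print(intact)
print(damaged)
```

With one disk gone, every second chunk is missing, so almost any file larger than one chunk is damaged even though half of the raw data is still physically present.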
3. RAID works even if one drive fails but…
True. In general, except for a few cases, a RAID system works even if a disk fails. This is true, for example, of RAID 5, which is one of the most common configurations. However, the failure of a disk in a RAID 5 configuration should not be ignored. Replace the failed disk as soon as possible, as the failure of a second unit will cause data loss.
4. …RAID 5 does not tolerate the failure of two disks
RAID 5 configurations do not tolerate the failure of two disks. Tolerance of two drive failures is provided by the RAID 6 configuration, which uses dual parity. In environments that require high fault tolerance, a RAID 6 configuration is preferable. If performance is the goal, RAID 5 can provide a good compromise between data protection and performance: write operations are faster on RAID 5 systems because the parity calculations are ‘simple’ compared to those on RAID 6.
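The trade-off between the levels mentioned so far can be put into numbers. The sketch below is a simplified calculation that assumes identical drives and ignores controller overhead and hot spares; the raid_summary function name and the six 4 TB drives are only an example.

```python
def raid_summary(level, num_drives, drive_tb):
    """Return (usable capacity in TB, drive failures tolerated) for RAID 0, 5 or 6."""
    # Number of drives' worth of space given over to parity at each level.
    parity_drives = {0: 0, 5: 1, 6: 2}
    # Smallest array each level can be built from.
    minimum_drives = {0: 2, 5: 3, 6: 4}
    if level not in parity_drives:
        raise ValueError(f"RAID level {level} is not covered by this sketch")
    if num_drives < minimum_drives[level]:
        raise ValueError(f"RAID {level} needs at least {minimum_drives[level]} drives")
    parity = parity_drives[level]
    return (num_drives - parity) * drive_tb, parity

# Example: six drives of 4 TB each.
for level in (0, 5, 6):
    usable_tb, tolerated = raid_summary(level, num_drives=6, drive_tb=4)
    print(f"RAID {level}: {usable_tb} TB usable, tolerates {tolerated} failed drive(s)")
```

Each extra level of protection costs one more drive's worth of capacity, which is the price of surviving one more simultaneous failure.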
5. Replacing a failed disk
Often the drives supplied new in a RAID will all come from the same production batch. This means that their expected MTBF (mean time between failures) is the same, and there is a higher than normal risk that if one drive fails, a second or third may also fail within a short time afterwards. It is therefore recommended that, although drives should be matched in terms of performance and capacity, they should ideally come from different batches (if from the same manufacturer) or even from different manufacturers, to spread the risk of more than one failure occurring before the array can be rebuilt with a replacement drive.
6. Rebuild yes, rebuild no
The rebuild operation, as the name suggests, rebuilds the RAID array in the event of a drive problem. Many RAID systems offer ‘hot plug’ support, which allows you to remove and replace a hard drive without having to shut down the system and interrupt the service. Rebuilding a RAID array usually takes a long time, but this is preferable to recreating the data from scratch. However, if something fails during the rebuild procedure, the operation will cause additional damage. A rebuild is not without risks, and it should be done only if you can rely on an up-to-date, working backup.
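To give a feel for what ‘a long time’ means, here is a back-of-the-envelope estimate in Python. It assumes the rebuild has to write the full capacity of the replacement drive at a constant rate; the 8 TB capacity and the 100 MB/s rate are illustrative assumptions, not measurements, and real rebuilds are often slower when the array is also serving users.

```python
def rebuild_hours(capacity_tb, rate_mb_per_s):
    """Estimate rebuild time in hours for a drive of the given capacity.

    Uses decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes) and assumes
    the whole drive is rewritten at a constant rate.
    """
    seconds = (capacity_tb * 10**12) / (rate_mb_per_s * 10**6)
    return seconds / 3600

# Hypothetical example: an 8 TB replacement drive rebuilt at 100 MB/s.
hours = rebuild_hours(capacity_tb=8, rate_mb_per_s=100)
print(f"Estimated rebuild time: {hours:.1f} hours")
```

During those hours a RAID 5 array has no redundancy left, which is why the advice about having a working backup before starting matters.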
7. Remember to number the drives
The drives of a RAID system have a specific position in the array. Recording the physical position of each drive in the RAID chain, or numbering each drive, makes life much easier if you need to reconstruct the RAID array or send the drives to a data recovery provider. In today’s systems, each individual drive has a reserved storage area with hidden RAID information, so knowing the drive’s physical location may not be necessary. In these cases, it does not make any difference which slot each drive goes into. Even pulling drives out randomly and putting them back in a different order should not harm the setup, provided that the RAID information is still good (but unfortunately in many cases this area is damaged or overwritten).
8. Do not use a repair utility in case of data corruption on a RAID system
Controller issues or a power failure may corrupt the file system or make data inaccessible. Do not launch utilities to repair the volumes, because these utilities perform write operations that could further damage the logical structure of the data and make data recovery more difficult.
9. Beware of “default options”
Most users run their system with the standard settings. However, some specialists change these options to make the RAID system more secure than it was originally, e.g. by implementing a RAID 6 configuration. In case of errors, users are generally inclined to try different options, including restoring the default options. Doing so may initialise a RAID 5 setup that overwrites the RAID 6 configuration, with consequent data loss.
10. Turn off the system immediately!
If data has been lost due to accidental deletion, power off your system immediately. This unconventional method is necessary because the normal shutdown procedure could overwrite disk areas that hold data in need of recovery. If a hard drive makes unusual noises, the procedure is the same; in this case the goal is to limit the extent of physical damage (typically a head crash, where the head cuts grooves into the disk and removes the magnetic layer where data is stored).
11. Backup… please
If you’re planning to make changes to the RAID system, stop unless you are sure you have a current, working backup. Working on or modifying a RAID configuration may accidentally lead to data loss, which can then be resolved by restoring the backup.
12. How to restore a backup
Resist the temptation to restore the backup directly to the RAID system where the data has been lost. If the backup is out of date or not working, or the restore procedure fails, you will have overwritten the only source (the original RAID system) that could be used to recover the data.
13. Technical support from the hardware manufacturer
In the event of a RAID system failure it is normal to call technical support provided by the hardware manufacturer, especially on advanced and complex systems. Remember that the goal of the technical assistance is to have the RAID up and running, not to save the data on the system. The technicians involved will replace the failed drive(s) and configure the system to work again. And your data? Talk to the manufacturer’s technical support explaining that you may need to consult a data recovery provider. Remember the conflict of interest here – the manufacturer wants to get your system working perfectly, often at the expense of your data.
14. Data is lost, what to do now?
The complexity of a RAID recovery is due to three factors:
- RAID is a system made of multiple disks where data are written/organised in a way that is completely different from a single hard drive
- Data on RAID systems, in most cases, have complex logical structures, for example think about virtual machines or databases
- Enterprise-class RAID configurations use complex and advanced hardware architecture and often proprietary hardware
Retrieving data is possible, but the choice of supplier is critical for the success of the data recovery. Select a data recovery company that is well known in the market and can offer the expertise required to manage these complex RAID recoveries. Many cases require the development of specific tools or the adaptation of existing tools to the scenario at hand, and your data recovery provider should be able to do this in its own research centre. Finally, your data recovery company must offer 24/7 data recovery services, since RAID systems are usually business-critical and you need to be up and running as soon as possible to reduce the cost of downtime.