Protect your data in a RAID environment in 6 easy steps

13 December 2016 by Michael Nuncic

Since the concept of a Redundant Array of Independent (or Inexpensive) Disks was introduced, it has redefined how storage systems manage and store data. The way this technology works is actually quite simple: a RAID array is a configuration of several physical hard disks that together form a RAID architecture (e.g. RAID 0, 1, 5 etc.). The array distributes data across all the disks, while the operating system sees it as a single disk.

Even though most RAID levels have built-in mechanisms to protect against data loss caused by a physical failure of one or more hard disks, the technology is not bulletproof. It is advisable to prepare yourself against potential data loss when using a RAID array by considering the following tips:

1. A backup is a must-have, not a nice-to-have

Whether you face a system failure or data loss caused by user error, always keep a current backup of your data at hand.

Many vendors of small, mid-range or high-end RAID-based storage systems sell them as a type of “physical” backup. Don’t fall for this! A RAID – regardless of the level implemented – is not a backup. If you have a serious failure with your RAID, the data may be lost. A backup, if stored correctly and kept current, is the best tool to get your data back on a rebuilt system.

2. Choose the right RAID configuration

The different RAID configurations offer different levels of redundancy against data loss, so it is wise to design (or buy) a storage system whose redundancy truly matches your data and your data loss provisions.

Single drives will fail at some point during their lifetime. When this happens in an array with redundancy (RAID 1, 5 or 6, for example), the faulty drive can be replaced with a new one and the array can be rebuilt with no data loss.

When you have large amounts of data it makes no sense to use RAID 0, which has no redundancy: if a single drive fails, you risk losing everything. Experience shows that at least one drive will fail eventually, and the array would go down with it.

If hard drive failures exceed the redundancy of your RAID array, don’t risk your critical data: get specialist help to rebuild the array and recover your data. Otherwise you risk making the loss permanent.
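The decision described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the tolerances are the guaranteed minimums for the standard RAID 0, 1, 5 and 6 definitions, not the behaviour of any particular controller.

```python
def max_failures_tolerated(level: str, disks: int) -> int:
    """Number of drive failures the array is guaranteed to survive.

    Covers only the standard levels 0, 1, 5 and 6 (an assumption made
    for this sketch; nested levels such as RAID 10 are left out).
    """
    if level == "0":
        return 0            # striping only, no redundancy
    if level == "1":
        return disks - 1    # every disk holds a full copy
    if level == "5":
        return 1            # single parity
    if level == "6":
        return 2            # double parity
    raise ValueError(f"unsupported RAID level: {level}")


def next_step(level: str, disks: int, failed: int) -> str:
    """Suggest what to do given the number of failed drives."""
    if failed == 0:
        return "healthy"
    if failed <= max_failures_tolerated(level, disks):
        return "replace failed drive(s) and rebuild"
    return "stop: redundancy exceeded, seek specialist help"


print(next_step("5", 4, 1))
print(next_step("5", 4, 2))
print(next_step("0", 2, 1))
```

For a four-disk RAID 5, one failed drive leads to a rebuild, while two failed drives (or any failure in a RAID 0) lands in the "seek specialist help" branch.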

3. Don’t try to rebuild a system with more failed disks than your RAID level supports

If you experience a failure of two drives in a RAID 5-based system, it makes no sense to replace one of the failed drives and run a rebuild. You will probably make the data loss permanent.

The rebuild operation, as the term suggests, refers to the rebuilding of the RAID array after a drive failure. Many systems offer “hot plug” capability, which allows you to remove and replace a hard drive without shutting down the entire system and interrupting the service.

A failure that stays within the array's redundancy can usually be resolved by replacing the failed disk with a new one and rebuilding the RAID, although the rebuild usually takes a long time. However, if something fails during the rebuild procedure it may result in additional damage. Remember: a rebuild is not without risks and should be done only if you can rely on an up-to-date, working backup.
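Why two failed drives are fatal for RAID 5 can be shown with the XOR parity that this level is based on. The following is a minimal sketch with made-up four-byte blocks, not a model of a real controller: it shows that one missing block can be recomputed from the survivors, while two missing blocks cannot.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks (illustrative values) plus their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# One drive lost: XOR of the surviving blocks and the parity restores it.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]

# Two drives lost: the survivors only yield the XOR of the two missing
# blocks, which is not enough to tell the two blocks apart.
combined = xor_blocks([data[2], parity])
assert combined == xor_blocks([data[0], data[1]])
```

With only `combined` available, any pair of blocks with the same XOR would fit equally well, so a rebuild started in this state has nothing valid to reconstruct from.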
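To get a feeling for how long a rebuild can take, a rough estimate is the capacity of the replaced drive divided by the sustained rebuild rate. The figures below (a 4 TB drive and 100 MB/s) are assumptions chosen purely for illustration, not measurements.

```python
def rebuild_hours(drive_capacity_tb: float, rebuild_rate_mb_s: float) -> float:
    """Rough lower bound for rebuild time: capacity / sustained rate."""
    capacity_mb = drive_capacity_tb * 1_000_000  # decimal TB to MB
    return capacity_mb / rebuild_rate_mb_s / 3600


# Hypothetical example: 4 TB drive rebuilt at a steady 100 MB/s.
print(f"{rebuild_hours(4, 100):.1f} hours")  # prints "11.1 hours"
```

The real duration is likely to be longer if the array keeps serving requests during the rebuild, and for that whole period the array runs with reduced or no redundancy.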

4. Always be prepared for a system or hardware failure

As I have pointed out before, one or more hard disks or the RAID controller can fail at any time. If you have purchased a RAID-based system from one manufacturer, it is very likely that all the hard disks built into the system come from the same batch and have the same production date. This means it is not uncommon for them to reach their end of life and start to fail at almost the same time. This is why it is absolutely necessary to check and monitor the status of the hard drives periodically.

If one hard drive in a RAID 5 fails, don’t hesitate to replace it immediately. Waiting too long at this stage can cause severe data loss, as there’s an increased chance that another drive will also fail. Keeping track of the usage of the hard disks inside the RAID storage array is a must-do and should be implemented as part of a business continuity and data recovery plan, which every company should have. These plans should also cover what to do when a RAID storage array fails and data loss occurs.
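How such periodic monitoring might look depends entirely on the system. As one hedged example, and assuming a Linux software RAID (md), which this article does not otherwise cover, the kernel reports array status in `/proc/mdstat`, where a summary such as `[3/2] [_UU]` means that only two of three members are active. The sketch below parses text in that format; the sample text and the function name are illustrative assumptions.

```python
import re

# Member summary as printed by Linux md, e.g. "[3/2] [_UU]".
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text: str) -> dict:
    """Return {array_name: number_of_missing_members} for degraded arrays."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = STATUS.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                result[current] = expected - active
    return result


# Illustrative sample, not captured from a real machine.
SAMPLE = """\
Personalities : [raid1] [raid5]
md0 : active raid5 sdc1[2] sdb1[1]
      1953260544 blocks level 5, 64k chunk, algorithm 2 [3/2] [_UU]

md1 : active raid1 sde1[1] sdd1[0]
      976630336 blocks [2/2] [UU]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE))  # prints {'md0': 1}
```

On a real system the text would come from reading `/proc/mdstat`, and a scheduled job could raise an alert whenever the returned dictionary is not empty. Hardware RAID controllers and NAS appliances expose this information through their own tools instead.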

In severe cases the built-in recovery functions won’t work and may even cause further damage, with the critical data being permanently destroyed. It is especially for such cases that I recommend keeping at hand the contact details of a trustworthy and professional data recovery service provider.

5. If you want to DIY a rebuild of a failed hard drive, label the hard drives and image the content of the whole storage array before you start

With an image of all the drives available, you improve the chances of a data recovery at a later date, should the rebuild fail. Imagine for example a scenario where a rebuild stops at 5%, then a data loss occurs and the original content is overwritten so the data is permanently destroyed – you won’t have the slightest chance of recovering it again!

Once you have the images, professional data recovery experts can then rebuild the data by reconstructing the original RAID system and data structure.

Remember: the image disks should be labelled in the same order as in the storage array to facilitate the reconstruction.
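The labelling idea can be sketched as follows: each drive is copied byte for byte into an image file named after its slot in the array, and a checksum is recorded so the image can be verified later. This is a minimal sketch under stated assumptions: the function name, the `slot_NN.img` naming scheme and the paths in the usage note are hypothetical, and the code does not handle read errors, which a failing drive is likely to produce.

```python
import hashlib
from pathlib import Path

CHUNK = 1024 * 1024  # read 1 MiB at a time


def image_drive(source, dest_dir, slot: int):
    """Copy `source` byte for byte to dest_dir/slot_NN.img.

    Returns (image_path, sha256_hex). The slot number keeps the images
    in the same order as the drives in the array.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / f"slot_{slot:02d}.img"
    digest = hashlib.sha256()
    # Mode "xb" refuses to overwrite an image that already exists.
    with open(source, "rb") as src, open(target, "xb") as dst:
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            dst.write(chunk)
            digest.update(chunk)
    return target, digest.hexdigest()
```

A call might look like `image_drive("/dev/sdb", "/mnt/images", 0)`, where both paths are placeholders; reading a raw device normally requires administrator rights, and the destination must be on separate storage with enough free space. For drives that already show read errors, a dedicated imaging tool that can skip and retry bad areas is the safer choice.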

6. If you’re unsure why the array failed and data loss occurred, don’t DIY a rebuild or a data recovery

It is best to seek professional help. There’s no shame in not knowing how to get the system running again, what procedures have to be followed or how to access the lost data. Better to let professionals help you than to risk permanently destroying valuable data!

Making RAID more complex

Even if you follow these tips, it doesn’t mean you are safe from any future data loss. The more complex storage systems become – and I’m not only thinking about RAID-based storage, where data is stored, duplicated or divided over multiple physical hard disks – the more complicated it is to recover data after a loss occurs.

Nowadays RAID arrays are combined with several other complex technologies in the same system, such as virtualisation, deduplication and much more. Any of these combined technologies can fail, and such a failure cannot simply be fixed by a built-in function of the operating software. It may well be that the data can only be recovered after a deep analysis by data recovery experts, who will decide which layer of complexity has to be recovered first so the data can be accessed.

When you are not sure whether you have the necessary knowledge to recover the lost data from your RAID (or another technology-based storage system), it is best to contact a data recovery specialist like Ontrack. When it comes to sensitive data, you don’t want to risk making a complex situation even worse!