Make it big! Recovery solutions for large storage systems (Part 2)

Thursday, September 29, 2016 by Milagros Gamero

In the first part of this article we covered storage system architectures. Now we want to give some insights into how to cope with system failures and data loss:

Avoiding storage system failures

Though you may not be able to prevent a disaster from happening, you may be able to minimize your downtime.

There are many ways to reduce or eliminate the impact of storage system failures. For example, you can add redundancy to primary storage systems. Options include duplicate storage systems or identical servers, known as “mirror sites.” Adding elaborate backup processes or file-system “snapshots,” which always provide a checkpoint to restore to, gives another level of data protection. Some of these options can be quite costly, making them affordable only to large business organizations.

Experience has shown that a data disaster usually involves multiple or rolling failures. Relying on just one restoration protocol is therefore short-sighted. An organization with a successful data storage setup will have multiple layers of restoration pathways.

We have heard thousands of IT horror stories of initial storage failures turning into complete data calamities. Some choices made in an effort to bring a system back can permanently corrupt the data.

4 ways to minimize loss after a disaster

There are several risk mitigation policies that storage administrators can adopt that will help minimize data loss when a disaster happens:

  • Offline storage system: Avoid forcing an array or drive back online. There is usually a valid reason for a controller card to disable a drive or array, and forcing an array back online may expose the volume to file system corruption.
  • Rebuilding a failed drive: When rebuilding a single failed drive, it is important to allow the controller card to finish the process. If a second drive fails or goes offline during this process, stop and get professional data recovery services involved. Replacing a second failed drive during a rebuild will change the data on the other drives.
  • Storage system architecture: Plan the storage system's configuration carefully. We have seen many cases with multiple configurations used on a single storage array. For example, three RAID 5 arrays (each holding six drives) are striped in a RAID 0 configuration and then spanned. It is better to keep the storage configuration simple and to document each aspect of it.
  • During an outage: If the problem is escalated to the Original Equipment Manufacturer's (OEM) technical support, always ask “Is the data integrity at risk?” or “Will this damage my data in any way?” If the technician says that there may be a risk to the data, stop and get professional data recovery services involved.
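
To illustrate the advice on documenting a storage configuration, here is a minimal Python sketch that records the layered layout from the example above (three six-drive RAID 5 arrays striped in RAID 0) and derives a few facts from it. The class names, the group names and the 4 TB drive size are assumptions made purely for illustration; they do not come from any real controller or vendor tool, and the further "spanned" layer is left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Raid5Group:
    """One RAID 5 array (hypothetical record, not a vendor format)."""
    name: str
    drive_count: int
    drive_size_tb: float  # assumed drive size, for illustration only

    def usable_tb(self) -> float:
        # RAID 5 uses one drive's worth of capacity for parity.
        return (self.drive_count - 1) * self.drive_size_tb


@dataclass
class StripedVolume:
    """A RAID 0 stripe laid across several RAID 5 groups."""
    name: str
    groups: List[Raid5Group]

    def total_drives(self) -> int:
        return sum(g.drive_count for g in self.groups)

    def usable_tb(self) -> float:
        # A stripe is limited by its smallest member.
        smallest = min(g.usable_tb() for g in self.groups)
        return smallest * len(self.groups)

    def describe(self) -> str:
        # Plain-text summary suitable for a runbook or wiki page.
        lines = [f"Volume {self.name}: RAID 0 over {len(self.groups)} RAID 5 groups"]
        for g in self.groups:
            lines.append(
                f"  {g.name}: {g.drive_count} x {g.drive_size_tb} TB, "
                f"{g.usable_tb()} TB usable, tolerates 1 failed drive"
            )
        lines.append(
            f"  Total: {self.total_drives()} drives, {self.usable_tb()} TB usable"
        )
        lines.append(
            "  Note: a second failure inside any one group takes the whole volume offline"
        )
        return "\n".join(lines)


# The layout from the example: three RAID 5 arrays of six drives each.
volume = StripedVolume(
    name="vol0",
    groups=[Raid5Group(f"r5-{i}", drive_count=6, drive_size_tb=4.0) for i in range(3)],
)

if __name__ == "__main__":
    print(volume.describe())
```

Even this simplified record makes the trade-off visible: 18 drives yield the capacity of 15, and the whole volume depends on no single RAID 5 group losing a second drive. Keeping such a description next to the system gives administrators and recovery engineers a known starting point when a failure occurs.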