Proactive Suggestions for Your Disaster Plan
Disaster recovery planning is a challenging process. During the planning phases, people naturally concentrate on tangible disasters such as fire, break-ins, and natural disasters. Data disasters should also be considered part of your disaster recovery plan. Here are some proactive suggestions for your disaster plan:
A quarterly review of emergency procedures is a proactive approach to disaster recovery. Key personnel should stay up to date on technical articles relating to primary business systems and messaging systems. Detailed documentation describing individual machine configurations and software settings should be available in the server room area. Administrative documentation should be complete for each machine.
Microsoft Exchange Server Redundancy
For instance, in a business running Microsoft Exchange Server, is there a secondary restore server in place to handle the restoration of the server’s Information Store during an outage? All current versions of Exchange Server use log files to record message transactions before they are committed to the Information Store database. While ‘Circular Logging’ may save storage space, it does so by overwriting older log files. During a data disaster, a complete set of log files is critical for bringing a restored Information Store up to date and getting your users back to their data.
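One way to confirm that a set of log files is complete is to look for gaps in the file sequence. The following is a minimal sketch, not an Exchange tool. It assumes log files are named with a fixed prefix followed by a hexadecimal sequence number and a `.log` extension (for example `E0000001.log`); the function name `find_missing_generations` and the default prefix are illustrative assumptions, so adjust them to match the naming on your own server.

```python
import re


def find_missing_generations(filenames, prefix="E00"):
    """Return the sequence numbers missing between the lowest and highest log found.

    Assumption: names look like <prefix><hex digits>.log. Anything else
    (checkpoint files, the current log with no number) is ignored.
    """
    pattern = re.compile(
        r"^" + re.escape(prefix) + r"([0-9A-Fa-f]+)\.log$", re.IGNORECASE
    )
    generations = set()
    for name in filenames:
        match = pattern.match(name)
        if match:
            # Sequence numbers are treated as hexadecimal.
            generations.add(int(match.group(1), 16))
    if not generations:
        return []
    lowest, highest = min(generations), max(generations)
    return [g for g in range(lowest, highest + 1) if g not in generations]


if __name__ == "__main__":
    sample = ["E0000001.log", "E0000002.log", "E0000004.log", "E00.log", "E00.chk"]
    print(find_missing_generations(sample))
```

An empty result only shows that the numbering is continuous. It does not prove that the individual files are intact or that they belong to the database being restored.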
Archived Data on Tape Media
Disaster recovery planning should include off-site storage of backup tapes and other media. Tape backups add validation testing steps to the plan: it is good practice to test the backups periodically by restoring from them. Tape rotation should be regular and consistent, and monitoring the life spans of tapes is important for reducing media failures.
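Periodic backup testing can be partly automated by recording a checksum for every file at backup time and comparing a test restore against that record. The sketch below is one possible approach under stated assumptions: the helper names (`build_manifest`, `verify_restore`) are hypothetical, and it assumes the test restore is written to an ordinary directory that can be read after the backup software has finished.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full_path = os.path.join(folder, name)
            manifest[os.path.relpath(full_path, root)] = sha256_of_file(full_path)
    return manifest


def verify_restore(manifest, restored_root):
    """Return (relative_path, problem) pairs for missing or altered files."""
    problems = []
    for rel_path, expected in sorted(manifest.items()):
        candidate = os.path.join(restored_root, rel_path)
        if not os.path.isfile(candidate):
            problems.append((rel_path, "missing"))
        elif sha256_of_file(candidate) != expected:
            problems.append((rel_path, "checksum mismatch"))
    return problems
```

In use, the manifest would be built when the backup is taken and stored separately from the tape, then passed to `verify_restore` after each scheduled test restore. An empty list means every recorded file came back with the same contents.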
Disaster planning takes a different perspective when the disaster involves RAID, SAN, JBOD, or NAS storage systems. These storage systems have redundant architectures designed to prevent outages and disasters. However, this redundancy can provide a false sense of security.
For instance, one client last year had 40 TB of storage space spread over 20 servers. These systems had hardware RAID 1+0 configurations. Problems began on one server when a drive would go off-line for a moment. The controller card would switch to the mirror copy as part of the redundancy process. At some point, the first drive would come back online. The controller card would then switch back to the original drive, which had missed the writes made while it was off-line, so the data was inconsistent from a volume and file system perspective. After a system power-down and restart, the storage system hardware reset. The operating system’s automatic volume repair program started and began making repairs. These repairs caused further damage to file system integrity, and the critical data was no longer available. Because the data had to be available immediately, Remote Data Recovery was the option chosen for this client.
This case history is interesting because of the cascade of failures that happened in quick succession. This client was processing large amounts of data across three shifts per day. Archiving that amount of changing data every night was not possible. The client had been confident that the storage configuration was ‘bullet-proof’ due to the mirroring.
These configurations can be successful against multiple drive failures. In this case, however, the drive never failed; it just went off-line. When the drive came back online, there were file system inconsistencies. As a result, the data became unavailable when the automatic volume repair tool started making repairs. Engineers worked throughout the night to make the data available again. In the end, the recovery was a 100% success.
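The failure sequence in this case can be illustrated with a toy model of a two-drive mirror. This is a simplified sketch for reasoning about the problem, not a description of how the client's controller card actually worked. The class names, the write counter, and the `resync` step are all illustrative assumptions.

```python
class MirrorMember:
    """One drive in a toy two-way mirror."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.blocks = {}     # block number -> data
        self.generation = 0  # number of writes this drive has received


class ToyMirror:
    def __init__(self):
        self.primary = MirrorMember("drive_a")
        self.mirror = MirrorMember("drive_b")

    def write(self, block, data):
        # Writes reach only the drives that are online at that moment.
        for member in (self.primary, self.mirror):
            if member.online:
                member.blocks[block] = data
                member.generation += 1

    def naive_read(self, block):
        # Prefers the primary whenever it is online, without checking
        # whether it missed writes while it was off-line.
        member = self.primary if self.primary.online else self.mirror
        return member.blocks.get(block)

    def is_consistent(self):
        return self.primary.generation == self.mirror.generation

    def resync(self):
        # Copy from whichever drive has received more writes.
        members = [self.primary, self.mirror]
        newest = max(members, key=lambda m: m.generation)
        for member in members:
            if member is not newest:
                member.blocks = dict(newest.blocks)
                member.generation = newest.generation


if __name__ == "__main__":
    array = ToyMirror()
    array.write(0, "v1")
    array.primary.online = False   # drive drops off-line for a moment
    array.write(0, "v2")           # only the mirror copy sees this write
    array.primary.online = True    # drive returns without a resync
    print(array.naive_read(0), array.is_consistent())
    array.resync()
    print(array.naive_read(0), array.is_consistent())
```

In the model, the returning drive serves stale data until the two copies are compared and resynchronized. Running a repair tool against the stale copy, as happened in this case, works from an outdated view of the volume.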