Common Scenarios of Server Data Disasters
When data loss occurs on something as valuable as a server, it is essential to the life of your business to get back up and running as soon as possible. Below is a sampling of specific types of disasters, accompanied by actual engineering notes from recent Remote Data Recovery jobs:
Causes of Partition / Volume / File System Corruption Disasters
- Corrupted File System due to system crash
- File system damaged by automatic volume repair utilities
- File system corruption due to partition/volume resizing utilities
- Corrupt volume management settings
Causes of Specific File Error Disasters
- Corrupted business system database; file system is fine
- Corrupted message database; file system is fine
- Corrupted user files
- Windows 2000 server, volume repair tool damaged file system; target directories unavailable. Complete access to original files critical. Remote Data Recovery safely repaired volume; restored original data, 100% recovery. Evaluation Time: 20 Minutes
- Exchange 2000 server, severely corrupted Information Store; cause of corruption unknown. Scanned the Information Store file for valid user mailboxes; results took up to 48 hours due to the corruption. Backup was one month old and not valid for users. Evaluation Time: 96 Hours (1.5 days)
Possible Causes of Hardware Related Disasters
- Server hardware upgrades (Storage Controller Firmware, BIOS, RAID Firmware)
- Expanding Storage Array capacity by adding larger drives to controller
- Failed Array Controller
- Failed drive on Storage Array
- Multiple failed drives on Storage Array
- Storage Array failure but drives are working
- Failed boot drive
- Migration to new Storage Array system
- Netware volume server, traditional NWFS, failing hard drive made volume inaccessible; Netware would not mount volume. Errors on hard drive were not in the data area and drive was still functional. Copied all of the data to another volume; 100% recovery. Evaluation Time: 1 hour
Causes of Software Related Disasters
- Business System Software Upgrades (Service Packs, Patches to Business system)
- Anti-virus software deleted or truncated a suspect file in error; data has been deleted, overwritten, or both
- Partial drive copy overwrite using third-party tools; the overwrite started and then crashed 1% into the process. Found a large portion of the original data. Rebuilt file system and provided reports on recoverable data; customer will require that we test some files to verify quality of recovery. Evaluation Time: 1 hour
Causes of User Error Disasters
- During a data loss disaster, restored backup data to the exact location of the lost data, thereby overwriting it
- Deleted files
- Operating system overwritten by a reinstall of the OS or application software
- User's machine had the OS reinstalled using a Restore CD; user looking for Outlook PST file. Searched the drive for PST data because the original file system was completely overwritten. Found three files that might contain the user's data; after running PST recovery tools, one of those files proved to contain the user's email. Some messages were missing, but the majority of the messages and attachments came back. Evaluation Time: 5 hours
Causes of Operating System Related Disasters
- Server OS upgrades (Service Packs, Patches to OS)
- Migration to different OS
- Netware traditional, 2TB volume; file system damaged while trying to expand the size of the volume. Repaired on drive; volume mountable. Evaluation Time: 4 hours
Server Recovery Tips
Data disasters will happen; accepting that reality is the first step in preparing a comprehensive disaster plan. Time is always against an IT team when a disaster strikes, so the details of a disaster plan are critical for success.
Here are some suggestions from Ontrack Data Recovery engineers on what to do and what not to do:
- In a disaster recovery, never restore data to the server that has lost the data—always restore to a separate server or location.
- In Microsoft Exchange or SQL failures, never try to repair the original Information Store or database files—work on a copy.
- In a deleted data situation, turn off the machine immediately rather than shutting down Windows; powering off without a normal shutdown reduces the risk of overwriting the deleted data.
- Use a volume defragmenter regularly.
- If a drive fails on RAID systems, never replace the failed drive with a drive that was part of a previous RAID system—always zero out the replacement drive before using.
- If a drive is making unusual mechanical noises, turn it off immediately and get assistance.
- Have a valid backup before making hardware or software changes.
- Label the drives with their position in a RAID array.
- Do not run volume repair utilities on suspected bad drives.
- Do not run defragmenter utilities on suspected bad drives.
- In a power loss situation with a RAID array, if the file system looks suspicious, or is un-mountable, or the data is inaccessible after power is restored, do not run volume repair utilities.
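
Several of the tips above come down to the same habit: leave the original data untouched and work on a verified copy. As a rough illustration only, the Python sketch below copies a file to a separate working directory and confirms by SHA-256 hash that the copy matches before anything else is done with it. The function names and the example paths are hypothetical, not part of any Ontrack tool, and a real Exchange or SQL recovery would involve more than a plain file copy (stopping services, copying log files, and so on).

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_working_copy(original: Path, work_dir: Path) -> Path:
    """Copy `original` into `work_dir` and verify the copy by hash.

    The original file is only ever opened for reading.
    """
    work_dir.mkdir(parents=True, exist_ok=True)
    copy_path = work_dir / original.name
    # Refuse to "copy" a file onto itself; the working copy must live elsewhere.
    if copy_path.resolve() == original.resolve():
        raise ValueError("working directory must differ from the original's location")
    shutil.copy2(original, copy_path)
    # Only hand back the copy if it is byte-for-byte identical to the original.
    if sha256_of(original) != sha256_of(copy_path):
        raise IOError("copy does not match the original; do not proceed")
    return copy_path


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    source = Path("D:/exchsrvr/mdbdata/priv1.edb")
    if source.exists():
        print(make_working_copy(source, Path("E:/recovery_work")))
```

Any repair tool would then be pointed at the returned copy, ideally on a separate disk or server, so that a failed repair attempt costs nothing but time.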