Data Loss in the Virtual Environment is Very Real (2)

Monday, 23 June 2014 by Marcel Mascunan

In the last post we explained why the virtual environment is not immune to data loss and gave some real-life examples of the problems that can occur. This time, we are passing on some advice from Tim Black, Ontrack's Senior Lab Engineer specialising in VMware, on how to respond to some of the more common virtual environment scenarios.

Problem: Deleted or Missing Virtual Machine/VMDK

First, record everything you know about the missing or deleted Virtual Disk. Relevant details include the Virtual Disk size, whether it was thick or thin provisioned, the VM name, the Guest File System and the type of data it contained.
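As a minimal sketch of how those details could be captured in one place, the Python snippet below stores them in a small record and writes them out as JSON. The class name, field names and sample values are hypothetical and chosen purely for illustration; they are not part of any VMware or Ontrack tooling.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class MissingDiskRecord:
    """Hypothetical record of what is known about a lost Virtual Disk."""

    vm_name: str
    disk_size_gb: float
    provisioning: str        # "thick" or "thin"
    guest_file_system: str   # e.g. "NTFS" or "ext4"
    data_description: str    # type of data the disk contained

    def __post_init__(self):
        # Reject anything other than the two provisioning types named above.
        if self.provisioning not in ("thick", "thin"):
            raise ValueError("provisioning must be 'thick' or 'thin'")


def record_to_json(record: MissingDiskRecord) -> str:
    """Serialise the record so it can be handed to a recovery engineer."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Sample values are invented for illustration only.
    example = MissingDiskRecord(
        vm_name="example-vm",
        disk_size_gb=500,
        provisioning="thin",
        guest_file_system="NTFS",
        data_description="SQL Server databases",
    )
    print(record_to_json(example))
```

Writing the details down before doing anything else means they are not lost if the people involved change or memories fade during the recovery.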

Next, reduce reads and writes to the affected Datastore. If the Datastore contains active, thin-provisioned Virtual Machines, power them down as soon as possible. Whatever you do, don't migrate any active VM (Storage vMotion or otherwise) to or from the affected Datastore without first speaking to a Data Recovery professional. Doing so may unwittingly increase the complexity of the recovery and reduce the chances of retrieving the data. Also, if you have a VM with missing or deleted snapshots, don't power it on; if it is currently running, power it down as soon as possible.

Problem: Corrupt VMFS Metadata or Inaccessible Datastore

To minimise problems, never attempt to recreate the datastore. If you are investigating the LUN yourself, make sure you use read-only access.
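To illustrate what read-only access means at the script level, here is a minimal Python sketch that opens a path with the read-only flag and returns its first bytes. The function name and the 512-byte default are assumptions made for this example. Note that it only controls how this one script opens the path; it is not a substitute for presenting the LUN as read-only at the storage or host level.

```python
import os


def read_header_read_only(path: str, length: int = 512) -> bytes:
    """Return the first `length` bytes of `path`, opened read-only.

    The descriptor is opened with O_RDONLY, so this function cannot
    write to the target through it.
    """
    # O_BINARY exists only on Windows; fall back to 0 elsewhere.
    flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags)
    try:
        return os.read(fd, length)
    finally:
        os.close(fd)
```

For example, `read_header_read_only("/path/to/device-or-image")` (a placeholder path) returns up to 512 bytes without requesting write access.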

Problem: RAID/Storage issues, including Mechanical Failure

Never replace a failed drive with a drive that was part of a previous RAID system; always zero out the replacement drive before using it. If a drive is making unusual mechanical noises, power it down immediately and get assistance. Just as in the physical server environment, leaving a mechanically failing drive powered on increases the likelihood of further damage and significantly reduces the chances of a full recovery.

It's a good idea to label the drives with their position in a RAID array before removing them from your system. Also, if a RAID system fails in the middle of a rebuild process, do not run further rebuild attempts. Never migrate VMs to or from a suspect RAID, and if you do need to shut down or power cycle your RAID hardware, ensure all VMs and VMware hosts are gracefully shut down first.
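The labelling advice can be backed up with a written record. The Python sketch below, offered as one possible approach, turns a mapping of slot position to drive serial number into one label line per drive. The function name, label layout and sample serial numbers are all hypothetical.

```python
def make_drive_labels(array_name: str, serials_by_slot: dict) -> list:
    """Build one label string per drive, ordered by slot position.

    `serials_by_slot` maps a slot number (int) to the serial number
    read off the drive in that slot.
    """
    labels = []
    for slot in sorted(serials_by_slot):
        serial = serials_by_slot[slot]
        labels.append(f"{array_name} | slot {slot} | S/N {serial}")
    return labels


if __name__ == "__main__":
    # Sample serial numbers are invented for illustration only.
    for label in make_drive_labels("ARRAY-A", {1: "SER-0002", 0: "SER-0001"}):
        print(label)
    # prints:
    # ARRAY-A | slot 0 | S/N SER-0001
    # ARRAY-A | slot 1 | S/N SER-0002
```

Keeping the list alongside the physical labels gives a second record of the original drive order if a label comes off.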

Problem: Corruption inside Guest OS

Don't run volume repair utilities (such as CHKDSK) or defragmentation utilities on Virtual Disks you suspect are corrupt, as this can make the damage worse.

If you find that more than one VM shows signs of corruption, you may have a problem at the storage level. Power down the machines and consult a Data Recovery Professional as soon as possible.

In all scenarios, a consultation with Ontrack as early as possible can provide you with the best information on how to proceed to maximise the chance of successful VMware data recovery and reduce the potential for permanent data loss.