Data recovery stories from multi-layered enterprise storage systems

05 January 2017 by Michael Nuncic

Modern enterprise storage systems are now, more than ever, highly complex components of an IT architecture and data structure. A few years ago these storage systems relied mostly on a single basic technology, RAID (Redundant Array of Independent (or Inexpensive) Disks); nowadays this has changed.

While modern enterprise storage systems still rely on RAID, most high-end systems now implement additional technologies such as virtualisation, deduplication and combined data pools. Because virtualisation distributes files and data over several hard disks inside the storage system, there is at least one additional virtualisation layer of data that has to be taken into account. When a vendor-specific technology is used to combine virtual machines within a software-defined structure, yet another layer is added. Depending on how the manufacturer designed the product, the data can be stored across many different layers, all of which need to be reconstructed in the event of data loss.
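The layering described above can be illustrated with a toy address translation. The sketch below does not reflect any particular vendor's layout: `ExtentLayer`, `StripeLayer` and `locate` are hypothetical names, and the design (a virtual volume built from fixed-size pool extents, with the pool striped RAID 0-style across disks without parity) is an assumption chosen for brevity. It shows why the layers must be resolved in order: without the upper layer's extent map, the physical location of a virtual address cannot be computed at all.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ExtentLayer:
    """Hypothetical virtualisation layer: a virtual volume made of pool extents."""
    extent_size: int
    extent_map: List[int]  # virtual extent number -> pool extent number

    def translate(self, vaddr: int) -> int:
        # Split the virtual address into extent number and offset within it,
        # then look up where that extent lives in the underlying pool.
        extent, offset = divmod(vaddr, self.extent_size)
        return self.extent_map[extent] * self.extent_size + offset


@dataclass
class StripeLayer:
    """Hypothetical RAID 0-style striping of the pool across n_disks (no parity)."""
    stripe_size: int
    n_disks: int

    def translate(self, pool_addr: int) -> Tuple[int, int]:
        # Stripe units are dealt out round-robin across the disks.
        unit, offset = divmod(pool_addr, self.stripe_size)
        disk = unit % self.n_disks
        row = unit // self.n_disks
        return disk, row * self.stripe_size + offset


def locate(vaddr: int, virt: ExtentLayer, raid: StripeLayer) -> Tuple[int, int]:
    """Resolve a virtual address layer by layer, top to bottom."""
    return raid.translate(virt.translate(vaddr))


if __name__ == "__main__":
    virt = ExtentLayer(extent_size=1024, extent_map=[3, 0, 2, 1])
    raid = StripeLayer(stripe_size=256, n_disks=4)
    print(locate(1500, virt, raid))  # (1, 220)
    print(locate(10, virt, raid))    # (0, 778)
```

Real systems add parity, mirroring, deduplication and metadata-driven maps on top of this, so every extra layer is one more translation that has to be recovered before the next one can be applied.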

Even though these technologies make data recovery from modern storage systems more challenging, it is still possible to get the data back, although success always depends on the particular situation.

A virtual nightmare

We’ve worked on many cases in which we successfully recovered data from a high-end storage system with multiple data layers. In one case, an almost brand-new VMware vSAN system broke down completely because a single SSD used as cache inside the system failed.

A little background first

Since March 2014, VMware has offered vSAN as an option for vSphere ESXi servers to organise and manage storage. Unfortunately, this system failed only a few months later. A vSAN system combines applications or data stored in virtual machines in a single, joint, clustered shared storage datastore. All connected host computers and their hard drives are part of this joint datastore. In the event of a hardware fault or data loss, data recovery engineers therefore have to deal with an additional information layer.

This particular system consisted of 15 hard drives and three SSDs. When a single SSD failed, three host computers (nodes) failed with it and four large virtual machines were temporarily lost. Unsurprisingly, the responsible team broke into a cold sweat, as critical business data had been lost.

To recover all the missing data from the failed storage system, we had to develop new software tools that could locate the descriptor and log files needed to identify and assemble the data. Because the datastores functioned as containers, we first had to identify the links to the virtual machines they contained and then reconstruct those machines in the next step.
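As a rough illustration of the first step, locating descriptor files, the sketch below scans a raw disk image for the plain-text header that VMware virtual disk descriptor files begin with (`# Disk DescriptorFile`) and reads their `key=value` fields. It is a generic signature search, not the tooling described in this case; the function names are hypothetical, and both the 512-byte sector alignment and the "text ends at the first NUL byte" rule are simplifying assumptions. On a vSAN datastore, objects are spread across components on several drives, so hits like these would still have to be linked and stitched together.

```python
from typing import Dict, List

SIGNATURE = b"# Disk DescriptorFile"
SECTOR = 512  # assumption: descriptors start on a sector boundary


def find_descriptors(image: bytes, sector: int = SECTOR) -> List[int]:
    """Return byte offsets of sectors that begin with the descriptor header."""
    return [
        off
        for off in range(0, len(image), sector)
        if image.startswith(SIGNATURE, off)
    ]


def parse_descriptor(image: bytes, offset: int, max_len: int = 4096) -> Dict[str, str]:
    """Read key=value lines from a descriptor found at `offset`."""
    # Assumption: the descriptor text ends at the first NUL byte.
    raw = image[offset:offset + max_len].split(b"\x00", 1)[0]
    fields: Dict[str, str] = {}
    for line in raw.decode("ascii", errors="replace").splitlines():
        line = line.strip()
        # Skip blank lines, comments and extent lines (e.g. RW ... "disk-flat.vmdk"),
        # which carry no '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        fields[key.strip()] = value.strip().strip('"')
    return fields


if __name__ == "__main__":
    image = bytearray(SECTOR * 4)
    desc = (b'# Disk DescriptorFile\nversion=1\nCID=fffffffe\n'
            b'createType="vmfs"\nRW 2048 VMFS "disk-flat.vmdk"\n')
    image[2 * SECTOR:2 * SECTOR + len(desc)] = desc
    image = bytes(image)
    print(find_descriptors(image))        # [1024]
    print(parse_descriptor(image, 1024))  # {'version': '1', 'CID': 'fffffffe', 'createType': 'vmfs'}
```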

With the new tools we were able to understand how the virtual machines were stored in the vSAN datastore and distributed across the affected hard drives. We could then quickly find the necessary descriptor and log files, which made the recovery process significantly easier (we did get all the data back, BTW).

Soaked datacentre

Another case we worked on earlier this year also shows how difficult (though not impossible) it can be to recover data from the highly complex, multi-layered structure of a high-end storage system.

We received an HP StorageWorks EVA (Enterprise Virtual Array) 6000 containing business-critical SQL database files as well as employee information. The storage system had been flooded in a natural disaster and could no longer be accessed.

Because an HP EVA system is fully virtualised, it works with disk groups and virtual disks instead of conventional RAID sets and logical drive volumes. All physical disks in an HP EVA, together with their content, are randomly organised into disk groups, and the logical drives in the EVA’s virtual world (called vDisks) are distributed across all installed HDDs. In this case, 80 hard disks contained 18 virtual RAID volumes configured as both VRAID 1 and VRAID 5 arrays.
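For the VRAID 1 volumes, a block lost on one disk can simply be read from its surviving mirror copy. For VRAID 5, the sketch below assumes conventional RAID 5-style XOR parity (an assumption for illustration, not a description of the EVA's internal on-disk format): a block lost from one member of a stripe is recomputed by XOR-ing all surviving blocks of that stripe, parity included. `xor_blocks` and `rebuild_stripe` are hypothetical helper names.

```python
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equally sized blocks byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(out):
            raise ValueError("all blocks in a stripe must have the same size")
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


def rebuild_stripe(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Recompute at most one missing member (None) of a data+parity stripe."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        # Single parity can only compensate for one lost member per stripe.
        raise ValueError(f"cannot rebuild {len(missing)} missing blocks with single parity")
    rebuilt = list(stripe)
    if missing:
        rebuilt[missing[0]] = xor_blocks([b for b in stripe if b is not None])
    return rebuilt


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00", b"\x0f\x0f"]
    parity = xor_blocks(data)                       # b'\xe1-'
    damaged = data[:2] + [None] + data[3:] + [parity]
    print(rebuild_stripe(damaged)[2])               # b'\xff\x00'
```

The arithmetic is the easy part: it can only be applied once the stripe layout across the disks, and therefore which blocks belong together, is known.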

Once we had cleaned and repaired the 25 hard disks from the storage system, the logical data recovery could begin. To regain access to the data on the damaged hard disks, we not only had to find out how the EVA file system was structured as a whole but also had to reverse-engineer the entire system from scratch. Fortunately, our team loves a challenge and tackled the project with gusto.

Data inception

To find out exactly how the EVA system was designed, we had to acquire identical hardware. We set it up in various configurations to learn how the custom EVA file system structures are built, since these define how data is distributed across the drives.

Once we had figured out the file system structure and how the vDisks were mapped across all of the disks, we rebuilt the whole EVA system from the original physical disks. Because HP EVA data recovery takes place at the disk level, we virtually assembled the disk groups and subgroups and then rebuilt the vDisks. To recover the data stored on the vDisks, our Research and Development team created new tools that allowed us to extract it.
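Once the mapping is known, rebuilding a vDisk is conceptually a matter of copying segments from the disk images in map order. The sketch below assumes a hypothetical, already recovered segment map (a list of `(disk_id, byte_offset)` pairs, one per fixed-size segment); working out such a map from the real EVA metadata is the hard part and is not shown. Segments whose disk image is missing or too short are zero-filled and reported, so a partial rebuild still yields an image with the correct layout plus a list of gaps.

```python
from typing import Dict, List, Tuple


def assemble_vdisk(
    disks: Dict[int, bytes],
    segment_map: List[Tuple[int, int]],
    segment_size: int,
) -> Tuple[bytes, List[int]]:
    """Concatenate mapped segments into one vDisk image.

    Returns the image and the numbers of segments that could not be read.
    """
    out = bytearray()
    unreadable: List[int] = []
    for seg_no, (disk_id, offset) in enumerate(segment_map):
        image = disks.get(disk_id)
        chunk = image[offset:offset + segment_size] if image is not None else b""
        if len(chunk) != segment_size:
            # Disk image missing or truncated: keep the vDisk layout intact with zeros.
            unreadable.append(seg_no)
            chunk = bytes(segment_size)
        out += chunk
    return bytes(out), unreadable


if __name__ == "__main__":
    disks = {0: b"AAAABBBB", 1: b"CCCCDDDD"}  # disk 2 is unreadable
    vdisk, gaps = assemble_vdisk(disks, [(1, 4), (0, 0), (2, 0), (0, 4)], 4)
    print(vdisk)  # b'DDDDAAAA\x00\x00\x00\x00BBBB'
    print(gaps)   # [2]
```

Zero-filling rather than skipping keeps every later segment at its correct offset, which matters because file systems and databases inside the vDisk address data by position.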

After almost six weeks of intensive development, re-engineering and recovery work, the project was completed successfully. Using the newly created tools, we recovered four terabytes of sensitive SQL database files (almost 90 per cent of the data lost in the flood).

Return of the living dead

These two examples show that it is definitely possible to recover data from storage systems that combine several technologies, each adding its own layer of data structures. To reach the actual data, however, these layers have to be accessed one after another. How many of them have to be rebuilt and accessed depends on the storage product and its underlying architecture.

One thing is certain: the more a manufacturer relies on its own proprietary file system or mixes several technologies inside one system, the harder it will be to find out what went wrong after a data loss.

Every system and every data recovery project has its own peculiarities that need to be overcome. That’s why it is important to choose an experienced data recovery service provider with the tools and know-how needed for your specific high-end storage system setup.