IT administrators and managers continually strive to streamline their costs in the data center. Year after year, more servers in the data center are being run as virtual servers. The result is fewer physical servers, less energy consumption, and lower hardware and maintenance costs.
To manage these large numbers of virtual machines and virtual servers, software-defined storage (SDS) establishes a software layer that manages and combines the necessary data center components, such as compute, servers, storage, networking and security. Solutions like this are software-based and can, in theory, run on a wide range of hardware products. In practice, it’s wise to run a software-defined solution on hardware that has already been tested and approved by the software vendor.
Hyper-converged infrastructure systems, on the other hand, are the latest development in the effort to make IT more efficient. This type of solution combines several technologies that are tightly integrated and managed as a single system. Hyper-converged systems merge previously separate concepts, such as converged solutions, traditional storage, and software-based solutions, into a single solution. This solution is server-based, with its components technologically linked together. Like SDS, a hyper-converged solution is based on virtualization and software-based management, but it is coupled to the hardware.
Simply put, hyper-converged systems are infrastructure systems with a software-centric architecture that integrates compute (CPU), storage, networking, and virtualization resources and other technologies in a hardware box supplied by one vendor.
From a technical standpoint, users of a hyper-converged solution gain significant benefits from managing all of their infrastructure resources and virtual machines from a single point of administration. Additionally, a single shared resource pool is used, since all data center resources are brought into the resource stack. By delivering virtualization, storage, compute, network, management, and data protection in an easy-to-manage yet scalable application, a company can seamlessly manage a complex infrastructure.
For a company using a hyper-converged solution, this means that far less hardware is needed. Such systems rely on common, inexpensive x86 commodity hardware and, unlike integrated systems, they can be upgraded or, in the case of failure, repaired by swapping much smaller hardware units. To sum up, hyper-converged solutions don’t need as much storage or bandwidth as integrated solutions, and can cut costs on hardware and energy.
But where is the data in a software-defined or hyper-converged storage system really stored?
The answer to this question is not easy, and it depends heavily on the product used. Basically, SDS and hyper-converged solutions consist of several different data structure layers. To keep it simple, the structure can be compared to Russian Matryoshka dolls. The user data is located in the deepest layer, while other technologies add their data layers on top of it. As one can see in the graphic below, the highest data layer is the one created by the SDS controller, which includes information about the virtual storage arrays. The next layer is the virtualization layer, created by the hypervisor. Underneath this layer are the server layers, which are then followed by the physical media layer.
In contrast to an SDS solution, the hypervisor layer in a hyper-converged solution is the highest and most recently created layer of the data structure. Underneath it sits the information added by the SDS controller software, followed by a layer created by one of the attached nodes. Lastly, the raw user data sits inside this “container.” As if this weren’t difficult enough, there is a further complication.
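The two layer orders described above can be made concrete with a small sketch. It is only an illustration of the nested-doll idea: the layer names are informal labels chosen for this example, not vendor terminology, and real products store far more than a name per layer.

```python
from __future__ import annotations

from typing import Any

# Layer orders as described in the text, outermost (highest) layer first.
# The names are illustrative labels only.
SDS_LAYERS = ["sds_controller", "hypervisor", "server", "physical_media"]
HCI_LAYERS = ["hypervisor", "sds_controller", "node"]


def wrap(user_data: bytes, layers: list[str]) -> Any:
    """Nest user_data inside the given layers; layers[0] ends up outermost."""
    doll: Any = user_data
    for name in reversed(layers):
        doll = {"layer": name, "inner": doll}
    return doll


def unwrap(doll: Any) -> tuple[list[str], bytes]:
    """Peel the layers from the outside in.

    Returns the order in which layers were peeled and the user data
    found at the core.
    """
    peeled = []
    while isinstance(doll, dict):
        peeled.append(doll["layer"])
        doll = doll["inner"]
    return peeled, doll


if __name__ == "__main__":
    for label, layers in (("SDS", SDS_LAYERS), ("hyper-converged", HCI_LAYERS)):
        order, payload = unwrap(wrap(b"user data", layers))
        print(label, "->", " > ".join(order), "> user data:", payload)
```

The point of the sketch is that the user data can only be reached by working through every enclosing layer in the right order, and that this order differs between the two types of solution.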
Another characteristic of SDS and hyper-converged solutions is that several OEMs use proprietary file systems. For example, NetApp uses its own WAFL (Write Anywhere File Layout) system, which was created especially for its ONTAP operating system and optimized for use in networking environments. And there’s more: NetApp offers two other operating systems, each with its own advantages. Since version 6 of its SDS solution, VMware’s vSAN has used its own on-disk file system (vSAN FS). Dell EMC offers VMware vSAN as a hypervisor-converged storage technology for its PowerEdge server products. Dell EMC’s big data storage solution, Isilon, has yet another file system, called OneFS. In this file system, the metadata is spread homogeneously across the many nodes attached to the system.
To keep it short and simple: almost every SDS or hyper-converged solution uses its own file system and/or operating system, which, in the case of a data loss, has to be deciphered before a recovery can be attempted.
Is data recovery in SDS or hyper-converged solutions possible?
Here at Kroll Ontrack, there have been many successful data recoveries from high-end storage systems with multiple data layers. One of those cases involved a brand new VMware vSAN system that broke down completely because of a single failed SSD used for caching. vSAN combines the applications and data saved in virtual machines into a joint, clustered, shared datastore. All connected host computers and their hard drives are part of this combined datastore. This means that in the event of a hardware fault or data loss, the recovery specialists need to deal with an additional level of data. This particular system consisted of 15 HDDs and 3 SSDs, but with the breakdown of this one SSD, three host computers (nodes) failed and four large virtual machines were temporarily lost.
To recover all the missing data from the failed storage system, Kroll Ontrack engineers had to develop new software tools to find the description and log files necessary for identifying and reassembling the data. The datastores were functioning as containers, so the specialists first needed to identify the links to the contained virtual machines and then reconstruct them in the next step. Thanks to the new tools, they were able to get information about how the virtual machines were saved in the vSAN datastore and distributed across the affected hard drives. This allowed the data recovery experts to find the necessary description and log files much faster, making the recovery process significantly easier. With these tools at hand, the specialists were able to recover the virtual machines and all data stored on the vSAN system.
This example shows that recovering an SDS or hyper-converged solution, which includes a variety of technologies and data layers, is possible. The number of layers that have to be restored depends on the specific product and technologies used. Every data loss and data recovery job has its own challenges, and it’s essential for a data recovery specialist to have the necessary tools and know-how to handle such a challenging project.
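The tools developed for this case are not public, and the file formats involved are not specified here. As a rough illustration of what “identifying the links” from a description file to the stored data can look like, the sketch below assumes the description files resemble VMware’s text-based virtual disk descriptors, in which each extent line names an access mode, a size in 512-byte sectors, an extent type and a target. The sample descriptor and the function name parse_extents are made up for this example.

```python
import re

# Hypothetical descriptor text, written for this example only.
SAMPLE_DESCRIPTOR = '''\
# Disk DescriptorFile
version=1
CID=fffffffe
parentCID=ffffffff
createType="vmfs"

# Extent description
RW 83886080 VMFS "vm01-flat.vmdk"
'''

SECTOR_SIZE = 512  # descriptor extent sizes are given in 512-byte sectors

# access mode, size in sectors, extent type, quoted target
EXTENT_RE = re.compile(
    r'^(RW|RDONLY|NOACCESS)\s+(\d+)\s+(\w+)\s+"([^"]+)"',
    re.MULTILINE,
)


def parse_extents(descriptor_text):
    """Return one dict per extent line found in a descriptor-style text."""
    extents = []
    for access, sectors, kind, target in EXTENT_RE.findall(descriptor_text):
        extents.append({
            "access": access,
            "sectors": int(sectors),
            "bytes": int(sectors) * SECTOR_SIZE,
            "type": kind,
            "target": target,  # where the actual disk contents live
        })
    return extents


if __name__ == "__main__":
    for extent in parse_extents(SAMPLE_DESCRIPTOR):
        print(extent)
```

In a real recovery, the target of each extent would then have to be located on the physical drives, which is the part that depends on the product’s proprietary on-disk layout.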
Picture copyright: M.Großmann/pixelio.de