The evolution of storage part 2: Object storage and data recovery
In our previous blog post, "The evolution of storage systems: Object, File or Block?", we explained why a new storage concept is needed for the ever-growing amount of big data. We showed that the traditional storage methods, File storage and Block storage, cannot cope with this growth and that the solution to this challenge is Object storage. But how does this new storage method handle data protection and recovery?
Erasure coding and Object storage systems
For a long time, RAID was a good way to protect against data loss caused by a hardware failure, a broken hard disk or a similar problem. Recovering data this way worked well, especially with the smaller disks that were common until a few years ago. But with disk sizes approaching 10 terabytes or more, a RAID rebuild can take days or even weeks, which is no longer practical.
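To see why disk size matters, a rebuild has to rewrite the whole replacement disk, so its duration is roughly the disk capacity divided by the sustained rebuild rate. The small Python sketch below does this arithmetic for a 10 TB disk. The rebuild rates are assumed example values, not measurements; real arrays vary widely with load, controller and RAID level.

```python
# Back-of-the-envelope estimate of how long a RAID rebuild takes.
# The rebuild rates below are illustrative assumptions, not measured values.

def rebuild_hours(capacity_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to rewrite `capacity_tb` terabytes at a sustained rate."""
    capacity_mb = capacity_tb * 1_000_000  # 1 TB = 10^6 MB (decimal units)
    seconds = capacity_mb / rate_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    # An idle array may rebuild near full disk speed; a busy production
    # array often throttles the rebuild to keep serving client I/O.
    for rate in (150, 50, 10):  # assumed sustained rebuild rates in MB/s
        hours = rebuild_hours(10, rate)
        print(f"10 TB at {rate:>3} MB/s: {hours:6.1f} h ({hours / 24:4.1f} days)")
```

Even at an assumed 150 MB/s, rewriting a full 10 TB disk takes about 18.5 hours; throttled to 10 MB/s under production load, it takes more than eleven days.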
For Object storage systems, RAID is therefore not a suitable method for data protection or data recovery: rebuilding failed disks would simply take too long. An additional problem is that if another disk fails during the rebuild process in a single-parity array such as RAID 5, the data is lost for good, without any chance of recovery.
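The risk of a second failure during a rebuild can be estimated with a simple probability model. The sketch below assumes independent disk failures at a constant rate derived from an annualized failure rate (AFR); the AFR, the array size and the rebuild times are hypothetical example values. Because the model ignores correlated failures and unrecoverable read errors, it likely understates the real risk.

```python
import math

HOURS_PER_YEAR = 24 * 365


def p_second_failure(afr: float, remaining_disks: int, rebuild_hours: float) -> float:
    """Probability that at least one surviving disk fails before the rebuild ends.

    Assumes independent failures at a constant rate derived from the annualized
    failure rate (AFR). Correlated failures (same batch, same age, rebuild stress)
    and unrecoverable read errors are ignored, so this likely understates the risk.
    """
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR  # exponential failure model
    return 1.0 - math.exp(-remaining_disks * rate_per_hour * rebuild_hours)


if __name__ == "__main__":
    # Hypothetical 12-disk RAID 5 with an assumed 2 % AFR: a failure among the
    # 11 surviving disks before the rebuild completes means data loss.
    for hours in (24, 72, 280):  # illustrative rebuild times, about 1 to 12 days
        p = p_second_failure(afr=0.02, remaining_disks=11, rebuild_hours=hours)
        print(f"rebuild of {hours:>3} h: {p:.2%} chance of a second disk failure")
```

The absolute numbers depend entirely on the assumed inputs, but for small values the probability grows roughly in proportion to both the number of surviving disks and the rebuild time, which is why long rebuilds on large disks make this risk harder to ignore.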
This is why Object storage systems use Erasure Coding instead.
As explained in the previous post, objects can be stored anywhere in an Object storage system. They are addressed through an Internet-compatible structure and can be accessed from any device with Internet access. To protect these objects against data loss, Erasure Coding comes into play.
In terms of method, Erasure Coding (EC) is similar to classic RAID: here, too, additional redundancy information is calculated from the stored data. In a system with Erasure Coding, objects are split into parts. These data blocks are typically several megabytes in size and therefore much larger than the blocks a RAID-protected system normally works with.
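The first step, splitting an object into large blocks, can be sketched in a few lines of Python. The 4 MiB block size and the function name `split_into_blocks` are assumptions for illustration; real systems choose their own block sizes and layouts.

```python
from typing import Iterator

# Assumed block size of 4 MiB ("several megabytes"); real systems choose their own.
BLOCK_SIZE = 4 * 1024 * 1024


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> Iterator[bytes]:
    """Yield consecutive fixed-size blocks of an object; only the last may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for offset in range(0, len(data), block_size):
        yield data[offset:offset + block_size]


if __name__ == "__main__":
    obj = bytes(10 * 1024 * 1024)  # a 10 MiB example object (all zero bytes)
    sizes = [len(block) for block in split_into_blocks(obj)]
    print(sizes)  # [4194304, 4194304, 2097152]
```

In this sketch, each block would then be encoded on its own, so a large object ends up as several independent sets of fragments.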
Each data block is analysed and, in addition to the original data block, several smaller fragments (so-called shards) are created. Only a minimum number of these fragments is needed to recover the original data. For example, a data block might be encoded into 16 fragments, any 12 of which are enough to rebuild the original information.
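The k-of-n property described above (here 12 of 16) is what Reed-Solomon-style codes provide. The Python sketch below illustrates it with polynomial interpolation over the prime field GF(257): the k data fragments define a polynomial, the n - k extra fragments store further values of that polynomial, and any k values are enough to determine it again. This is a teaching sketch chosen for readability, not the XOR-based scheme mentioned in the next paragraph; production systems typically work over GF(2^8) with much faster encoders. The function names `encode` and `decode` are hypothetical.

```python
import random

P = 257  # prime modulus: every byte value 0..255 is an element of GF(257)


def _interpolate(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) through `points` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(block: bytes, k: int, n: int) -> dict[int, list[int]]:
    """Split `block` into k data fragments and add n - k parity fragments (ids 1..n).

    Parity symbols can take the value 256, so fragments are lists of ints here.
    """
    if not 0 < k <= n < P:
        raise ValueError("need 0 < k <= n <= 256")
    m = -(-len(block) // k)  # fragment length in symbols, rounded up
    padded = block.ljust(k * m, b"\0")  # zero-pad so all data fragments are equal length
    frags = {i + 1: list(padded[i * m:(i + 1) * m]) for i in range(k)}
    for idx in range(k + 1, n + 1):
        frags[idx] = []
    for c in range(m):
        # Column c of the data fragments defines one polynomial (its values at x = 1..k);
        # the parity fragments store its values at the extra points x = k+1..n.
        points = [(i, frags[i][c]) for i in range(1, k + 1)]
        for idx in range(k + 1, n + 1):
            frags[idx].append(_interpolate(points, idx))
    return frags


def decode(frags: dict[int, list[int]], k: int, length: int) -> bytes:
    """Rebuild the original block from ANY k surviving fragments."""
    if len(frags) < k:
        raise ValueError(f"need at least {k} fragments, got {len(frags)}")
    chosen = sorted(frags.items())[:k]  # any k fragments will do
    m = len(chosen[0][1])
    out = bytearray()
    for i in range(1, k + 1):  # re-evaluate the polynomial at the data positions
        for c in range(m):
            points = [(idx, frag[c]) for idx, frag in chosen]
            out.append(_interpolate(points, i))
    return bytes(out[:length])  # drop the zero padding


if __name__ == "__main__":
    data = b"Object storage protects data with erasure coding."
    frags = encode(data, k=12, n=16)
    for lost in random.sample(sorted(frags), 4):  # lose any 4 of the 16 fragments
        del frags[lost]
    assert decode(frags, k=12, length=len(data)) == data
    print("recovered from", len(frags), "of 16 fragments")
```

Losing a fifth fragment would leave only 11, and `decode` would refuse to run, which mirrors the rule that at least 12 fragments must survive.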
To create these fragments, a special mathematical method, the XOR-scheduling algorithm, is used. For more information, see the original thesis.
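XOR-based erasure codes build their redundancy from the bitwise exclusive-or operation, because XOR is cheap and is its own inverse: if the parity is A XOR B XOR C, any single lost fragment equals the XOR of all the others. The Python sketch below shows only this basic single-parity building block (the same idea as RAID 5). It is not the XOR-scheduling algorithm itself, which, roughly speaking, aims to reduce the number of XOR operations needed by codes that tolerate several failures. The helper names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("fragments must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def xor_all(fragments: list[bytes]) -> bytes:
    """XOR any number of equal-length fragments together."""
    return reduce(xor_bytes, fragments)


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_all(data)  # parity = A xor B xor C
    # If the disk holding data[1] fails, XOR the survivors with the parity:
    # A xor C xor (A xor B xor C) = B, because x xor x = 0.
    rebuilt = xor_all([data[0], data[2], parity])
    print(rebuilt)  # b'BBBB'
```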
Advantages of Erasure Coding
Splitting data and objects into fragments has several advantages. As long as the fragments are stored on different disks or at different locations, the failure of a single drive does not lead to data loss. Even better, since only the missing fragments have to be recreated rather than the entire contents of a disk, recovery is much faster than a RAID rebuild. It is also much easier to store fragments on another storage system than to store a full backup. The administrator only has to make sure that enough fragments are available when they are needed.
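These advantages can be made concrete with the 12-of-16 example from above. The sketch below computes the storage overhead of such a code and checks whether spreading the fragments as evenly as possible over a number of sites (an assumed placement policy) lets the data survive the loss of a whole site. Three-way replication is printed only as a comparison point; the function names are hypothetical.

```python
def storage_overhead(k: int, n: int) -> float:
    """Raw capacity consumed per byte of user data by a k-of-n erasure code."""
    return n / k


def survives_site_loss(k: int, n: int, sites: int) -> bool:
    """Check whether n fragments, spread as evenly as possible over `sites` locations,
    still leave at least k fragments after the fullest site is lost entirely."""
    fullest_site = -(-n // sites)  # ceiling division: fragments on the fullest site
    return n - fullest_site >= k


if __name__ == "__main__":
    k, n = 12, 16
    print(f"{k}-of-{n} code: {storage_overhead(k, n):.2f}x raw capacity, "
          f"tolerates loss of any {n - k} fragments")
    print("3-way replication: 3.00x raw capacity, tolerates loss of any 2 copies")
    for sites in (3, 4, 8):
        print(f"{sites} sites -> survives loss of one site: {survives_site_loss(k, n, sites)}")
```

With these assumed numbers, the 12-of-16 code needs about 1.33 times the raw capacity of the user data while tolerating four lost fragments. Spread over four sites, losing a whole site still leaves exactly the 12 fragments required, whereas with only three sites it does not.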
Recovering data from Object storage systems
Because these systems are highly resilient against failures, none of the new Object storage systems has yet found its way into an Ontrack Data Recovery lab.
However, Ontrack Data Recovery engineers did successfully recover data from a beta version of an EMC Isilon storage system, which is designed to hold vast amounts of big data. Simply put, an Isilon system stores all data in a so-called "data lake" that is spread across many hard disk drives. This idea of keeping data in one big "data lake" is comparable to the concept of Object storage. In addition, the EMC system uses its own file system, OneFS, which is designed specifically for big data and the "data lake" concept.
In this case, a kernel panic occurred on the beta system and several hard disks failed. EMC was able to recover most of the data itself using the standard built-in recovery tools, but a consistency check showed that several disks were faulty, so not all of the data could be rescued this way.
What was done?
To recover the data, engineers from Ontrack's Research and Development department developed special tools to quickly analyse an existing OneFS drive and find the missing or corrupt data structures. Once the engineers were familiar with the original structure of the system, they were able to recover almost all of the 4 million files that had gone missing.
This example shows that modern data recovery can retrieve lost data from an Object-storage-like system in the highly unlikely event that the built-in recovery tools and safety features fail.
However, as the case also shows, it takes a huge effort to dig deep into the structure of such a system and locate the original data. Because the addressing of the data is managed by the storage application itself, a case like the one above requires a custom solution that helps the engineers find out where consistent data blocks are stored and where missing data blocks have to be recovered from.
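To illustrate the kind of problem such a custom tool has to solve, the sketch below uses an entirely hypothetical fragment format: an object ID, a fragment index and a CRC32 checksum stored in front of each fragment. It is not how OneFS or Ontrack's tools work. It only shows the general idea of scanning raw fragments, discarding inconsistent ones, and working out which objects still have enough fragments to be rebuilt. The layout, the function names and the threshold `k` are all assumptions.

```python
import struct
import zlib
from collections import defaultdict

# HYPOTHETICAL fragment layout, for illustration only (not OneFS or any real product):
# 8-byte object ID | 2-byte fragment index | 4-byte CRC32 of the payload | payload
HEADER = struct.Struct(">QHI")


def pack_fragment(object_id: int, index: int, payload: bytes) -> bytes:
    """Build a raw fragment in the hypothetical layout above."""
    return HEADER.pack(object_id, index, zlib.crc32(payload)) + payload


def rebuild_index(raw_fragments: list[bytes], k: int):
    """Scan raw fragments, keep only consistent ones and group them by object.

    Returns (fragments per object, IDs of objects with at least k good fragments,
    {object ID: number of fragments still missing} for the rest).
    """
    found: dict[int, dict[int, bytes]] = defaultdict(dict)
    for raw in raw_fragments:
        if len(raw) < HEADER.size:
            continue  # too short to hold a header
        object_id, index, crc = HEADER.unpack_from(raw)
        payload = raw[HEADER.size:]
        if zlib.crc32(payload) != crc:
            continue  # checksum mismatch: treat the fragment as corrupt
        found[object_id][index] = payload
    recoverable = {oid for oid, frags in found.items() if len(frags) >= k}
    missing = {oid: k - len(frags) for oid, frags in found.items() if len(frags) < k}
    return found, recoverable, missing


if __name__ == "__main__":
    dump = [pack_fragment(1, i, b"x" * 8) for i in range(13)]   # object 1: 13 fragments
    dump += [pack_fragment(2, i, b"y" * 8) for i in range(10)]  # object 2: only 10
    corrupted = bytearray(dump[0])
    corrupted[-1] ^= 0xFF  # flip bits in one payload to simulate a damaged block
    dump[0] = bytes(corrupted)
    _, recoverable, missing = rebuild_index(dump, k=12)
    print(recoverable, missing)  # {1} {2: 2}
```

In a real case, the hard part is exactly what this sketch assumes away: without a documented layout, engineers first have to work out what a fragment header looks like before any such scan is possible.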
Picture copyright: olafpictures / pixabay.com, CC0 license (https://pixabay.com/service/terms/#usage)