Recovery Solutions for Large Storage Systems (Part 1)

September 27, 2016 by Milagros Gamero

Storage systems have become their own unique and complex area of IT, and the term “storage systems” can mean different things to different people. So how do we define these systems? In this article, storage systems are the hardware that stores data.

For example, this may be a small business server supporting an office of ten users or fewer - the storage system would be the hard drives inside that server where user information is located. In large business environments, the storage system can be a large SAN cabinet full of hard drives, with the storage space sliced and diced in different ways to provide redundancy and performance.

The ever-changing storage system technology

Today's storage technology encompasses all sorts of storage media. These could include Write Once Read Many (WORM) systems, tape library systems and virtual tape library systems. Over the past few years, SAN and NAS systems have provided excellent reliability. What is the difference between the two?

  • SAN (Storage Area Network) units can be massive cabinets - some with 240 hard drives in them! These large 50+ Terabyte storage systems are doing more than just powering up hundreds of drives. They are incredibly powerful data warehouses with versatile software utilities behind them to manage multiple arrays, support various storage architecture configurations and provide constant system monitoring.
  • NAS (Network Attached Storage) units are self-contained units that have their own operating system and file system, and that manage their attached hard drives. These units come in all sorts of sizes to fit most needs and operate as file servers.

In the past, large-scale storage was out of reach for small businesses. Serial ATA (SATA) hard disk drive-based SAN systems have since become a cost-effective way of providing large amounts of storage space. These array units also offer virtual tape backup systems - literally RAID arrays that are presented as tape machines, thereby removing the tape media element completely.

Other storage technologies such as iSCSI, DAS (Direct Attached Storage), Near-Line Storage (data that is attached to removable media), and CAS (Content Addressed Storage) are all methods for providing data availability. Storage architects know that just having a 'backup' is not enough.

Speedy obsolescence

In today's information-intensive environments, a normal nightly incremental or weekly full backup can be out of date within hours or even minutes of its creation.

In large data warehouse environments, backing up data that constantly changes is not even an option. The only method for those massive systems is to have storage system mirrors - literally identical servers with the exact same storage space.

3 things to consider when choosing a system

Careful analysis of the operating environment is required. Most would say that having no failures at all is the best environment - that is true for users and administrators alike! The harsh truth is that data disasters happen every day despite the implementation of risk mitigation policies and plans.

When reviewing your storage needs, consider:

  • What is the recovery turn-time? How long can you or your client survive without the data? This will help to establish performance requirements for equipment.
  • Quality of data restored. Is the original data required, or will an older backup copy of the data suffice? This relates to the backup scheme that is used. If the data on your storage system changes rapidly, then you will most likely need the original data.
  • How much data are you or your client archiving? Restoring large amounts of data will take time to move through a network. On DAS (Direct Attached Storage) configurations, restoration time will depend on the equipment and the I/O performance of the hardware.
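The first and third questions can be tied together with some simple arithmetic. The following is a minimal sketch, not a sizing tool: the function names, the `efficiency` factor (a stand-in for protocol overhead and contention) and the example figures are all assumptions made for illustration.

```python
def estimate_restore_hours(data_gb, throughput_mb_per_s, efficiency=0.7):
    """Rough restore time in hours for a given data volume.

    data_gb             -- amount of data to restore, in decimal gigabytes
    throughput_mb_per_s -- nominal network or I/O throughput, in MB/s
    efficiency          -- assumed fraction of nominal throughput actually
                           achieved (hypothetical default, not a measurement)
    """
    if data_gb < 0 or throughput_mb_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid size, throughput or efficiency")
    effective_mb_per_s = throughput_mb_per_s * efficiency
    seconds = (data_gb * 1000) / effective_mb_per_s
    return seconds / 3600


def meets_turn_time(data_gb, throughput_mb_per_s, max_hours, efficiency=0.7):
    """True if the estimated restore fits inside the acceptable downtime."""
    return estimate_restore_hours(data_gb, throughput_mb_per_s, efficiency) <= max_hours


if __name__ == "__main__":
    # Illustrative figures only: 50 TB over a link that nominally moves 125 MB/s.
    hours = estimate_restore_hours(50_000, 125)
    print(f"Estimated restore time: {hours:.1f} hours")
    print("Fits a 24-hour window:", meets_turn_time(50_000, 125, max_hours=24))
```

Under these assumed numbers the estimate comes out at roughly 159 hours, far beyond a one-day window, which is the kind of gap this analysis is meant to expose before a failure rather than after one.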

Unique data protection schemes

Storage system manufacturers are pursuing unique ways of processing large amounts of data while still being able to provide redundancy in case of disaster.

Some large SAN units incorporate intricate device block-level organization, essentially creating a low-level file system from the RAID perspective. Other SAN units have an internal block-level transaction log in place so that the control processor of the SAN is tracking all of the block-level writes to the individual disks. Using this transaction log, the SAN unit can recover from unexpected power failures or shutdowns.
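To make the transaction-log idea concrete, here is a toy sketch of a block-level log that is replayed after an interrupted write. It does not represent any vendor's implementation: the class name, the in-memory lists standing in for disks and for the persistent log, and the `crash_before_apply` switch are all hypothetical.

```python
class JournaledBlockDevice:
    """Toy model of a controller that logs each block write before applying it."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks  # stands in for the physical disks
        self.journal = []                 # stands in for a persistent log

    def write(self, block_no, data, crash_before_apply=False):
        # 1. Record the intended write in the log first.
        entry = {"block": block_no, "data": data, "applied": False}
        self.journal.append(entry)
        if crash_before_apply:
            return  # simulate power loss after logging, before the disk write
        # 2. Apply the write to the block, then 3. mark the entry as applied.
        self.blocks[block_no] = data
        entry["applied"] = True

    def recover(self):
        """Replay logged writes that never reached the disks; return the count."""
        replayed = 0
        for entry in self.journal:
            if not entry["applied"]:
                self.blocks[entry["block"]] = entry["data"]
                entry["applied"] = True
                replayed += 1
        return replayed


if __name__ == "__main__":
    dev = JournaledBlockDevice(num_blocks=4)
    dev.write(0, b"payroll")
    dev.write(1, b"orders", crash_before_apply=True)  # interrupted write
    print("Before recovery:", dev.blocks[:2])
    print("Entries replayed:", dev.recover())
    print("After recovery:", dev.blocks[:2])
```

The key design point is ordering: because the intent is logged before the block is touched, an unexpected shutdown leaves enough information to finish the write on restart.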

How to improve recoverability

Some IT professionals specializing in storage systems propose adding more intelligence to the RAID array controller card so that it is 'file system aware.' This technology would provide more recoverability if disaster strikes, with the goal of making the storage array more self-healing.

Other ideas along these lines are to have a heterogeneous storage pool where multiple computers can access information without being dependent on a specific system's file system. In organizations where there are multiple hardware and system platforms, a transparent file system will provide access to data regardless of what system wrote the data.

Other IT professionals are approaching the redundancy of the storage array quite differently. The RAID concept is in use on a vast number of systems, yet engineers are looking for new ways to provide better data protection in case of failure. The goals that drive this type of RAID development are data protection and redundancy without sacrificing performance.
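For context, the baseline these efforts build on is parity-based redundancy, as used in parity RAID levels such as RAID 5. The sketch below shows only the XOR principle under simplifying assumptions (equal-sized blocks, exactly one lost block); the function names are made up for this example and it is not how any particular controller is implemented.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    # Three data blocks, as if striped across three drives.
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = xor_blocks(data)  # would be stored on a fourth drive

    # Pretend the second drive failed and rebuild its block.
    recovered = rebuild_missing([data[0], data[2]], parity)
    print("Recovered block matches original:", recovered == data[1])
```

Every write to a data block also requires a parity update, which is the performance cost that newer protection schemes try to reduce.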

This overview of the technologies used within storage system architectures concludes the first part of this article. In our next article, we will cover how to prevent system failures as well as what to do when struck by data loss. See you then…